8.2.3.4. Trend tests


Formal Trend Tests should accompany Trend Plots and Duane Plots. Three are given in this section.

In this section we look at formal statistical tests that let us determine quantitatively whether the repair times of a system show a significant trend (which may be an improvement or a degradation trend). The section on trend and growth plotting discussed visual tests for trends; this section complements those visual tests with several numerical ones.

Three statistical test procedures will be described:

The Reverse Arrangement Test (a simple and useful test that has the advantage of making no assumptions about a model for the possible trend)
The Military Handbook Test (optimal for distinguishing between "no trend" and a trend following the NHPP Power Law or Duane model)
The Laplace Test (optimal for distinguishing between "no trend" and a trend following the NHPP Exponential Law model)

The Reverse Arrangement Test (RAT test) is simple and makes no assumptions about what model a trend might follow

The Reverse Arrangement Test

Assume there are $r$ repairs during the observation period and they occurred at system ages $T_1, \, T_2, \, T_3, \, \ldots, \, T_r$. (We set the start of the observation period to $T = 0$). Let $I_1 = T_1, \, I_2 = T_2 - T_1, \, I_3 = T_3 - T_2, \, \ldots, \, I_r = T_r - T_{r-1}$ be the inter-arrival times for repairs (i.e., the sequence of waiting times between failures). Assume the observation period ends at time $T_{end} > T_r$.

Previously, we plotted this sequence of inter-arrival times to look for evidence of trends. Now, we calculate how many instances we have of a later inter-arrival time being strictly greater than an earlier inter-arrival time. Each time that happens, we call it a reversal. If there are a lot of reversals (more than are likely from pure chance with no trend), we have significant evidence of an improvement trend. If there are too few reversals we have significant evidence of degradation.

A formal definition of the reversal count and some properties of this count are:

count a reversal every time $I_j < I_k$ for some $j$ and $k$ with $j < k$
this reversal count is the total number of reversals $R$
for $r$ repair times, the maximum possible number of reversals is $r(r-1)/2$
if there are no trends, on the average one would expect to have $r(r-1)/4$ reversals.

As a simple example, assume we have 5 repair times at system ages 22, 58, 71, 156 and 225, and the observation period ended at system age 300. First calculate the inter-arrival times and obtain: 22, 36, 13, 85, 69. Next, count reversals by "putting your finger" on the first inter-arrival time, 22, and counting how many later inter-arrival times are greater than that. In this case, there are 3. Continue by "moving your finger" to the second time, 36, and counting how many later times are greater. There are exactly 2. Repeating this for the third and fourth inter-arrival times (with many repairs, your finger gets very tired!) we obtain 2 and 0 reversals, respectively. Adding 3 + 2 + 2 + 0 = 7, we see that $R$ = 7. The total possible number of reversals is $5 \times 4/2 = 10$ and an "average" number is half this, or 5.
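The finger-counting procedure can be sketched in a few lines of Python. This is only an illustrative sketch; the function names `inter_arrival_times` and `count_reversals` are made up for this example and do not come from any library.

```python
def inter_arrival_times(repair_ages):
    """Differences between successive repair ages, starting from age 0."""
    previous = 0
    gaps = []
    for age in repair_ages:
        gaps.append(age - previous)
        previous = age
    return gaps


def count_reversals(gaps):
    """Count pairs (j, k) with j < k where the later gap is strictly larger."""
    return sum(
        1
        for j in range(len(gaps))
        for k in range(j + 1, len(gaps))
        if gaps[j] < gaps[k]
    )


repair_ages = [22, 58, 71, 156, 225]
gaps = inter_arrival_times(repair_ages)
R = count_reversals(gaps)
print(gaps, R)  # [22, 36, 13, 85, 69] 7
```

The double loop visits all $r(r-1)/2$ pairs, which is fine for the small repair counts typical of reliability tests.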

In the example, we saw 7 reversals (2 more than average). Is this strong evidence for an improvement trend? The following table allows us to answer that at a 90 %, 95 %, or 99 % confidence level; the higher the confidence, the stronger the evidence of improvement (or the less likely that pure chance alone produced the result).

A useful table to check whether a reliability test has demonstrated significant improvement

Value of $R$ Indicating Significant Improvement
(One-Sided Test)
Number of Repairs	Minimum $R$ for 90 % Evidence of Improvement	Minimum $R$ for 95 % Evidence of Improvement	Minimum $R$ for 99 % Evidence of Improvement
4	6	6	-
5	9	9	10
6	12	13	14
7	16	17	19
8	20	22	24
9	25	27	30
10	31	33	36
11	37	39	43
12	43	46	50

A one-sided test means that, before looking at the data, we expected an improvement trend or, at worst, a constant repair rate. This would be the case if we know of actions taken to improve reliability (such as occur during reliability improvement tests).

For the $r$ = 5 repair times example above where we had $R$ = 7, the table shows we do not (yet) have enough evidence to demonstrate a significant improvement trend. That does not mean that an improvement model is incorrect - it just means it is not yet "proved" statistically. With small numbers of repairs, it is not easy to obtain significant results.

Use this formula when there are more than 12 repairs in the data set

For numbers of repairs beyond 12, there is a good approximation formula that can be used to determine whether $R$ is large enough to be significant. Calculate

$$ z = \frac{R - \frac{r(r-1)}{4} + 0.5}{\sqrt{\frac{(2r+5)(r-1)r}{72}}} \, . $$

If $z$ > 1.282, we have at least 90 % significance; if $z$ > 1.645, at least 95 % significance; and if $z$ > 2.33, at least 99 % significance, since $z$ has an approximately standard normal distribution.
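As a rough sketch, the approximation can be evaluated with the Python standard library alone. The helper name `rat_z` and the input values (15 repairs, 70 reversals) are hypothetical and chosen only to illustrate the arithmetic; they are not data from this section.

```python
import math
from statistics import NormalDist


def rat_z(R, r):
    """Normal-approximation z for R reversals among r inter-arrival times."""
    mean = r * (r - 1) / 4                          # expected reversals, no trend
    sd = math.sqrt((2 * r + 5) * (r - 1) * r / 72)  # denominator of the formula
    return (R - mean + 0.5) / sd


# Hypothetical input: 15 repairs with 70 reversals (made up for illustration).
z = rat_z(R=70, r=15)
confidence = NormalDist().cdf(z)  # standard normal percentile of z
print(round(z, 3), round(confidence, 3))
```

Comparing `z` with 1.282, 1.645 and 2.33 gives the same decision as comparing `confidence` with 0.90, 0.95 and 0.99.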

That covers the (one-sided) test for significant improvement trends. If, on the other hand, we believe there may be a degradation trend (the system is wearing out or being overstressed, for example) and we want to know if the data confirm this, then we expect a low value for $R$ and we need a table to determine when the value is low enough to be significant. The table below gives these critical values for $R$.

Value of $R$ Indicating Significant Degradation Trend (One-Sided Test)
Number of Repairs	Maximum $R$ for 90 % Evidence of Degradation	Maximum $R$ for 95 % Evidence of Degradation	Maximum $R$ for 99 % Evidence of Degradation
4	0	0	-
5	1	1	0
6	3	2	1
7	5	4	2
8	8	6	4
9	11	9	6
10	14	12	9
11	18	16	12
12	23	20	16

For numbers of repairs $r$ > 12, use the approximation formula above, with $R$ replaced by $[r(r-1)/2 - R]$.

Because of the success of the Duane model with industrial improvement test data, this Trend Test is recommended

The Military Handbook Test

This test is better at finding significance when the choice is between no trend and an NHPP Power Law (Duane) model. In other words, if the data come from a system following the Power Law, this test will generally do better than any other test in terms of finding significance.

As before, we have $r$ times of repair $T_1, \, T_2, \, \ldots, \, T_r$ with the observation period ending at time $T_{end} > T_r$. Calculate $$ \chi_{2r}^2 = 2 \sum_{i=1}^r \ln \frac{T_{end}}{T_i} \, , $$ and compare this to percentiles of the chi-square distribution with $2r$ degrees of freedom. For a one-sided improvement test, reject no trend (or HPP) in favor of an improvement trend if the chi-square value is beyond the 90th (or 95th, or 99th) percentile. For a one-sided degradation test, reject no trend if the chi-square value is less than the 10th (or 5th, or 1st) percentile.

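Applying this test to the 5 repair times example, the test statistic has value 13.28 with 10 degrees of freedom, and the chi-square percentile is 79 %. Since this is below the 90th percentile, the test does not show a significant improvement trend, in agreement with the Reverse Arrangement Test result for the same data.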
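The calculation for the 5 repair times example can be reproduced with a short sketch. The function names are illustrative only. To avoid depending on a statistics package, the chi-square percentile is computed from the closed-form CDF that holds when the degrees of freedom are even, which is always the case here since there are $2r$ of them.

```python
import math


def mil_hdbk_statistic(repair_ages, t_end):
    """Chi-square statistic 2 * sum(ln(T_end / T_i)), with 2r degrees of freedom."""
    return 2 * sum(math.log(t_end / t) for t in repair_ages)


def chi2_cdf_even_df(x, df):
    """Chi-square CDF for even df = 2r:
    1 - exp(-x/2) * sum_{k=0}^{r-1} (x/2)^k / k!
    """
    if df % 2 != 0:
        raise ValueError("closed form is only valid for even degrees of freedom")
    half = x / 2
    tail = sum(half**k / math.factorial(k) for k in range(df // 2))
    return 1 - math.exp(-half) * tail


repair_ages = [22, 58, 71, 156, 225]
stat = mil_hdbk_statistic(repair_ages, t_end=300)
percentile = chi2_cdf_even_df(stat, df=2 * len(repair_ages))
print(round(stat, 2), round(percentile, 2))  # 13.28 0.79
```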

The Laplace Test

This test is better at finding significance when the choice is between no trend and an NHPP Exponential model. In other words, if the data come from a system following the Exponential Law, this test will generally do better than any other test in terms of finding significance.

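As before, we have $r$ times of repair $T_1, \, T_2, \, \ldots, \, T_r$ with the observation period ending at time $T_{end} > T_r$. Calculate $$ z = \frac{\sqrt{12r} \sum_{i=1}^r \left( T_i - \frac{T_{end}}{2}\right)}{r T_{end}} \, , $$ and compare this to high (for degradation) or low (for improvement) percentiles of the standard normal distribution. A large positive $z$ means the repairs cluster late in the observation period (failures are becoming more frequent), while a large negative $z$ means they cluster early (failures are becoming less frequent).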
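A minimal sketch of the Laplace statistic, applied to the 5 repair times used earlier in this section, might look as follows. The function name `laplace_z` is hypothetical; the standard normal percentile comes from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist


def laplace_z(repair_ages, t_end):
    """Laplace statistic: sqrt(12 r) * sum(T_i - T_end / 2) / (r * T_end)."""
    r = len(repair_ages)
    centered_sum = sum(t - t_end / 2 for t in repair_ages)
    return math.sqrt(12 * r) * centered_sum / (r * t_end)


repair_ages = [22, 58, 71, 156, 225]
z = laplace_z(repair_ages, t_end=300)
percentile = NormalDist().cdf(z)  # where z falls on the standard normal
print(round(z, 3), round(percentile, 3))
```

The statistic is negative whenever the average repair age lies below the midpoint $T_{end}/2$, that is, when repairs are concentrated in the first half of the observation period.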

Formal tests generally confirm the subjective information conveyed by trend plots

Case Study 1: Reliability Test Improvement Data (Continued from earlier work)

The failure data, trend plots, and Duane plot were shown earlier. The observed failure times were: 5, 40, 43, 175, 389, 712, 747, 795, 1299 and 1478 hours, with the test ending at 1500 hours.

Reverse Arrangement Test: The inter-arrival times are: 5, 35, 3, 132, 214, 323, 35, 48, 504 and 179. The number of reversals is 33, which, according to the table above, is just significant at the 95 % level.

The Military Handbook Test: The Chi-Square test statistic, using the formula given above, is 37.23 with 20 degrees of freedom and has significance level 98.9 %. Since the Duane Plot looked very reasonable, this test probably gives the most precise significance assessment of how unlikely it is that sheer chance produced such an apparent improvement trend (only about 1.1 % probability).
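The case study figures can be re-derived from the failure times with a short script. This is a sketch for checking the arithmetic, not a general-purpose implementation; the chi-square percentile again uses the closed-form CDF for even degrees of freedom.

```python
import math

failure_times = [5, 40, 43, 175, 389, 712, 747, 795, 1299, 1478]
t_end = 1500
r = len(failure_times)

# Reverse Arrangement Test: count later gaps strictly larger than earlier ones.
gaps = [b - a for a, b in zip([0] + failure_times[:-1], failure_times)]
reversals = sum(
    gaps[j] < gaps[k]
    for j in range(r)
    for k in range(j + 1, r)
)

# Military Handbook Test: statistic with 2r degrees of freedom.
chi_square = 2 * sum(math.log(t_end / t) for t in failure_times)

# Closed-form chi-square CDF, valid because 2r is even.
half = chi_square / 2
percentile = 1 - math.exp(-half) * sum(
    half**k / math.factorial(k) for k in range(r)
)

print(gaps)
print(reversals, round(chi_square, 2), round(percentile, 3))
```

Note that the two inter-arrival times equal to 35 do not form a reversal, because a reversal requires the later time to be strictly greater.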