

1.3.5.13.

Runs Test for Detecting Non-randomness

Purpose:
Detect Non-Randomness
The runs test (Bradley, 1968) can be used to decide if a data set is from a random process.

A run is defined as a series of increasing values or a series of decreasing values. The number of increasing, or decreasing, values is the length of the run. In a random data set, the probability that the (I+1)th value is larger or smaller than the Ith value follows a binomial distribution, which forms the basis of the runs test.
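The definition of a run can be sketched in a few lines of Python. This is only an illustration: the function name `run_lengths` is hypothetical, and skipping zero differences (ties) is an assumption of the sketch, not something the text specifies.

```python
def run_lengths(y):
    """Return (direction, length) pairs for the runs up and down in y.

    direction is +1 for a run up and -1 for a run down.
    Assumption: zero differences (ties) are simply skipped.
    """
    runs = []
    for prev, curr in zip(y, y[1:]):
        diff = curr - prev
        if diff == 0:
            continue  # assumption: ignore ties
        sign = 1 if diff > 0 else -1
        if runs and runs[-1][0] == sign:
            # Same direction as the current run: extend it by one.
            runs[-1] = (sign, runs[-1][1] + 1)
        else:
            # Direction changed (or first difference): start a new run.
            runs.append((sign, 1))
    return runs


# Differences are +, +, -, -, + so there is a run up of length 2,
# a run down of length 2, and a run up of length 1.
print(run_lengths([3, 5, 8, 6, 4, 7]))
```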

Typical Analysis and Test Statistics
The first step in the runs test is to compute the sequential differences (Y(i) - Y(i-1)). Positive values indicate an increasing value and negative values indicate a decreasing value. A runs test should include information such as the output shown below from Dataplot for the LEW.DAT data set. The output shows a table of:
  1. runs of length exactly I for I = 1, 2, ..., 10
  2. number of runs of length I
  3. expected number of runs of length I
  4. standard deviation of the number of runs of length I
  5. a z-score where the z-score is defined to be

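      Z(i) = (X(i) - E[X(i)]) / SD[X(i)]

    where X(i) is the observed number of runs of length I (the STAT column), and E[X(i)] and SD[X(i)] are its expected value and standard deviation for a random sequence (the EXP(STAT) and SD(STAT) columns).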

The z-score column is compared to a standard normal table. That is, at the 5% significance level, a z-score with an absolute value greater than 1.96 indicates non-randomness.
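As a small check of this rule, the sketch below recomputes the z-scores for the first three rows of the "runs up, length exactly I" table in the sample output and flags those beyond 1.96. The names `z_score`, `rows`, and `flagged` are illustrative only; the numbers are copied from that table.

```python
def z_score(stat, expected, sd):
    """Standardize an observed run count against its expected value and SD."""
    return (stat - expected) / sd


# (I, STAT, EXP(STAT), SD(STAT)) copied from the runs-up table for LEW.DAT.
rows = [
    (1, 18.0, 41.7083, 6.4900),
    (2, 40.0, 18.2167, 3.3444),
    (3, 2.0, 5.2125, 2.0355),
]

CRITICAL = 1.96  # two-sided 5% point of the standard normal distribution

flagged = []
for i, stat, expected, sd in rows:
    z = z_score(stat, expected, sd)
    if abs(z) > CRITICAL:
        flagged.append(i)
    print(f"I={i}  z={z:.2f}  non-random={abs(z) > CRITICAL}")
```

The recomputed values agree with the Z column of the table to two decimals.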

There are several alternative formulations of the runs test in the literature. For example, a series of coin tosses would record a series of heads and tails. A run of length r is r consecutive heads or r consecutive tails. To use the Dataplot RUNS command, you could code a sequence of the N = 10 coin tosses HHHHTTHTHH as

    1 2 3 4 3 2 3 2 3 4
that is, a heads is coded as an increasing value and a tails is coded as a decreasing value.
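The coding above amounts to a running sum of +1 for heads and -1 for tails, starting from an implicit 0. A minimal sketch, with the hypothetical helper name `code_tosses`:

```python
from itertools import accumulate


def code_tosses(tosses, up="H"):
    """Code a two-outcome sequence as increasing/decreasing values.

    Each `up` outcome adds 1 and every other outcome subtracts 1,
    starting from an implicit value of 0 (an assumption of this sketch).
    """
    steps = [1 if t == up else -1 for t in tosses]
    return list(accumulate(steps))


print(code_tosses("HHHHTTHTHH"))  # [1, 2, 3, 4, 3, 2, 3, 2, 3, 4]
```

Note that ten coded values yield only nine sequential differences, so the first toss is represented only if the starting value 0 is prepended to the series. The text does not say how Dataplot treats this.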

Another alternative is to code values above the median as positive and values below the median as negative. There are other formulations as well. All of them can be converted to the Dataplot formulation. Just remember that it ultimately reduces to 2 choices. To use the Dataplot runs test, simply code one choice as an increasing value and the other as a decreasing value as in the heads/tails example above. If you are using other statistical software, you need to check the conventions used by that program.
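The above/below-median formulation can be converted in the same way. The sketch below is one possible conversion, not a documented Dataplot procedure: the name `code_about_median` is hypothetical, and dropping values exactly equal to the median is an assumption.

```python
from itertools import accumulate
from statistics import median


def code_about_median(values):
    """Code values above the median as a step up and values below as a step down.

    Assumption: values exactly equal to the median are dropped.
    The result starts from an implicit value of 0.
    """
    m = median(values)
    steps = [1 if v > m else -1 for v in values if v != m]
    return list(accumulate(steps))


# The median of this sample is 6, so the signs alternate: -, +, -, +, -, +.
print(code_about_median([5, 9, 1, 7, 2, 8]))  # [-1, 0, -1, 0, -1, 0]
```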

Sample Output
Dataplot generated the following runs test output using the LEW.DAT data set:
 
  
                    RUNS UP
  
         STATISTIC = NUMBER OF RUNS UP
             OF LENGTH EXACTLY I
  
 I         STAT     EXP(STAT)    SD(STAT)       Z
  
 1        18.0     41.7083      6.4900       -3.65
 2        40.0     18.2167      3.3444        6.51
 3         2.0      5.2125      2.0355       -1.58
 4         0.0      1.1302      1.0286       -1.10
 5         0.0      0.1986      0.4424       -0.45
 6         0.0      0.0294      0.1714       -0.17
 7         0.0      0.0038      0.0615       -0.06
 8         0.0      0.0004      0.0207       -0.02
 9         0.0      0.0000      0.0066       -0.01
10         0.0      0.0000      0.0020        0.00
  
  
         STATISTIC = NUMBER OF RUNS UP
             OF LENGTH I OR MORE
  
 I         STAT     EXP(STAT)    SD(STAT)       Z
  
 1        60.0     66.5000      4.1972       -1.55
 2        42.0     24.7917      2.8083        6.13
 3         2.0      6.5750      2.1639       -2.11
 4         0.0      1.3625      1.1186       -1.22
 5         0.0      0.2323      0.4777       -0.49
 6         0.0      0.0337      0.1833       -0.18
 7         0.0      0.0043      0.0652       -0.07
 8         0.0      0.0005      0.0218       -0.02
 9         0.0      0.0000      0.0069       -0.01
10         0.0      0.0000      0.0021        0.00
  
  
                   RUNS DOWN
  
         STATISTIC = NUMBER OF RUNS DOWN
             OF LENGTH EXACTLY I
  
 I         STAT     EXP(STAT)    SD(STAT)       Z
  
 1        25.0     41.7083      6.4900       -2.57
 2        35.0     18.2167      3.3444        5.02
 3         0.0      5.2125      2.0355       -2.56
 4         0.0      1.1302      1.0286       -1.10
 5         0.0      0.1986      0.4424       -0.45
 6         0.0      0.0294      0.1714       -0.17
 7         0.0      0.0038      0.0615       -0.06
 8         0.0      0.0004      0.0207       -0.02
 9         0.0      0.0000      0.0066       -0.01
10         0.0      0.0000      0.0020        0.00
  
  
         STATISTIC = NUMBER OF RUNS DOWN
             OF LENGTH I OR MORE
  
  
 I         STAT     EXP(STAT)    SD(STAT)       Z
  
 1        60.0     66.5000      4.1972       -1.55
 2        35.0     24.7917      2.8083        3.63
 3         0.0      6.5750      2.1639       -3.04
 4         0.0      1.3625      1.1186       -1.22
 5         0.0      0.2323      0.4777       -0.49
 6         0.0      0.0337      0.1833       -0.18
 7         0.0      0.0043      0.0652       -0.07
 8         0.0      0.0005      0.0218       -0.02
 9         0.0      0.0000      0.0069       -0.01
10         0.0      0.0000      0.0021        0.00
  
  
         RUNS TOTAL = RUNS UP + RUNS DOWN
  
       STATISTIC = NUMBER OF RUNS TOTAL
            OF LENGTH EXACTLY I
  
 I         STAT     EXP(STAT)    SD(STAT)       Z
  
 1        43.0     83.4167      9.1783       -4.40
 2        75.0     36.4333      4.7298        8.15
 3         2.0     10.4250      2.8786       -2.93
 4         0.0      2.2603      1.4547       -1.55
 5         0.0      0.3973      0.6257       -0.63
 6         0.0      0.0589      0.2424       -0.24
 7         0.0      0.0076      0.0869       -0.09
 8         0.0      0.0009      0.0293       -0.03
 9         0.0      0.0001      0.0093       -0.01
10         0.0      0.0000      0.0028        0.00
  
  
       STATISTIC = NUMBER OF RUNS TOTAL
             OF LENGTH I OR MORE
  
 I         STAT     EXP(STAT)    SD(STAT)       Z
  
 1       120.0    133.0000      5.9358       -2.19
 2        77.0     49.5833      3.9716        6.90
 3         2.0     13.1500      3.0602       -3.64
 4         0.0      2.7250      1.5820       -1.72
 5         0.0      0.4647      0.6756       -0.69
 6         0.0      0.0674      0.2592       -0.26
 7         0.0      0.0085      0.0923       -0.09
 8         0.0      0.0010      0.0309       -0.03
 9         0.0      0.0001      0.0098       -0.01
10         0.0      0.0000      0.0030        0.00
  
  
        LENGTH OF THE LONGEST RUN UP         =     3
        LENGTH OF THE LONGEST RUN DOWN       =     2
        LENGTH OF THE LONGEST RUN UP OR DOWN =     3
  
        NUMBER OF POSITIVE DIFFERENCES =   104
        NUMBER OF NEGATIVE DIFFERENCES =    95
        NUMBER OF ZERO     DIFFERENCES =     0
  
  
Interpretation of Sample Output
Scanning the last column labeled "Z", we note that most of the z-scores for run lengths 1, 2, and 3 have an absolute value greater than 1.96. This is strong evidence that these data are in fact not random.
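The EXP(STAT) columns for total runs can be cross-checked with classical closed-form expectations for runs up and down in a random sequence. These formulas are not given in this section, so treat the sketch below as an assumption that happens to reproduce the table. The sample size n = 200 is inferred from the 104 + 95 = 199 differences reported in the output, and the function names are illustrative.

```python
from math import factorial


def expected_runs_exact(n, k):
    """Expected number of runs (up plus down) of length exactly k in n random values."""
    return 2 * (n * (k**2 + 3 * k + 1) - (k**3 + 3 * k**2 - k - 4)) / factorial(k + 3)


def expected_runs_or_more(n, k):
    """Expected number of runs (up plus down) of length k or more in n random values."""
    return 2 * (n * (k + 1) - (k**2 + k - 1)) / factorial(k + 2)


n = 200  # inferred: 199 sequential differences in the sample output
for k in (1, 2, 3):
    print(k, round(expected_runs_exact(n, k), 4), round(expected_runs_or_more(n, k), 4))
```

For k = 1, 2, 3 these give 83.4167, 36.4333, 10.4250 (length exactly k) and 133.0000, 49.5833, 13.1500 (length k or more), matching the RUNS TOTAL tables. The runs-up and runs-down expectations in the output are half of these values.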

Output from other statistical software may look somewhat different from the above output.

Question
The runs test can be used to answer the following question:
  • Were these sample data generated from a random process?
Importance
Randomness is one of the key assumptions in determining if a univariate statistical process is in control. If the assumptions of constant location and scale, randomness, and fixed distribution are reasonable, then the univariate process can be modeled as:
    y(i) = A0 + E(i)
where E(i) is an error term.

If the randomness assumption is not valid, then a different model needs to be used. This will typically be either a time series model or a non-linear model (with time as the independent variable).

Related Techniques
Autocorrelation
Run Sequence Plot
Lag Plot

Case Study
Heat flow meter data

Software
Most general purpose statistical software programs, including Dataplot, support a runs test.