|
Purpose: Detect Non-Randomness |
The runs test (Bradley, 1968) can be used to decide if a data set is from a random process.
A run is defined as a series of consecutive increasing values or a series of consecutive decreasing values. The number of increasing, or decreasing, values is the length of the run. In a random data set, the probability that the (i+1)th value is larger or smaller than the ith value follows a binomial distribution, which forms the basis of the runs test. |
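The definition of a run can be sketched in a few lines of Python. This is a minimal illustration, not Dataplot's implementation: the function name `run_lengths` is hypothetical, and the handling of ties (a zero difference ends the current run and is not counted) is an assumption made here.

```python
from itertools import groupby


def run_lengths(values):
    """Return (up, down): the lengths of the increasing and decreasing runs.

    A run is a maximal stretch of consecutive differences with the same sign.
    Assumption: a zero difference (tie) ends the current run and is skipped.
    """
    signs = []
    for prev, curr in zip(values, values[1:]):
        if curr > prev:
            signs.append(1)
        elif curr < prev:
            signs.append(-1)
        else:
            signs.append(0)

    up, down = [], []
    for sign, group in groupby(signs):
        length = len(list(group))
        if sign > 0:
            up.append(length)
        elif sign < 0:
            down.append(length)
    return up, down


# Differences of this series have signs + - + + + -
up, down = run_lengths([1, 3, 2, 4, 5, 6, 1])
print(up, down)  # [1, 3] [1, 1]
```

Here the series contains two runs up (lengths 1 and 3) and two runs down (each of length 1).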
||
| Typical Analysis and Test Statistics |
The first step in the runs test is to compute the sequential
differences ($Y_i - Y_{i-1}$).
Positive values indicate an increasing value and negative
values indicate a decreasing value. A runs test should include
information such as the output shown below from Dataplot
for the LEW.DAT data set.
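The output shows a table of the following quantities for each run length I = 1, ..., 10: the observed number of runs (STAT), the expected number of runs (EXP(STAT)), the standard deviation of the number of runs (SD(STAT)), and a z-score (Z). Separate tables are given for runs up, runs down, and their total, counting runs of length exactly I and runs of length I or more.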
There are several alternative formulations of the runs test in the literature. For example, a series of coin tosses would record a series of heads and tails. A run of length r is r consecutive heads or r consecutive tails. To use the Dataplot RUNS command, you could code a sequence of the N = 10 coin tosses HHHHTTHTHH as a numeric sequence in which each head is an increasing value and each tail is a decreasing value.
Another alternative is to code values above the median as positive and values below the median as negative. There are other formulations as well. All of them can be converted to the Dataplot formulation, because each one ultimately reduces to two choices. To use the Dataplot runs test, simply code one choice as an increasing value and the other as a decreasing value, as in the heads/tails example above. If you are using other statistical software, you need to check the conventions used by that program. |
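The two recodings described above can be sketched as follows. Both helper names (`code_binary_as_steps`, `code_by_median`) are hypothetical, the starting value and step size of 1 are arbitrary choices, and dropping values equal to the median is an assumption, since the text does not say how such values are treated.

```python
from statistics import median


def code_binary_as_steps(outcomes, up_symbol, start=1):
    """Turn a two-outcome sequence into a numeric series.

    Each occurrence of `up_symbol` becomes an increase of 1 and every other
    outcome becomes a decrease of 1, so the sequential differences carry the
    same information as the original outcomes.
    """
    series = [start]
    for outcome in outcomes:
        step = 1 if outcome == up_symbol else -1
        series.append(series[-1] + step)
    return series


def code_by_median(values):
    """Label each value '+' (above the median) or '-' (below the median).

    Assumption: values exactly equal to the median are dropped.
    """
    m = median(values)
    return ["+" if v > m else "-" for v in values if v != m]


tosses = "HHHHTTHTHH"
print(code_binary_as_steps(tosses, "H"))
# [1, 2, 3, 4, 5, 4, 3, 4, 3, 4, 5]

signs = code_by_median([5, 1, 4, 2, 6, 3])
print(signs)  # ['+', '-', '+', '-', '+', '-']
print(code_binary_as_steps(signs, "+"))
```

Note that N outcomes produce N + 1 numeric values, because each outcome is represented by a difference between two consecutive values.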
||
|
Sample Output |
Dataplot generated the following runs test output using
the LEW.DAT data set:
RUNS UP
STATISTIC = NUMBER OF RUNS UP
OF LENGTH EXACTLY I
I STAT EXP(STAT) SD(STAT) Z
1 18.0 41.7083 6.4900 -3.65
2 40.0 18.2167 3.3444 6.51
3 2.0 5.2125 2.0355 -1.58
4 0.0 1.1302 1.0286 -1.10
5 0.0 0.1986 0.4424 -0.45
6 0.0 0.0294 0.1714 -0.17
7 0.0 0.0038 0.0615 -0.06
8 0.0 0.0004 0.0207 -0.02
9 0.0 0.0000 0.0066 -0.01
10 0.0 0.0000 0.0020 0.00
STATISTIC = NUMBER OF RUNS UP
OF LENGTH I OR MORE
I STAT EXP(STAT) SD(STAT) Z
1 60.0 66.5000 4.1972 -1.55
2 42.0 24.7917 2.8083 6.13
3 2.0 6.5750 2.1639 -2.11
4 0.0 1.3625 1.1186 -1.22
5 0.0 0.2323 0.4777 -0.49
6 0.0 0.0337 0.1833 -0.18
7 0.0 0.0043 0.0652 -0.07
8 0.0 0.0005 0.0218 -0.02
9 0.0 0.0000 0.0069 -0.01
10 0.0 0.0000 0.0021 0.00
RUNS DOWN
STATISTIC = NUMBER OF RUNS DOWN
OF LENGTH EXACTLY I
I STAT EXP(STAT) SD(STAT) Z
1 25.0 41.7083 6.4900 -2.57
2 35.0 18.2167 3.3444 5.02
3 0.0 5.2125 2.0355 -2.56
4 0.0 1.1302 1.0286 -1.10
5 0.0 0.1986 0.4424 -0.45
6 0.0 0.0294 0.1714 -0.17
7 0.0 0.0038 0.0615 -0.06
8 0.0 0.0004 0.0207 -0.02
9 0.0 0.0000 0.0066 -0.01
10 0.0 0.0000 0.0020 0.00
STATISTIC = NUMBER OF RUNS DOWN
OF LENGTH I OR MORE
I STAT EXP(STAT) SD(STAT) Z
1 60.0 66.5000 4.1972 -1.55
2 35.0 24.7917 2.8083 3.63
3 0.0 6.5750 2.1639 -3.04
4 0.0 1.3625 1.1186 -1.22
5 0.0 0.2323 0.4777 -0.49
6 0.0 0.0337 0.1833 -0.18
7 0.0 0.0043 0.0652 -0.07
8 0.0 0.0005 0.0218 -0.02
9 0.0 0.0000 0.0069 -0.01
10 0.0 0.0000 0.0021 0.00
RUNS TOTAL = RUNS UP + RUNS DOWN
STATISTIC = NUMBER OF RUNS TOTAL
OF LENGTH EXACTLY I
I STAT EXP(STAT) SD(STAT) Z
1 43.0 83.4167 9.1783 -4.40
2 75.0 36.4333 4.7298 8.15
3 2.0 10.4250 2.8786 -2.93
4 0.0 2.2603 1.4547 -1.55
5 0.0 0.3973 0.6257 -0.63
6 0.0 0.0589 0.2424 -0.24
7 0.0 0.0076 0.0869 -0.09
8 0.0 0.0009 0.0293 -0.03
9 0.0 0.0001 0.0093 -0.01
10 0.0 0.0000 0.0028 0.00
STATISTIC = NUMBER OF RUNS TOTAL
OF LENGTH I OR MORE
I STAT EXP(STAT) SD(STAT) Z
1 120.0 133.0000 5.9358 -2.19
2 77.0 49.5833 3.9716 6.90
3 2.0 13.1500 3.0602 -3.64
4 0.0 2.7250 1.5820 -1.72
5 0.0 0.4647 0.6756 -0.69
6 0.0 0.0674 0.2592 -0.26
7 0.0 0.0085 0.0923 -0.09
8 0.0 0.0010 0.0309 -0.03
9 0.0 0.0001 0.0098 -0.01
10 0.0 0.0000 0.0030 0.00
LENGTH OF THE LONGEST RUN UP = 3
LENGTH OF THE LONGEST RUN DOWN = 2
LENGTH OF THE LONGEST RUN UP OR DOWN = 3
NUMBER OF POSITIVE DIFFERENCES = 104
NUMBER OF NEGATIVE DIFFERENCES = 95
NUMBER OF ZERO DIFFERENCES = 0
|
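As a cross-check on one row of this output, the sketch below reproduces the expected value and standard deviation for the total number of runs (the row for runs total of length 1 or more). The formulas E = (2n - 1)/3 and Var = (16n - 29)/90 are the standard results for the total number of runs up and down in a random sequence of n distinct values; they are not stated in the text above and are brought in here as an assumption about how the row was computed. The sample size n = 200 is inferred from the 104 + 95 + 0 = 199 reported differences.

```python
import math


def expected_total_runs(n):
    """Expected total number of runs up and down in a random series of n values."""
    return (2 * n - 1) / 3


def sd_total_runs(n):
    """Standard deviation of the total number of runs up and down."""
    return math.sqrt((16 * n - 29) / 90)


# 199 sequential differences imply 200 observations.
n = 104 + 95 + 0 + 1
observed = 120.0  # STAT for runs total of length 1 or more

mean = expected_total_runs(n)
sd = sd_total_runs(n)
z = (observed - mean) / sd

print(f"{mean:.4f} {sd:.4f} {z:.2f}")  # 133.0000 5.9358 -2.19
```

The printed values agree with the corresponding row of the output (133.0000, 5.9358, -2.19), which suggests, but does not prove, that Dataplot uses these formulas for that row.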
||
| Interpretation of Sample Output |
Scanning the last column labeled "Z", we note that most of the
z-scores for run lengths 1, 2, and 3 have an absolute value greater
than 1.96, the two-sided critical value of the standard normal
distribution at the 5% significance level. This is strong evidence
that these data are not random.
Output from other statistical software may look somewhat different from the above output. |
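The Z column appears to be the standardized difference (STAT - EXP(STAT)) / SD(STAT). The sketch below checks that reading against the first three rows of the "runs total of length exactly I" table and applies the 1.96 cutoff. The variable and function names are illustrative only.

```python
# (I, STAT, EXP(STAT), SD(STAT)) copied from the
# "NUMBER OF RUNS TOTAL OF LENGTH EXACTLY I" table.
rows = [
    (1, 43.0, 83.4167, 9.1783),
    (2, 75.0, 36.4333, 4.7298),
    (3, 2.0, 10.4250, 2.8786),
]

CRITICAL_VALUE = 1.96  # two-sided 5% point of the standard normal


def z_score(stat, expected, sd):
    """Standardize an observed count against its expected value."""
    return (stat - expected) / sd


results = []
for length, stat, expected, sd in rows:
    z = z_score(stat, expected, sd)
    significant = abs(z) > CRITICAL_VALUE
    results.append((length, round(z, 2), significant))
    print(f"I={length}  z={z:6.2f}  |z| > 1.96: {significant}")
```

All three rows exceed the cutoff: there are far fewer runs of length 1 and 3, and far more runs of length 2, than a random sequence would be expected to produce.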
||
| Question |
The runs test can be used to answer the following question: were these sample data generated from a random process?
|
||
| Importance |
Randomness is one of the key assumptions in determining
if a univariate statistical process is in control. If
the assumptions of constant location and scale, randomness,
and fixed distribution are reasonable, then the univariate
process can be modeled as a constant plus random error:
$Y_i = A_0 + E_i$, where $A_0$ is the constant location and $E_i$ is the error term.
If the randomness assumption is not valid, then a different model needs to be used. This will typically be either a time series model or a non-linear model (with time as the independent variable). |
||
| Related Techniques |
Autocorrelation, Run Sequence Plot, Lag Plot |
||
| Case Study | Heat flow meter data | ||
| Software | Most general purpose statistical software programs, including Dataplot, support a runs test. | ||