1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.7. Standard Resistor

1.4.2.7.3. Quantitative Output and Interpretation

Summary Statistics

As a first step in the analysis, a table of summary statistics is computed from the data. The following table, generated by Dataplot, shows a typical set of statistics.
 
                                SUMMARY
 
                     NUMBER OF OBSERVATIONS =     1000
 
 
***********************************************************************
*        LOCATION MEASURES         *       DISPERSION MEASURES        *
***********************************************************************
*  MIDRANGE     =   0.2797325E+02  *  RANGE        =   0.2905006E+00  *
*  MEAN         =   0.2801634E+02  *  STAND. DEV.  =   0.6349404E-01  *
*  MIDMEAN      =   0.2802659E+02  *  AV. AB. DEV. =   0.5101655E-01  *
*  MEDIAN       =   0.2802910E+02  *  MINIMUM      =   0.2782800E+02  *
*               =                  *  LOWER QUART. =   0.2797905E+02  *
*               =                  *  LOWER HINGE  =   0.2797900E+02  *
*               =                  *  UPPER HINGE  =   0.2806295E+02  *
*               =                  *  UPPER QUART. =   0.2806293E+02  *
*               =                  *  MAXIMUM      =   0.2811850E+02  *
***********************************************************************
*       RANDOMNESS MEASURES        *     DISTRIBUTIONAL MEASURES      *
***********************************************************************
*  AUTOCO COEF  =   0.9721591E+00  *  ST. 3RD MOM. =  -0.6936395E+00  *
*               =   0.0000000E+00  *  ST. 4TH MOM. =   0.2689681E+01  *
*               =   0.0000000E+00  *  ST. WILK-SHA =  -0.4216419E+02  *
*               =                  *  UNIFORM PPCC =   0.9689648E+00  *
*               =                  *  NORMAL  PPCC =   0.9718416E+00  *
*               =                  *  TUK -.5 PPCC =   0.7334843E+00  *
*               =                  *  CAUCHY  PPCC =   0.3347875E+00  *
***********************************************************************
 
The lag 1 autocorrelation coefficient of 0.972 is far from zero, which is evidence of significant non-randomness.
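As a rough illustration of how several entries in the table are defined, the sketch below computes a handful of the location and dispersion measures in plain Python. The function name `summary` and the sample readings are hypothetical, and the choices of an n - 1 denominator for the standard deviation and of the mean as the center for the average absolute deviation are assumptions about Dataplot's conventions. As a check on the definitions, the table's own minimum and maximum give a midrange of (27.828 + 28.1185) / 2 = 27.97325 and a range of about 0.2905, matching the reported values.

```python
import statistics

def summary(y):
    """Return a few of the location and dispersion measures listed above."""
    n = len(y)
    mean = statistics.fmean(y)
    return {
        "midrange": (min(y) + max(y)) / 2,
        "mean": mean,
        "median": statistics.median(y),
        "range": max(y) - min(y),
        # Assumption: sample standard deviation with an n - 1 denominator.
        "stand_dev": statistics.stdev(y),
        # Assumption: average absolute deviation taken about the mean.
        "av_ab_dev": sum(abs(v - mean) for v in y) / n,
    }

# Hypothetical readings for illustration only; not the case-study data.
y = [27.90, 27.95, 28.00, 28.05, 28.10]
stats = summary(y)
for name, value in stats.items():
    print(f"{name:10s} = {value:.5f}")
```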
Location

One way to quantify a change in location over time is to fit a straight line to the data set using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If there is no significant drift in the location, the slope parameter estimate should not differ significantly from zero. For this data set, Dataplot generates the following output:
 LEAST SQUARES MULTILINEAR FIT
       SAMPLE SIZE N       =     1000
       NUMBER OF VARIABLES =        1
       NO REPLICATION CASE
  
  
               PARAMETER ESTIMATES           (APPROX. ST. DEV.)    T VALUE
        1  A0                   27.9114       (0.1209E-02)       0.2309E+05
        2  A1       X          0.209670E-03   (0.2092E-05)        100.2
  
       RESIDUAL    STANDARD DEVIATION =        0.1909796E-01
       RESIDUAL    DEGREES OF FREEDOM =         998
  
       COEF AND SD(COEF) WRITTEN OUT TO FILE DPST1F.DAT
       SD(PRED),95LOWER,95UPPER,99LOWER,99UPPER
                         WRITTEN OUT TO FILE DPST2F.DAT
       REGRESSION DIAGNOSTICS WRITTEN OUT TO FILE DPST3F.DAT
       PARAMETER VARIANCE-COVARIANCE MATRIX AND
       INVERSE OF X-TRANSPOSE X MATRIX
       WRITTEN OUT TO FILE DPST4F.DAT
The slope parameter, A1, has a t value of 100.2, which is statistically significant. The slope estimate itself is 0.00021 per observation. Although this number is close to zero, it must be judged against the scale of the data, which range only from about 27.83 to 28.12. Accumulated over the 1,000 observations, a slope of 0.00021 corresponds to a change of about 0.21, which is most of the observed range of 0.29. We therefore conclude that there is a drift in location.
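The fit against the index variable can be sketched with the ordinary least-squares formulas, as below. This is a minimal illustration, not Dataplot's implementation: the function name `fit_line` and the short synthetic series are hypothetical. The t value is the slope estimate divided by its approximate standard deviation, which for the output above gives 0.209670E-03 / 0.2092E-05, or about 100.2.

```python
import math

def fit_line(y):
    """Least-squares fit of y = a0 + a1*x with index variable x = 1, 2, ..., N."""
    n = len(y)
    x = range(1, n + 1)
    x_bar = (n + 1) / 2
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = sxy / sxx                      # slope estimate
    a0 = y_bar - a1 * x_bar             # intercept estimate
    sse = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    resid_sd = math.sqrt(sse / (n - 2)) # residual SD on N - 2 degrees of freedom
    sd_a1 = resid_sd / math.sqrt(sxx)   # approximate SD of the slope
    return a0, a1, sd_a1, a1 / sd_a1    # last entry is the t value

# Hypothetical drifting series (slope 0.01 plus alternating noise);
# not the case-study data.
y = [28.00 + 0.01 * i + (0.02 if i % 2 else -0.02) for i in range(1, 21)]
a0, a1, sd_a1, t = fit_line(y)
print(f"A1 = {a1:.6f}, SD(A1) = {sd_a1:.6f}, t = {t:.1f}")
```

A t value well beyond 1.96 in absolute value, as in this synthetic example, points to a slope that differs significantly from zero.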
Variation

One simple way to detect a change in variation is with a Bartlett test after dividing the data set into several equal-sized intervals. However, the Bartlett test is not robust to non-normality. Since the normality assumption is questionable for these data, we use the alternative Levene test. In particular, we use the Levene test based on the median rather than the mean. The choice of the number of intervals is somewhat arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the following output for the Levene test.
               LEVENE F-TEST FOR SHIFT IN VARIATION
                      (ASSUMPTION: NORMALITY)
  
 1. STATISTICS
       NUMBER OF OBSERVATIONS    =     1000
       NUMBER OF GROUPS          =        4
       LEVENE F TEST STATISTIC   =    140.8509
  
  
    FOR LEVENE TEST STATISTIC
       0          % POINT    =   0.0000000E+00
       50         % POINT    =   0.7891988
       75         % POINT    =    1.371589
       90         % POINT    =    2.089303
       95         % POINT    =    2.613852
       99         % POINT    =    3.801369
       99.9       % POINT    =    5.463994
  
  
          100.0000       % Point:     140.8509
  
 3. CONCLUSION (AT THE 5% LEVEL):
       THERE IS A SHIFT IN VARIATION.
       THUS: NOT HOMOGENEOUS WITH RESPECT TO VARIATION.
Since the Levene test statistic of 140.9 is far greater than the critical value of 2.6 at the 5% significance level, we conclude that there is significant evidence of nonconstant variation.
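The median-based Levene statistic is a one-way analysis-of-variance F statistic computed on the absolute deviations of each observation from its group median. The sketch below shows that calculation for k consecutive, equal-sized groups. The function name `levene_median` and the synthetic series are hypothetical, and dropping leftover observations to keep the groups equal-sized is an assumption made for simplicity. The resulting statistic is compared with a percent point of the F distribution with k - 1 and N - k degrees of freedom; for k = 4 and N = 1000 the output above lists 2.613852 as the 95% point.

```python
import statistics

def levene_median(y, k=4):
    """Median-centered Levene statistic on k consecutive, equal-sized groups."""
    size = len(y) // k
    n = size * k                         # assumption: leftover points are dropped
    groups = [y[i * size:(i + 1) * size] for i in range(k)]
    # Absolute deviations of each value from its own group's median.
    z = []
    for g in groups:
        med = statistics.median(g)
        z.append([abs(v - med) for v in g])
    z_bar_i = [sum(zi) / size for zi in z]          # group means of deviations
    z_bar = sum(sum(zi) for zi in z) / n            # overall mean of deviations
    between = sum(size * (m - z_bar) ** 2 for m in z_bar_i) / (k - 1)
    within = sum((v - m) ** 2 for zi, m in zip(z, z_bar_i) for v in zi) / (n - k)
    return between / within

# Hypothetical series whose spread grows from one quarter to the next;
# not the case-study data.
y = [0.01 * (1 + i // 10) * ((i * 7) % 5 - 2) for i in range(40)]
w = levene_median(y, k=4)
print(f"Levene statistic = {w:.4f}")
```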
Randomness

There are many ways in which data can be non-random. However, most common forms of non-randomness can be detected with a few simple tests. The lag plot in the 4-plot of the previous section is one simple graphical technique.

One check is an autocorrelation plot, which shows the autocorrelations at various lags. Confidence bands can be plotted at the 95% and 99% confidence levels; points outside these bands are statistically significant (the lag 0 autocorrelation is always 1). Dataplot generated the following autocorrelation plot.

[Figure: autocorrelation plot]

The lag 1 autocorrelation, which is generally the one of greatest interest, is 0.97. The critical values at the 5% significance level are -0.062 and 0.062. This indicates that the lag 1 autocorrelation is statistically significant, so there is strong evidence of non-randomness.
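The critical values quoted above follow from the large-sample approximation that, for random data, each sample autocorrelation is roughly normal with standard deviation 1/sqrt(N), so the 95% band is ±1.96/sqrt(N), or about ±0.062 for N = 1000. A minimal sketch of both calculations is given below; the function names `autocorrelation` and `band_95` and the short trending series are hypothetical.

```python
import math

def autocorrelation(y, lag):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    y_bar = sum(y) / n
    c0 = sum((v - y_bar) ** 2 for v in y)
    ck = sum((y[t] - y_bar) * (y[t + lag] - y_bar) for t in range(n - lag))
    return ck / c0

def band_95(n):
    """Half-width of the approximate 95% band for random data."""
    return 1.96 / math.sqrt(n)

# Hypothetical steadily increasing series; not the case-study data.
y = list(range(10))
r1 = autocorrelation(y, 1)
print(f"lag 1 autocorrelation = {r1:.3f}")
print(f"95% band for N = 1000 = +/-{band_95(1000):.3f}")
```

Even this short trending series has a large lag 1 autocorrelation, which is the same mechanism at work in the drifting resistor data.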

A common test for randomness is the runs test.

  
                      RUNS UP
 
           STATISTIC = NUMBER OF RUNS UP
               OF LENGTH EXACTLY I
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1       178.0    208.3750     14.5453       -2.09
   2        90.0     91.5500      7.5002       -0.21
   3        29.0     26.3236      4.5727        0.59
   4        16.0      5.7333      2.3164        4.43
   5         2.0      1.0121      0.9987        0.99
   6         0.0      0.1507      0.3877       -0.39
   7         0.0      0.0194      0.1394       -0.14
   8         0.0      0.0022      0.0470       -0.05
   9         0.0      0.0002      0.0150       -0.02
  10         0.0      0.0000      0.0046        0.00
 
 
           STATISTIC = NUMBER OF RUNS UP
               OF LENGTH I OR MORE
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1       315.0    333.1667      9.4195       -1.93
   2       137.0    124.7917      6.2892        1.94
   3        47.0     33.2417      4.8619        2.83
   4        18.0      6.9181      2.5200        4.40
   5         2.0      1.1847      1.0787        0.76
   6         0.0      0.1726      0.4148       -0.42
   7         0.0      0.0219      0.1479       -0.15
   8         0.0      0.0025      0.0496       -0.05
   9         0.0      0.0002      0.0158       -0.02
  10         0.0      0.0000      0.0048        0.00
 
 
                     RUNS DOWN
 
           STATISTIC = NUMBER OF RUNS DOWN
               OF LENGTH EXACTLY I
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1       195.0    208.3750     14.5453       -0.92
   2        81.0     91.5500      7.5002       -1.41
   3        32.0     26.3236      4.5727        1.24
   4         4.0      5.7333      2.3164       -0.75
   5         1.0      1.0121      0.9987       -0.01
   6         1.0      0.1507      0.3877        2.19
   7         0.0      0.0194      0.1394       -0.14
   8         0.0      0.0022      0.0470       -0.05
   9         0.0      0.0002      0.0150       -0.02
  10         0.0      0.0000      0.0046        0.00
 
 
           STATISTIC = NUMBER OF RUNS DOWN
               OF LENGTH I OR MORE
 
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1       314.0    333.1667      9.4195       -2.03
   2       119.0    124.7917      6.2892       -0.92
   3        38.0     33.2417      4.8619        0.98
   4         6.0      6.9181      2.5200       -0.36
   5         2.0      1.1847      1.0787        0.76
   6         1.0      0.1726      0.4148        1.99
   7         0.0      0.0219      0.1479       -0.15
   8         0.0      0.0025      0.0496       -0.05
   9         0.0      0.0002      0.0158       -0.02
  10         0.0      0.0000      0.0048        0.00
 
 
           RUNS TOTAL = RUNS UP + RUNS DOWN
 
         STATISTIC = NUMBER OF RUNS TOTAL
              OF LENGTH EXACTLY I
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1       373.0    416.7500     20.5701       -2.13
   2       171.0    183.1000     10.6068       -1.14
   3        61.0     52.6472      6.4668        1.29
   4        20.0     11.4667      3.2759        2.60
   5         3.0      2.0243      1.4123        0.69
   6         1.0      0.3014      0.5483        1.27
   7         0.0      0.0389      0.1971       -0.20
   8         0.0      0.0044      0.0665       -0.07
   9         0.0      0.0005      0.0212       -0.02
  10         0.0      0.0000      0.0065       -0.01
 
 
         STATISTIC = NUMBER OF RUNS TOTAL
               OF LENGTH I OR MORE
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1       629.0    666.3333     13.3212       -2.80
   2       256.0    249.5833      8.8942        0.72
   3        85.0     66.4833      6.8758        2.69
   4        24.0     13.8361      3.5639        2.85
   5         4.0      2.3694      1.5256        1.07
   6         1.0      0.3452      0.5866        1.12
   7         0.0      0.0438      0.2092       -0.21
   8         0.0      0.0049      0.0701       -0.07
   9         0.0      0.0005      0.0223       -0.02
  10         0.0      0.0000      0.0067       -0.01
 
 
          LENGTH OF THE LONGEST RUN UP         =     5
          LENGTH OF THE LONGEST RUN DOWN       =     6
          LENGTH OF THE LONGEST RUN UP OR DOWN =     6
 
          NUMBER OF POSITIVE DIFFERENCES =   505
          NUMBER OF NEGATIVE DIFFERENCES =   469
          NUMBER OF ZERO     DIFFERENCES =    25
  
Values in the column labeled "Z" that are greater than 1.96 or less than -1.96 are statistically significant at the 5% level. Because several of the Z values exceed this cut-off in absolute value, we conclude that the data are not random. However, in this case the evidence from the runs test is not nearly as strong as it is from the autocorrelation plot.
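The Z values for the total number of runs of length 1 or more can be reproduced from the standard results for random sequences: the expected number of runs up and down is (2N - 1)/3 and the variance is (16N - 29)/90. For N = 1000 these give 666.3333 and a standard deviation of 13.3212, matching the first row of the final table above. The sketch below applies these formulas; the function names are hypothetical, and skipping zero differences when counting runs is an assumption, since the treatment of ties in Dataplot is not described here.

```python
import math

def count_runs(y):
    """Total number of runs up and down in y (zero differences are skipped)."""
    signs = [1 if b > a else -1 for a, b in zip(y, y[1:]) if b != a]
    return sum(1 for i, s in enumerate(signs) if i == 0 or s != signs[i - 1])

def runs_z(runs, n):
    """Z value for the total number of runs of length 1 or more."""
    expected = (2 * n - 1) / 3
    sd = math.sqrt((16 * n - 29) / 90)
    return (runs - expected) / sd, expected, sd

# Small hypothetical sequence: up, up, down, down, up gives 3 runs.
print(count_runs([1, 2, 3, 2, 1, 2]))

# Values taken from the table above: 629 total runs in 1000 observations.
z, expected, sd = runs_z(629, 1000)
print(f"EXP = {expected:.4f}, SD = {sd:.4f}, Z = {z:.2f}")
```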
Distributional Analysis

Since we rejected the randomness assumption, the distributional tests are not meaningful. Therefore, these quantitative tests are omitted. Since Grubbs' test for outliers also assumes that the data are approximately normal, we omit Grubbs' test as well.
Univariate Report

It is sometimes useful and convenient to summarize the above results in a report.
  
 Analysis for resistor case study
  
 1: Sample Size                           = 1000
  
 2: Location
    Mean                                  = 28.01635
    Standard Deviation of Mean            = 0.002008
    95% Confidence Interval for Mean      = (28.0124,28.02029)
    Drift with respect to location?       = YES
  
 3: Variation
    Standard Deviation                    = 0.063495
    95% Confidence Interval for SD        = (0.060829,0.066407)
    Change in variation?
    (based on Levene's test on quarters
    of the data)                          = YES
  
 4: Randomness
    Autocorrelation                       = 0.972158
    Data Are Random?
      (as measured by autocorrelation)    = NO
  
 5: Distribution
    Distributional test omitted due to
    non-randomness of the data
  
 6: Statistical Control
    (i.e., no drift in location or scale,
    data are random, distribution is 
    fixed)
    Data Set is in Statistical Control?   = NO
  
 7: Outliers?
    (Grubbs' test omitted due to
    non-randomness of the data)
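The interval entries in the report can be checked approximately with the usual formulas: the standard deviation of the mean is s/sqrt(N), the interval for the mean is the mean plus or minus a critical value times that quantity, and the interval for the standard deviation comes from chi-square percent points on N - 1 degrees of freedom. The sketch below is an approximation under stated assumptions rather than Dataplot's method: it uses a normal critical value in place of the t value (reasonable for N = 1000) and the Wilson-Hilferty approximation for the chi-square percent points, and all function names are hypothetical. These formulas also assume random, approximately normal data, which this data set does not satisfy, so the intervals are descriptive only.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Standard deviation of the mean and a large-sample interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # normal value in place of t
    se = sd / math.sqrt(n)
    return se, (mean - z * se, mean + z * se)

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to a chi-square percent point."""
    z = NormalDist().inv_cdf(p)
    a = 2 / (9 * df)
    return df * (1 - a + z * math.sqrt(a)) ** 3

def sd_ci(sd, n, level=0.95):
    """Approximate interval for the standard deviation."""
    df = n - 1
    lower = sd * math.sqrt(df / chi2_quantile(0.5 + level / 2, df))
    upper = sd * math.sqrt(df / chi2_quantile(0.5 - level / 2, df))
    return lower, upper

# Inputs taken from the report above.
se, (m_lo, m_hi) = mean_ci(28.01635, 0.063495, 1000)
s_lo, s_hi = sd_ci(0.063495, 1000)
print(f"SD of mean = {se:.6f}")
print(f"mean CI    = ({m_lo:.5f}, {m_hi:.5f})")
print(f"SD CI      = ({s_lo:.6f}, {s_hi:.6f})")
```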
  