|
1.4.2.7. Standard Resistor
|
|||
| Summary Statistics |
As a first step in the analysis, a table of summary statistics is
computed from the data. The following table, generated by
Dataplot, shows a typical set of
statistics.
                                SUMMARY

                   NUMBER OF OBSERVATIONS =     1000

***********************************************************************
*        LOCATION MEASURES         *       DISPERSION MEASURES        *
***********************************************************************
*  MIDRANGE     =   0.2797325E+02  *  RANGE        =   0.2905006E+00  *
*  MEAN         =   0.2801634E+02  *  STAND. DEV.  =   0.6349404E-01  *
*  MIDMEAN      =   0.2802659E+02  *  AV. AB. DEV. =   0.5101655E-01  *
*  MEDIAN       =   0.2802910E+02  *  MINIMUM      =   0.2782800E+02  *
*               =                  *  LOWER QUART. =   0.2797905E+02  *
*               =                  *  LOWER HINGE  =   0.2797900E+02  *
*               =                  *  UPPER HINGE  =   0.2806295E+02  *
*               =                  *  UPPER QUART. =   0.2806293E+02  *
*               =                  *  MAXIMUM      =   0.2811850E+02  *
***********************************************************************
*       RANDOMNESS MEASURES        *     DISTRIBUTIONAL MEASURES      *
***********************************************************************
*  AUTOCO COEF  =   0.9721591E+00  *  ST. 3RD MOM. =  -0.6936395E+00  *
*               =   0.0000000E+00  *  ST. 4TH MOM. =   0.2689681E+01  *
*               =   0.0000000E+00  *  ST. WILK-SHA =  -0.4216419E+02  *
*               =                  *  UNIFORM PPCC =   0.9689648E+00  *
*               =                  *  NORMAL PPCC  =   0.9718416E+00  *
*               =                  *  TUK -.5 PPCC =   0.7334843E+00  *
*               =                  *  CAUCHY PPCC  =   0.3347875E+00  *
***********************************************************************
The autocorrelation coefficient of 0.972 is very close to 1, which is
evidence of significant non-randomness.
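Several of the location and dispersion measures in the table can be reproduced with a few lines of code. The following is a minimal sketch using only the Python standard library, not the Dataplot implementation. The function name `summary_stats` and the sample values are hypothetical, and the average absolute deviation is assumed to be taken about the mean.

```python
import statistics

def summary_stats(y):
    """Return a few location and dispersion measures for a sequence of numbers."""
    n = len(y)
    mean = statistics.fmean(y)
    return {
        "n": n,
        "midrange": (min(y) + max(y)) / 2,
        "mean": mean,
        "median": statistics.median(y),
        "range": max(y) - min(y),
        "stdev": statistics.stdev(y),  # sample SD, divisor n - 1
        # Assumption: average absolute deviation is taken about the mean.
        "avg_abs_dev": sum(abs(v - mean) for v in y) / n,
    }

# Illustrative values only, not the resistor measurements.
demo = [27.83, 27.91, 27.98, 28.03, 28.06, 28.11]
stats = summary_stats(demo)
for name, value in stats.items():
    print(f"{name:12s} = {value}")
```

As a check against the table, the tabulated minimum (27.828) and maximum (28.1185) give a midrange of (27.828 + 28.1185)/2 = 27.97325, which agrees with the MIDRANGE entry.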
|
||
| Location |
One way to quantify a change in location over time is to
fit a straight line to the data set using the index variable
X = 1, 2, ..., N, with N denoting the number of observations.
If there is no significant drift in the location, the slope
parameter estimate should be close to zero. For this
data set, Dataplot generates the following output:
LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N = 1000
NUMBER OF VARIABLES = 1
NO REPLICATION CASE
      PARAMETER ESTIMATES           (APPROX. ST. DEV.)    T VALUE
 1  A0                   27.9114       (0.1209E-02)       0.2309E+05
 2  A1       X           0.209670E-03  (0.2092E-05)       100.2
RESIDUAL STANDARD DEVIATION = 0.1909796E-01
RESIDUAL DEGREES OF FREEDOM = 998
COEF AND SD(COEF) WRITTEN OUT TO FILE DPST1F.DAT
SD(PRED),95LOWER,95UPPER,99LOWER,99UPPER
WRITTEN OUT TO FILE DPST2F.DAT
REGRESSION DIAGNOSTICS WRITTEN OUT TO FILE DPST3F.DAT
PARAMETER VARIANCE-COVARIANCE MATRIX AND
INVERSE OF X-TRANSPOSE X MATRIX
WRITTEN OUT TO FILE DPST4F.DAT
The slope parameter, A1, has a t value of 100.2, which
is statistically significant. The value of the slope parameter estimate
is 0.00021. Although this number is nearly zero, it must be judged
against the scale of the data, which run only from about
27.8 to 28.2. Accumulated over the 1000 observations, a slope of
0.00021 amounts to a shift of about 0.21, which is most of the
observed range of 0.29. We therefore conclude that there is a drift
in location.
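The fit against the index variable can be sketched in plain Python. This is an illustration under stated assumptions, not Dataplot's algorithm: the function name `fit_line_on_index` is hypothetical, and the series below is synthetic (a small upward drift plus Gaussian noise), not the resistor data.

```python
import math
import random

def fit_line_on_index(y):
    """Least-squares fit of y = a0 + a1*x with x = 1, 2, ..., N.

    Returns (a0, a1, sd_a1, t_a1, resid_sd).
    """
    n = len(y)
    x = range(1, n + 1)
    x_bar = (n + 1) / 2
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = sxy / sxx                      # slope estimate
    a0 = y_bar - a1 * x_bar             # intercept estimate
    sse = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    resid_sd = math.sqrt(sse / (n - 2))  # residual degrees of freedom = N - 2
    sd_a1 = resid_sd / math.sqrt(sxx)    # approximate SD of the slope
    return a0, a1, sd_a1, a1 / sd_a1, resid_sd

# Synthetic series (assumption for illustration only).
rng = random.Random(0)
y = [28.0 + 0.0002 * i + rng.gauss(0, 0.02) for i in range(1, 1001)]
a0, a1, sd_a1, t_a1, resid_sd = fit_line_on_index(y)
print(f"slope = {a1:.6g}, sd(slope) = {sd_a1:.3g}, t = {t_a1:.1f}")
```

The same formula is consistent with the output above: with N = 1000 the sum of squares of the centered index is about 8.33E+07, so a residual standard deviation of 0.0191 gives a slope standard deviation of about 0.0191 / 9129 = 0.2092E-05, and 0.209670E-03 / 0.2092E-05 is about 100.2.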
|
||
| Variation |
One simple way to detect a change in variation is with a
Bartlett test after dividing the
data set into several equal-sized intervals. However, the Bartlett
test is not robust to non-normality. Since the normality assumption
is questionable for these data,
we use the alternative Levene
test. In particular, we use the Levene test based on the median
rather than the mean. The choice of the number of intervals is somewhat
arbitrary, although values of 4 or 8 are reasonable. Dataplot
generated the following output for the Levene test.
LEVENE F-TEST FOR SHIFT IN VARIATION
(ASSUMPTION: NORMALITY)
1. STATISTICS
NUMBER OF OBSERVATIONS = 1000
NUMBER OF GROUPS = 4
LEVENE F TEST STATISTIC = 140.8509
FOR LEVENE TEST STATISTIC
0 % POINT = 0.0000000E+00
50 % POINT = 0.7891988
75 % POINT = 1.371589
90 % POINT = 2.089303
95 % POINT = 2.613852
99 % POINT = 3.801369
99.9 % POINT = 5.463994
100.0000 % Point: 140.8509
3. CONCLUSION (AT THE 5% LEVEL):
THERE IS A SHIFT IN VARIATION.
THUS: NOT HOMOGENEOUS WITH RESPECT TO VARIATION.
In this case, since the Levene test statistic value of 140.9 is
greater than the 5% significance level critical value of 2.6, we
conclude that there is significant evidence of nonconstant variation.
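The median-based Levene statistic for consecutive equal-sized intervals can be sketched as follows. This is a minimal illustration, not Dataplot's code: the function name `levene_median` is hypothetical, the data are synthetic with a deliberately growing spread, and the 5% critical value is simply copied from the 95 % POINT in the output above rather than computed.

```python
import random
import statistics

def levene_median(y, k=4):
    """Median-based Levene statistic for k consecutive equal-sized groups.

    Leftover observations at the end (when len(y) is not a multiple of k)
    are dropped so that all groups have the same size.
    """
    size = len(y) // k
    groups = [y[i * size:(i + 1) * size] for i in range(k)]
    # Absolute deviations of each point from its own group's median.
    z = []
    for g in groups:
        med = statistics.median(g)
        z.append([abs(v - med) for v in g])
    n_total = k * size
    z_means = [statistics.fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(size * (m - z_grand) ** 2 for m in z_means)
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return ((n_total - k) / (k - 1)) * between / within

# Synthetic series (assumption): the noise SD doubles in each quarter.
rng = random.Random(1)
y = [rng.gauss(28.0, 0.01 * 2 ** (i // 250)) for i in range(1000)]
w = levene_median(y, k=4)

CRITICAL_5PCT = 2.613852  # 95 % point quoted in the Dataplot output above
print(f"Levene statistic = {w:.2f}, shift in variation: {w > CRITICAL_5PCT}")
```

The statistic is a one-way analysis-of-variance F ratio computed on the absolute deviations from the group medians, so a large value means the typical deviation differs between intervals.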
|
||
| Randomness |
There are many ways in which data can be non-random. However,
most common forms of non-randomness can be detected with a
few simple tests. The lag plot in the 4-plot in the previous
section is a simple graphical technique.
One check is an autocorrelation plot, which shows the autocorrelations at various lags. Confidence bands can be plotted at the 95% and 99% confidence levels. Points outside these bands indicate statistically significant autocorrelations (the value at lag 0 is always 1). Dataplot generated the following autocorrelation plot.
The lag 1 autocorrelation, which is generally the one of greatest interest, is 0.97. The critical values at the 5% significance level are -0.062 and 0.062. This indicates that the lag 1 autocorrelation is statistically significant, so there is strong evidence of non-randomness.
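The lag 1 autocorrelation and its approximate critical values can be sketched in plain Python. The function names `lag_autocorrelation` and `white_noise_band` are hypothetical, the series is synthetic, and the band uses the common large-sample approximation of 1.96 divided by the square root of N, which is an assumption about how the quoted critical values were obtained.

```python
import math
import random

def lag_autocorrelation(y, lag=1):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    numer = sum((y[i] - mean) * (y[i + lag] - mean) for i in range(n - lag))
    return numer / denom

def white_noise_band(n, z=1.96):
    """Approximate two-sided 95% band for autocorrelations of random data."""
    return z / math.sqrt(n)

# Synthetic drifting series (assumption for illustration only).
rng = random.Random(2)
y = [28.0 + 0.0002 * i + rng.gauss(0, 0.02) for i in range(1, 1001)]
r1 = lag_autocorrelation(y, lag=1)
band = white_noise_band(len(y))
print(f"lag 1 autocorrelation = {r1:.3f}, band = +/-{band:.3f}")
print("significant" if abs(r1) > band else "not significant")
```

For N = 1000 the approximation gives 1.96 / 31.62, or about 0.062, which matches the critical values quoted above.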
A common test for randomness is the runs test.
|
||
| Distributional Analysis |
Since we rejected the randomness assumption, the distributional tests are not meaningful. Therefore, these quantitative tests are omitted. Since the Grubbs' test for outliers also assumes the approximate normality of the data, we omit Grubbs' test as well.
|
||
| Univariate Report |
It is sometimes useful and convenient to summarize the above
results in a report.
Analysis for resistor case study
1: Sample Size = 1000
2: Location
Mean = 28.01635
Standard Deviation of Mean = 0.002008
95% Confidence Interval for Mean = (28.0124,28.02029)
Drift with respect to location? = YES
3: Variation
Standard Deviation = 0.063495
95% Confidence Interval for SD = (0.060829,0.066407)
Change in variation?
(based on Levene's test on quarters
of the data) = YES
4: Randomness
Autocorrelation = 0.972158
Data Are Random?
(as measured by autocorrelation) = NO
5: Distribution
Distributional test omitted due to
non-randomness of the data
6: Statistical Control
(i.e., no drift in location or scale,
data are random, distribution is
fixed)
Data Set is in Statistical Control? = NO
7: Outliers?
(Grubbs' test omitted due to
non-randomness of the data)
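The standard deviation of the mean and the confidence interval for the mean in item 2 follow from the sample size, mean, and standard deviation in the report. The sketch below recomputes them with the Python standard library. It assumes a normal approximation for the 95% quantile (with 999 degrees of freedom a t quantile differs only slightly), and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values taken from the report above.
n = 1000
mean = 28.01635
sd = 0.063495

se_mean = sd / math.sqrt(n)        # standard deviation of the mean
z = NormalDist().inv_cdf(0.975)    # about 1.96 (normal approximation)
ci = (mean - z * se_mean, mean + z * se_mean)

print(f"Standard Deviation of Mean = {se_mean:.6f}")
print(f"95% Confidence Interval    = ({ci[0]:.5f}, {ci[1]:.5f})")
```

Because the data are neither random nor free of drift, this interval is a description of the calculation rather than a valid uncertainty statement for the resistor values.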
|
||