|
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.10. Ceramic Strength
|
|||
| Batch is a Nuisance Factor |
The two nuisance factors in this experiment are the batch number
and the lab. There are 2 batches and 8 labs. Ideally, these
factors will have minimal effect on the response variable.
We will investigate the batch factor first. |
||
| Bihistogram |
This bihistogram shows the following.
Although we could stop with the bihistogram, we will show a few other commonly used two-sample graphical techniques for comparison. |
||
| Quantile-Quantile Plot |
This q-q plot shows the following.
|
||
| Box Plot |
This box plot shows the following.
|
||
| Block Plots |
A block plot is generated for each of the eight labs, with "1" and
"2" denoting the batch numbers. In the first
plot, we do not include any of the primary factors. The next 3
block plots include one of the primary factors. Note that each of
the 3 primary factors (table speed = X1, down feed rate = X2,
wheel grit size = X3) has 2 levels. With 8 labs and 2 levels for
the primary factor, we would expect 16 separate blocks on these
plots. The fact that some of these blocks are missing indicates
that some of the combinations of lab and primary factor are empty.
These block plots show the following.
|
||
| Quantitative Techniques | We can confirm some of the conclusions drawn from the above graphics by using quantitative techniques. The two-sample t-test can be used to test whether the means of the two batches are equal, and the F-test can be used to test whether the standard deviations of the two batches are equal. | ||
| Two Sample T-Test |
The following is the Dataplot output from the two-sample t-test.
T-TEST
(2-SAMPLE)
NULL HYPOTHESIS UNDER TEST--POPULATION MEANS MU1 = MU2
SAMPLE 1:
NUMBER OF OBSERVATIONS = 240
MEAN = 688.9987
STANDARD DEVIATION = 65.54909
STANDARD DEVIATION OF MEAN = 4.231175
SAMPLE 2:
NUMBER OF OBSERVATIONS = 240
MEAN = 611.1559
STANDARD DEVIATION = 61.85425
STANDARD DEVIATION OF MEAN = 3.992675
IF ASSUME SIGMA1 = SIGMA2:
POOLED STANDARD DEVIATION = 63.72845
DIFFERENCE (DELTA) IN MEANS = 77.84271
STANDARD DEVIATION OF DELTA = 5.817585
T-TEST STATISTIC VALUE = 13.38059
DEGREES OF FREEDOM = 478.0000
T-TEST STATISTIC CDF VALUE = 1.000000
IF NOT ASSUME SIGMA1 = SIGMA2:
STANDARD DEVIATION SAMPLE 1 = 65.54909
STANDARD DEVIATION SAMPLE 2 = 61.85425
BARTLETT CDF VALUE = 0.629618
DIFFERENCE (DELTA) IN MEANS = 77.84271
STANDARD DEVIATION OF DELTA = 5.817585
T-TEST STATISTIC VALUE = 13.38059
EQUIVALENT DEG. OF FREEDOM = 476.3999
T-TEST STATISTIC CDF VALUE = 1.000000
                    ALTERNATIVE-             ALTERNATIVE-
ALTERNATIVE-        HYPOTHESIS               HYPOTHESIS
HYPOTHESIS          ACCEPTANCE INTERVAL      CONCLUSION
MU1 <> MU2          (0,0.025) (0.975,1)      ACCEPT
MU1 < MU2           (0,0.05)                 REJECT
MU1 > MU2           (0.95,1)                 ACCEPT
The t-test indicates that the mean for batch 1 is larger than the
mean for batch 2 (at the 5% significance level).
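As a cross-check, the test statistics in the output above can be recomputed from the printed summary statistics alone. The following is a minimal Python sketch, not the Dataplot implementation; the function name `two_sample_t` and the dictionary keys are illustrative assumptions. It uses the textbook pooled-variance and Welch (unequal-variance) formulas and only the standard library.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistics from summary statistics (illustrative helper)."""
    delta = mean1 - mean2

    # Equal-variance case: pool the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Unequal-variance (Welch) case with Satterthwaite degrees of freedom.
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

    return {
        "delta": delta,
        "pooled_sd": math.sqrt(pooled_var),
        "t_pooled": delta / se_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_welch": delta / se_welch,
        "df_welch": df_welch,
    }

# Summary statistics copied from the Dataplot output for batch 1 and batch 2.
result = two_sample_t(688.9987, 65.54909, 240, 611.1559, 61.85425, 240)
for name, value in result.items():
    print(f"{name:10s} = {value:.5f}")
```

The values should agree with the Dataplot output up to rounding of the printed means and standard deviations. Because both batches have 240 observations, the pooled and Welch statistics coincide and only the degrees of freedom differ. A t statistic this far above the large-sample two-sided 5% critical value of about 1.96 leads to the same conclusion as the CDF-based table.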
|
||
| F-Test |
The following is the Dataplot output from the F-test.
F-TEST
NULL HYPOTHESIS UNDER TEST--SIGMA1 = SIGMA2
ALTERNATIVE HYPOTHESIS UNDER TEST--SIGMA1 NOT EQUAL SIGMA2
SAMPLE 1:
NUMBER OF OBSERVATIONS = 240
MEAN = 688.9987
STANDARD DEVIATION = 65.54909
SAMPLE 2:
NUMBER OF OBSERVATIONS = 240
MEAN = 611.1559
STANDARD DEVIATION = 61.85425
TEST:
STANDARD DEV. (NUMERATOR) = 65.54909
STANDARD DEV. (DENOMINATOR) = 61.85425
F-TEST STATISTIC VALUE = 1.123037
DEG. OF FREEDOM (NUMER.) = 239.0000
DEG. OF FREEDOM (DENOM.) = 239.0000
F-TEST STATISTIC CDF VALUE = 0.814808
NULL                NULL HYPOTHESIS          NULL HYPOTHESIS
HYPOTHESIS          ACCEPTANCE INTERVAL      CONCLUSION
SIGMA1 = SIGMA2     (0.000,0.950)            ACCEPT
The F-test indicates that the standard deviations for the two
batches are not significantly different at the 5% significance level.
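The F statistic above is simply the ratio of the two sample variances, so it can also be recomputed from the printed standard deviations. The sketch below is an illustration under stated assumptions, not the Dataplot method: the function name `f_test_from_sds` is hypothetical, and because the Python standard library has no F distribution, the CDF value is approximated with a large-sample normal approximation to ln(F) (mean about 1/df2 - 1/df1, variance about 2/df1 + 2/df2). An exact CDF would require a statistics library.

```python
import math

def f_test_from_sds(sd1, n1, sd2, n2):
    """F statistic for equal variances from summary statistics (illustrative helper)."""
    f_stat = sd1**2 / sd2**2          # ratio of sample variances
    df_num, df_den = n1 - 1, n2 - 1

    # ASSUMPTION: large-sample normal approximation to ln(F) under the null.
    mu = 1 / df_den - 1 / df_num
    sigma = math.sqrt(2 / df_num + 2 / df_den)
    z = (math.log(f_stat) - mu) / sigma
    approx_cdf = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z

    return {"f_stat": f_stat, "df_num": df_num, "df_den": df_den,
            "approx_cdf": approx_cdf}

# Standard deviations and sample sizes copied from the Dataplot output.
f_result = f_test_from_sds(65.54909, 240, 61.85425, 240)
for name, value in f_result.items():
    print(f"{name:10s} = {value:.6f}")
```

With 239 degrees of freedom in both the numerator and the denominator, the approximate CDF should land close to the Dataplot value and inside the acceptance interval shown in the table, which is consistent with not rejecting equal standard deviations.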
|
||
| Conclusions |
We can draw the following conclusions from the above analysis.
This batch effect was completely unexpected by the scientific investigators in this study. Note that although the quantitative techniques support the conclusions of unequal means and equal standard deviations, they do not show the more subtle features of the data such as the presence of outliers and the skewness of the batch 2 data. |
||