1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.10. Ceramic Strength

1.4.2.10.3. Analysis of the Batch Effect

Batch is a Nuisance Factor

The two nuisance factors in this experiment are the batch number and the lab. There are 2 batches and 8 labs. Ideally, these factors will have minimal effect on the response variable.

We will investigate the batch factor first.

Bihistogram

[Figure: Bihistogram]

This bihistogram shows the following.

  1. There does appear to be a batch effect.

  2. The batch 1 responses are centered at about 700, while the batch 2 responses are centered at about 625. That is, the batch effect is approximately 75 units.

  3. The variability is comparable for the 2 batches.

  4. Batch 1 has some skewness in the lower tail. Batch 2 has some skewness in the center of the distribution, but less in the tails than batch 1.

  5. Both batches have a few low-lying points.

Although we could stop with the bihistogram, we will show a few other commonly used two-sample graphical techniques for comparison.

Quantile-Quantile Plot

[Figure: Quantile-Quantile Plot]

This q-q plot shows the following.

  1. Except for a few points in the right tail, the batch 1 values have higher quantiles than the batch 2 values. This implies that batch 1 has a greater location value than batch 2.

  2. The q-q plot is not linear. This implies that the difference between the batches is not explained simply by a shift in location. That is, the variation, the skewness, or both differ between the batches as well. From the bihistogram, it appears that the skewness in batch 2 is the most likely explanation for the non-linearity in the q-q plot.
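
The reasoning behind a two-sample q-q plot can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name qq_pairs and both data sets are hypothetical (they are not the ceramic strength measurements), and the quantiles are computed with the standard library rather than with Dataplot. A pure location shift leaves the gap between matching quantiles constant, while a change in the lower tail makes the gap vary across quantiles.

```python
from statistics import quantiles

def qq_pairs(sample_x, sample_y, n_points=9):
    """Return (x_quantile, y_quantile) pairs at matching probabilities."""
    qx = quantiles(sample_x, n=n_points + 1, method="inclusive")
    qy = quantiles(sample_y, n=n_points + 1, method="inclusive")
    return list(zip(qx, qy))

# Hypothetical illustration data, NOT the ceramic strength measurements.
base = [float(v) for v in range(600, 800, 5)]

# Case 1: the second sample is the first one shifted by a constant.
shifted = [v + 75.0 for v in base]

# Case 2: the second sample has a stretched lower tail (added skewness).
low_tail = [v - 0.5 * max(0.0, 650.0 - v) for v in base]

# Gap between matching quantiles for each case.
shift_gaps = [y - x for x, y in qq_pairs(base, shifted)]
tail_gaps = [y - x for x, y in qq_pairs(base, low_tail)]

print("pure shift gaps:", shift_gaps)   # the same value at every quantile
print("lower-tail gaps:", tail_gaps)    # the gap changes across quantiles
```

In the first case the plotted points would lie on a line parallel to the reference line; in the second they bend away from it in the lower quantiles.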

Box Plot

[Figure: Box Plot]

This box plot shows the following.

  1. The median for batch 1 is approximately 700 while the median for batch 2 is approximately 600.

  2. The spread is reasonably similar for both batches, perhaps slightly larger for batch 1.

  3. Both batches have a number of outliers on the low side. Batch 2 also has a few outliers on the high side. Box plots are a particularly effective method for identifying the presence of outliers.
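
As a rough sketch of how a box plot flags outliers, the snippet below applies the conventional fence rule: points more than 1.5 times the interquartile range beyond the quartiles are marked. This is an assumption, since the text does not state which rule Dataplot uses; the function name box_plot_outliers and the sample values are hypothetical and are not taken from the ceramic strength data.

```python
from statistics import quantiles

def box_plot_outliers(values, k=1.5):
    """Split out points below/above the k * IQR fences (assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    low = [v for v in values if v < low_fence]
    high = [v for v in values if v > high_fence]
    return low, high

# Hypothetical strengths with one low-lying point.
sample = [350.0, 610.0, 640.0, 660.0, 680.0, 690.0,
          700.0, 710.0, 720.0, 740.0, 760.0]

low, high = box_plot_outliers(sample)
print("low-side outliers:", low)
print("high-side outliers:", high)
```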

Block Plots

A block plot is generated for each of the eight labs, with "1" and "2" denoting the batch numbers. In the first plot, we do not include any of the primary factors. Each of the next 3 block plots includes one of the primary factors. Note that each of the 3 primary factors (table speed = X1, down feed rate = X2, wheel grit size = X3) has 2 levels. With 8 labs and 2 levels for the primary factor, we would expect 16 separate blocks on these plots. The fact that some of these blocks are missing indicates that some of the combinations of lab and primary factor are empty.

[Figure: Block Plots]

These block plots show the following.

  1. The mean for batch 1 is greater than the mean for batch 2 in all of the cases above. This is strong evidence that the batch effect is real and consistent across labs and primary factors.
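
One way to see why a consistent ordering across blocks is strong evidence is a simple binomial argument. The sketch below assumes that, if there were no batch effect, batch 1 would be equally likely to fall above or below batch 2 in each block, and that blocks are independent. The helper name prob_at_least is hypothetical, and the calculation uses only the 8 lab blocks of the first plot, because the exact number of non-empty blocks in the other plots is not stated here.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Binomial probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that batch 1 exceeds batch 2 in all 8 lab blocks
# if the ordering in each block were a fair coin flip.
p_all_8 = prob_at_least(8, 8)
print(f"P(batch 1 higher in 8 of 8 blocks | no effect) = {p_all_8:.6f}")
```

Under these assumptions the probability is 0.5 raised to the 8th power, which is well below 0.05.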

Quantitative Techniques

We can confirm some of the conclusions drawn from the above graphics by using quantitative techniques. The two sample t-test can be used to test whether or not the means from the two batches are equal and the F-test can be used to test whether or not the standard deviations from the two batches are equal.

Two Sample T-Test

The following is the Dataplot output from the two sample t-test.
                       T-TEST
                     (2-SAMPLE)
 NULL HYPOTHESIS UNDER TEST--POPULATION MEANS MU1 = MU2
  
 SAMPLE 1:
    NUMBER OF OBSERVATIONS      =      240
    MEAN                        =    688.9987
    STANDARD DEVIATION          =    65.54909
    STANDARD DEVIATION OF MEAN  =    4.231175
  
 SAMPLE 2:
    NUMBER OF OBSERVATIONS      =      240
    MEAN                        =    611.1559
    STANDARD DEVIATION          =    61.85425
    STANDARD DEVIATION OF MEAN  =    3.992675
  
 IF     ASSUME SIGMA1 = SIGMA2:
    POOLED STANDARD DEVIATION   =    63.72845
    DIFFERENCE (DELTA) IN MEANS =    77.84271
    STANDARD DEVIATION OF DELTA =    5.817585
    T-TEST STATISTIC VALUE      =    13.38059
    DEGREES OF FREEDOM          =    478.0000
    T-TEST STATISTIC CDF VALUE  =    1.000000
  
 IF NOT ASSUME SIGMA1 = SIGMA2:
    STANDARD DEVIATION SAMPLE 1 =    65.54909
    STANDARD DEVIATION SAMPLE 2 =    61.85425
    BARTLETT CDF VALUE          =    0.629618
    DIFFERENCE (DELTA) IN MEANS =    77.84271
    STANDARD DEVIATION OF DELTA =    5.817585
    T-TEST STATISTIC VALUE      =    13.38059
    EQUIVALENT DEG. OF FREEDOM  =    476.3999
    T-TEST STATISTIC CDF VALUE  =    1.000000
  
                   ALTERNATIVE-         ALTERNATIVE-
 ALTERNATIVE-      HYPOTHESIS           HYPOTHESIS
 HYPOTHESIS        ACCEPTANCE INTERVAL  CONCLUSION
 MU1 <> MU2         (0,0.025) (0.975,1)   ACCEPT
 MU1 < MU2          (0,0.05)              REJECT
 MU1 > MU2          (0.95,1)              ACCEPT
The t-test indicates that the mean for batch 1 is larger than the mean for batch 2 (at the 5% significance level).
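
The main figures in the output above can be reproduced from the reported summary statistics alone. The following is a minimal sketch, assuming the textbook formulas for the pooled (equal-variance) t statistic and for the Welch-Satterthwaite equivalent degrees of freedom; it is not Dataplot's own code, and the variable names are illustrative. The CDF values are not recomputed because that would require a t-distribution routine outside the standard library.

```python
from math import sqrt

# Summary statistics copied from the Dataplot output above.
n1, mean1, sd1 = 240, 688.9987, 65.54909
n2, mean2, sd2 = 240, 611.1559, 61.85425

delta = mean1 - mean2

# Equal-variance case: pooled standard deviation and t statistic.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
pooled_sd = sqrt(pooled_var)
se_pooled = pooled_sd * sqrt(1 / n1 + 1 / n2)
t_pooled = delta / se_pooled
df_pooled = n1 + n2 - 2

# Unequal-variance case: Welch statistic and equivalent degrees of freedom.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = sqrt(v1 + v2)
t_welch = delta / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"pooled SD = {pooled_sd:.5f}, t = {t_pooled:.4f}, df = {df_pooled}")
print(f"Welch  SE = {se_welch:.5f}, t = {t_welch:.4f}, df = {df_welch:.4f}")
```

Because the two samples have the same size, the standard deviation of the difference is identical in both cases, which is why the output shows the same t statistic twice; only the degrees of freedom differ.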

F-Test

The following is the Dataplot output from the F-test.
                       F-TEST
 NULL HYPOTHESIS UNDER TEST--SIGMA1 = SIGMA2
 ALTERNATIVE HYPOTHESIS UNDER TEST--SIGMA1 NOT EQUAL SIGMA2
  
 SAMPLE 1:
    NUMBER OF OBSERVATIONS      =      240
    MEAN                        =    688.9987
    STANDARD DEVIATION          =    65.54909
  
 SAMPLE 2:
    NUMBER OF OBSERVATIONS      =      240
    MEAN                        =    611.1559
    STANDARD DEVIATION          =    61.85425
  
 TEST:
    STANDARD DEV. (NUMERATOR)   =    65.54909
    STANDARD DEV. (DENOMINATOR) =    61.85425
    F-TEST STATISTIC VALUE      =    1.123037
    DEG. OF FREEDOM (NUMER.)    =    239.0000
    DEG. OF FREEDOM (DENOM.)    =    239.0000
    F-TEST STATISTIC CDF VALUE  =    0.814808
  
   NULL          NULL HYPOTHESIS        NULL HYPOTHESIS
   HYPOTHESIS    ACCEPTANCE INTERVAL    CONCLUSION
 SIGMA1 = SIGMA2    (0.000,0.950)         ACCEPT
The F-test indicates that the standard deviations for the two batches are not significantly different at the 5% significance level.
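
The F statistic in the output above is the ratio of the two sample variances, and the conclusion follows from checking whether the reported CDF value falls inside the acceptance interval. The sketch below assumes that standard definition; the CDF value is copied from the Dataplot output rather than recomputed, and the variable names are illustrative.

```python
# Summary statistics copied from the Dataplot output above.
n1, sd1 = 240, 65.54909
n2, sd2 = 240, 61.85425

# F statistic: ratio of the sample variances (numerator = sample 1).
f_stat = sd1**2 / sd2**2
df_num, df_den = n1 - 1, n2 - 1

# CDF value as reported by Dataplot (not recomputed here).
f_cdf = 0.814808

# The null hypothesis is retained when the CDF value lies in (0.000, 0.950).
reject_equal_sigma = not (0.0 < f_cdf < 0.950)

print(f"F = {f_stat:.6f} on ({df_num}, {df_den}) degrees of freedom")
print("reject sigma1 = sigma2:", reject_equal_sigma)
```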

Conclusions

We can draw the following conclusions from the above analysis.

  1. There is in fact a significant batch effect. This batch effect is consistent across labs and primary factors.

  2. The magnitude of the difference is on the order of 75 to 100 (with batch 2 being smaller than batch 1). The standard deviations do not appear to be significantly different.

  3. There is some skewness in the batches.

This batch effect was completely unexpected by the scientific investigators in this study.

Note that although the quantitative techniques support the conclusions of unequal means and equal standard deviations, they do not show the more subtle features of the data such as the presence of outliers and the skewness of the batch 2 data.
