3. Production Process Characterization
3.2. Assumptions / Prerequisites
3.2.3. Analysis of Variance Models (ANOVA)

One-Way ANOVA

Description A one-way layout consists of a single factor with several levels and multiple observations at each level. With this kind of layout we can calculate the mean of the observations within each level of our factor. The residuals will tell us about the variation within each level. We can also average the means of each level to obtain a grand mean. We can then look at the deviation of the mean of each level from the grand mean to understand something about the level effects. Finally, we can compare the variation within levels to the variation across levels. Hence the name analysis of variance.
Model It is easy to model all of this with an equation of the form:

$$y_{ij} = m + a_{i} + \epsilon_{ij}$$

The equation indicates that the jth data value, from level i, is the sum of three components: the common value (grand mean), the level effect (the deviation of each level mean from the grand mean), and the residual (what's left over).
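The decomposition described above can be sketched in a few lines of Python. This is only an illustration: the data are hypothetical toy numbers (3 levels, 2 observations each), and the variable names are chosen here, not taken from the handbook.

```python
from statistics import fmean

# Hypothetical toy data: one inner list per factor level (balanced layout).
data = [
    [4.0, 6.0],  # level 1
    [7.0, 9.0],  # level 2
    [1.0, 3.0],  # level 3
]

# Mean of the observations within each level.
level_means = [fmean(level) for level in data]

# Grand mean m: average of the level means (valid because the layout is balanced).
grand_mean = fmean(level_means)

# Level effects a_i: deviation of each level mean from the grand mean.
effects = [lm - grand_mean for lm in level_means]

# Residuals e_ij: what is left after removing the level mean.
residuals = [[y - lm for y in level] for level, lm in zip(data, level_means)]

# Rebuild every observation as m + a_i + e_ij to confirm the split is exact.
rebuilt = [
    [grand_mean + a + e for e in res_row]
    for a, res_row in zip(effects, residuals)
]

print("grand mean:", grand_mean)
print("level effects:", effects)
print("residuals:", residuals)
```

Adding the three pieces back together recovers each original observation, which is all the model equation asserts.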

Estimation Estimation for the one-way layout can be performed in one of two ways. First, we can calculate the total variation, the within-level variation and the across-level variation. These can be summarized in a table as shown below, and tests can be made to determine whether the factor levels differ significantly. The value splitting example illustrates the calculations involved.
ANOVA table for one-way case In general, the ANOVA table for the one-way case is given by:

| Source | Sum of Squares | DoF | Mean Square | F0 |
|---|---|---|---|---|
| Factor | $SS_F = J \sum_{i=1}^{I}{(\bar{y}_{i.} - \bar{y}_{..})^2}$ | $I-1$ | $MS_F = SS_F/(I-1)$ | $MS_F/MS_E$ |
| Residual | $SS_E = \sum_{i=1}^{I}{\sum_{j=1}^{J}{(y_{ij} - \bar{y}_{i.})^2}}$ | $I(J-1)$ | $MS_E = SS_E/(I(J-1))$ | |
| Corr. Total | $SS_T = \sum_{i=1}^{I}{\sum_{j=1}^{J}{(y_{ij} - \bar{y}_{..})^2}}$ | $IJ-1$ | | |

where

$\bar{y}_{i.} = \frac{1}{J} \sum_{j=1}^{J}{y_{ij}}$
and
$\bar{y}_{..} = \frac{1}{IJ} \sum_{i=1}^{I}{\sum_{j=1}^{J}{y_{ij}}}$
The row labeled "Corr. Total" in the ANOVA table contains the corrected total sum of squares and the associated degrees of freedom (DoF).
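As a rough sketch of how the table entries could be computed, the function below follows the formulas above for a balanced layout. The function name `one_way_anova`, the dictionary keys, and the toy data are assumptions made for this illustration only.

```python
from statistics import fmean


def one_way_anova(groups):
    """Compute one-way ANOVA table entries for a balanced layout.

    groups: list of I lists, each holding the J observations of one level.
    """
    n_levels, n_obs = len(groups), len(groups[0])  # I and J in the table
    if any(len(g) != n_obs for g in groups):
        raise ValueError("balanced layout expected: same J at every level")

    level_means = [fmean(g) for g in groups]          # y-bar_i.
    grand_mean = fmean(y for g in groups for y in g)  # y-bar_..

    ss_factor = n_obs * sum((m - grand_mean) ** 2 for m in level_means)
    ss_resid = sum((y - m) ** 2 for g, m in zip(groups, level_means) for y in g)
    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)

    df_factor, df_resid = n_levels - 1, n_levels * (n_obs - 1)
    ms_factor, ms_resid = ss_factor / df_factor, ss_resid / df_resid

    return {
        "ss_factor": ss_factor, "df_factor": df_factor, "ms_factor": ms_factor,
        "ss_resid": ss_resid, "df_resid": df_resid, "ms_resid": ms_resid,
        "ss_total": ss_total, "df_total": n_levels * n_obs - 1,
        "f0": ms_factor / ms_resid,
    }


# Hypothetical toy data: 3 levels with 2 observations each.
table = one_way_anova([[4.0, 6.0], [7.0, 9.0], [1.0, 3.0]])
for name, value in table.items():
    print(f"{name:>10}: {value}")
```

A useful sanity check on any such computation is that the factor and residual sums of squares add up to the corrected total, and that their degrees of freedom do the same.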
Level effects must sum to zero The second way to estimate effects is through the use of CLM techniques. If you look at the model above you will notice that it is in the form of a CLM. The only problem is that the model is over-parameterized: it has I + 1 parameters (m and the I level effects) but only I level means from which to determine them, so no unique solution exists. We overcome this problem by applying a constraint to the model. Since the level effects are just deviations from the grand mean, they must sum to zero. With this constraint in place, the CLM equations have a unique solution. Most analysis programs will handle this for you automatically. See the chapter on Process Modeling for a more complete discussion on estimating the coefficients for these models.
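The role of the constraint can be seen in a short derivation, given here as a sketch for the balanced layout using the notation of the ANOVA table (the constant $c$ and the hats marking estimates are introduced only for this illustration). Without a constraint, any constant can be moved between the common value and the level effects without changing the fitted values:

$$m + a_{i} = (m + c) + (a_{i} - c) \quad \text{for any constant } c$$

Least squares fits each level by its mean, so $\hat{m} + \hat{a}_{i} = \bar{y}_{i.}$ for every level. Summing over the $I$ levels and imposing $\sum_{i=1}^{I}{\hat{a}_{i}} = 0$ fixes the constant:

$$I\hat{m} = \sum_{i=1}^{I}{\bar{y}_{i.}} \quad \Rightarrow \quad \hat{m} = \bar{y}_{..}, \qquad \hat{a}_{i} = \bar{y}_{i.} - \bar{y}_{..}$$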
Testing We are testing to see if the observed data support the hypothesis that the levels of the factor are significantly different from each other. The way we do this is by comparing the within-level variance to the between-level variance.
If we assume that the observations within each level have the same variance, we can calculate the variance within each level and pool these together to obtain an estimate of the overall population variance. This works out to be the mean square of the residuals.
Similarly, if there really were no level effect, the mean square across levels would be an estimate of the overall variance. Therefore, if there really were no level effect, these two estimates would be just two different ways to estimate the same parameter and should be close numerically. However, if there is a level effect, the level mean square will tend to be higher than the residual mean square.
It can be shown that, given the assumptions about the data stated below, the ratio of the level mean square to the residual mean square follows an F distribution with degrees of freedom as shown in the ANOVA table. If the F0 value is significant at a given significance level (greater than the cut-off value in an F table), we conclude that there is a level effect present in the data.
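The argument above can be summarized with the expected values of the two mean squares. The expressions below are a sketch for the balanced layout with level effects constrained to sum to zero; the symbol $\sigma^2$ for the common within-level variance is notation introduced here, not used elsewhere in this section.

$$E[MS_E] = \sigma^2, \qquad E[MS_F] = \sigma^2 + \frac{J}{I-1} \sum_{i=1}^{I}{a_{i}^2}$$

When every $a_{i}$ is zero, both mean squares estimate $\sigma^2$ and the ratio $F_0 = MS_F/MS_E$ should be near 1. Any nonzero level effect adds a positive term to $E[MS_F]$ only, which is why large values of $F_0$ count as evidence of a level effect.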
Assumptions For estimation purposes, we assume the data can adequately be modeled as the sum of a deterministic component and a random component. We further assume that the fixed (deterministic) component can be modeled as the sum of an overall mean and some contribution from the factor level. Finally, it is assumed that the random component can be modeled with a Gaussian distribution with fixed location and spread.
Uses The one-way ANOVA is useful when we want to compare the effect of multiple levels of one factor and we have multiple observations at each level. The factor can be either discrete (different machines, different plants, different shifts, etc.) or continuous (different gas flows, temperatures, etc.).
Example Let's extend the machining example by assuming that we have five different machines making the same part and we take five random samples from each machine to obtain the following diameter data:

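| Machine 1 | Machine 2 | Machine 3 | Machine 4 | Machine 5 |
|---|---|---|---|---|
| 0.125 | 0.118 | 0.123 | 0.126 | 0.118 |
| 0.127 | 0.122 | 0.125 | 0.128 | 0.129 |
| 0.125 | 0.120 | 0.125 | 0.126 | 0.127 |
| 0.126 | 0.124 | 0.124 | 0.127 | 0.120 |
| 0.128 | 0.119 | 0.126 | 0.129 | 0.121 |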
Analyze Using ANOVA software or the techniques of the value-splitting example, we summarize the data in an ANOVA table as follows:
| Source | Sum of Squares | Degrees of Freedom | Mean Square | F0 |
|---|---|---|---|---|
| Factor | 0.000137 | 4 | 0.000034 | 4.86 |
| Residual | 0.000132 | 20 | 0.000007 | |
| Corrected Total | 0.000269 | 24 | | |
Test By dividing the factor-level mean square by the residual mean square, we obtain an F0 value of 4.86, which is greater than the cut-off value of 2.87 from the F distribution with 4 and 20 degrees of freedom at a significance level of 0.05. (The value 4.86 uses the rounded mean squares shown in the table; carrying full precision gives a ratio of about 5.21, which leads to the same decision.) Therefore, there is sufficient evidence to reject the hypothesis that the levels are all the same.
Conclusion From the analysis of these data we can conclude that the factor "machine" has an effect. There is a statistically significant difference in the pin diameters across the machines on which they were manufactured.
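For readers who want to reproduce the analysis, the sketch below recomputes the table from the diameter data using only the Python standard library. It assumes the data listing is read row by row with one column per machine; under that reading the sums of squares match the table. The cut-off 2.87 is taken from the text and not computed, and the variable names are chosen for this illustration. Because the code carries full precision, the ratio it prints is about 5.21, not the 4.86 obtained from the rounded mean squares in the table.

```python
from statistics import fmean

# Diameter data from the example, one list per machine (5 samples each).
# Assumption: the listing is read row by row, one column per machine.
machines = {
    1: [0.125, 0.127, 0.125, 0.126, 0.128],
    2: [0.118, 0.122, 0.120, 0.124, 0.119],
    3: [0.123, 0.125, 0.125, 0.124, 0.126],
    4: [0.126, 0.128, 0.126, 0.127, 0.129],
    5: [0.118, 0.129, 0.127, 0.120, 0.121],
}
F_CRITICAL = 2.87  # cut-off quoted in the text: 4 and 20 dof, alpha = 0.05

groups = list(machines.values())
n_levels, n_obs = len(groups), len(groups[0])

level_means = [fmean(g) for g in groups]
grand_mean = fmean(y for g in groups for y in g)

# Between-machine and within-machine sums of squares.
ss_factor = n_obs * sum((m - grand_mean) ** 2 for m in level_means)
ss_resid = sum((y - m) ** 2 for g, m in zip(groups, level_means) for y in g)

# Mean squares and their ratio, without intermediate rounding.
ms_factor = ss_factor / (n_levels - 1)
ms_resid = ss_resid / (n_levels * (n_obs - 1))
f0 = ms_factor / ms_resid

print(f"SS factor   = {ss_factor:.6f}")
print(f"SS residual = {ss_resid:.6f}")
print(f"MS factor   = {ms_factor:.6f}")
print(f"MS residual = {ms_resid:.6f}")
print(f"F0          = {f0:.2f}")
print("machine effect detected" if f0 > F_CRITICAL else "no machine effect detected")
```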