3. Production Process Characterization
3.2. Assumptions / Prerequisites
3.2.3. Analysis of Variance Models (ANOVA)


Description  A one-way layout consists of a single factor with several levels and multiple observations at each level. With this kind of layout we can calculate the mean of the observations within each level of our factor. The residuals (each observation minus its level mean) tell us about the variation within each level. We can also average the level means to obtain a grand mean. We can then look at the deviation of each level mean from the grand mean to understand something about the level effects. Finally, we can compare the variation within levels to the variation across levels. Hence the name analysis of variance.  
Model 
It is easy to model all of this with an equation of the form:
\( y_{ij} = m + a_{i} + \epsilon_{ij} \)
The equation indicates that the jth data value from level i is the sum of three components: the common value \( m \) (the grand mean), the level effect \( a_{i} \) (the deviation of the level mean from the grand mean), and the residual \( \epsilon_{ij} \) (what is left over). 
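As a minimal sketch of this decomposition, the following splits each observation into the three components of the model. The data values, the level names, and the function name `decompose` are hypothetical and chosen only for illustration; a balanced layout (the same number of observations at each level) is assumed.

```python
from statistics import mean

# Hypothetical data (not from the handbook): 3 levels, 4 observations each.
data = {
    "A": [10.0, 12.0, 11.0, 11.0],
    "B": [14.0, 15.0, 13.0, 14.0],
    "C": [8.0, 9.0, 10.0, 9.0],
}


def decompose(data):
    """Split each observation into grand mean + level effect + residual."""
    level_means = {lvl: mean(obs) for lvl, obs in data.items()}
    # Balanced layout assumed: the grand mean is the average of the level means.
    m = mean(list(level_means.values()))
    # Level effect: deviation of each level mean from the grand mean.
    effects = {lvl: lm - m for lvl, lm in level_means.items()}
    # Residual: deviation of each observation from its own level mean.
    residuals = {
        lvl: [y - level_means[lvl] for y in obs] for lvl, obs in data.items()
    }
    return m, effects, residuals


m, effects, residuals = decompose(data)
print("grand mean:", m)
print("level effects:", effects)
print("residuals:", residuals)
```

Adding the grand mean, the level effect, and the residual back together recovers every original observation exactly, which is all the model equation says.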

Estimation  Estimation for the one-way layout can be performed in one of two ways. First, we can calculate the total variation, the within-level variation, and the across-level variation. These can be summarized in a table as shown below, and tests can be made to determine whether the factor levels differ significantly. The value-splitting example illustrates the calculations involved.  
ANOVA table for one-way case 
In general, the ANOVA table for the one-way case is given by:
[ANOVA table and the definitions of its terms are missing from the source.]
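For reference, the standard one-way ANOVA table for a balanced layout can be written as follows. This is the usual textbook form, not a copy of the handbook's own table. The symbols \( I \) (number of levels), \( J \) (observations per level), \( SS \), \( MS \), and the bar-and-dot notation for means are assumptions introduced here.

\[
\begin{array}{l|l|l|l|l}
\text{Source} & \text{Sum of squares} & \text{Degrees of freedom} & \text{Mean square} & F_{0} \\ \hline
\text{Factor levels} & SS_{A} = J \sum_{i=1}^{I} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^{2} & I - 1 & MS_{A} = SS_{A} / (I - 1) & MS_{A} / MS_{E} \\
\text{Residuals} & SS_{E} = \sum_{i=1}^{I} \sum_{j=1}^{J} (y_{ij} - \bar{y}_{i\cdot})^{2} & I(J - 1) & MS_{E} = SS_{E} / (I(J - 1)) & \\
\text{Corrected total} & SS_{T} = \sum_{i=1}^{I} \sum_{j=1}^{J} (y_{ij} - \bar{y}_{\cdot\cdot})^{2} & IJ - 1 & &
\end{array}
\]

\[
\bar{y}_{i\cdot} = \frac{1}{J} \sum_{j=1}^{J} y_{ij}, \qquad
\bar{y}_{\cdot\cdot} = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} y_{ij}, \qquad
SS_{T} = SS_{A} + SS_{E}
\]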

Level effects must sum to zero  The second way to estimate effects is through the use of continuous linear model (CLM) techniques. If you look at the model above you will notice that it is in the form of a CLM. The only problem is that the model is overparameterized: it has one more parameter (the grand mean plus one effect per level) than there are level means to estimate them from, so no unique solution exists. We overcome this problem by applying a constraint to the model. Since the level effects are just deviations from the grand mean, they must sum to zero. By applying the constraint that the level effects sum to zero, we obtain a unique solution to the CLM equations. Most analysis programs will handle this for you automatically. See the chapter on Process Modeling for a more complete discussion of estimating the coefficients for these models.  
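The role of the constraint can be illustrated with a small sketch. It does not solve the linear model equations the way an analysis program would. It only shows that many different parameter sets reproduce the same level means, and that requiring the effects to sum to zero picks out a single one. The level means and the function names `fitted`, `shifted`, and `sum_to_zero` are hypothetical.

```python
from statistics import mean

# Hypothetical level means (not from the handbook).
level_means = {"A": 11.0, "B": 14.0, "C": 9.0}


def fitted(m, effects):
    """Fitted level means implied by a grand mean m and level effects."""
    return {lvl: m + a for lvl, a in effects.items()}


def shifted(m, effects, c):
    """A different parameter set that gives exactly the same fitted values."""
    return m + c, {lvl: a - c for lvl, a in effects.items()}


def sum_to_zero(m, effects):
    """Re-express (m, effects) so that the level effects sum to zero."""
    c = mean(list(effects.values()))
    return m + c, {lvl: a - c for lvl, a in effects.items()}


# Two of the infinitely many unconstrained solutions.
m0, a0 = 0.0, dict(level_means)
m1, a1 = shifted(m0, a0, 5.0)

# Both reproduce the level means, so the data cannot tell them apart.
print(fitted(m0, a0))
print(fitted(m1, a1))

# Under the sum-to-zero constraint both collapse to the same solution.
print(sum_to_zero(m0, a0))
print(sum_to_zero(m1, a1))
```

With the constraint applied, the first parameter equals the average of the level means and each effect equals the deviation of its level mean from that average, matching the description given earlier.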
Testing  We are testing to see if the observed data support the hypothesis that the levels of the factor are significantly different from each other. The way we do this is by comparing the within-level variance to the between-level variance.  
If we assume that the observations within each level have the same variance, we can calculate the variance within each level and pool these together to obtain an estimate of the overall population variance. This works out to be the mean square of the residuals.  
Similarly, if there really were no level effect, the mean square across levels would be an estimate of the overall variance. Therefore, if there really were no level effect, these two estimates would be just two different ways to estimate the same parameter and should be close numerically. However, if there is a level effect, the level mean square will tend to be higher than the residual mean square.  
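This argument can be checked with a short simulation sketch. The layout (five levels, five observations each), the unit variance, the effect sizes, the seed, and the function names are all assumptions made for illustration. The mean squares are averaged over many simulated data sets so that their long-run behavior is visible.

```python
import random


def mean_squares(groups):
    """Return (level mean square, residual mean square) for a balanced layout."""
    n_levels, n_obs = len(groups), len(groups[0])
    level_means = [sum(g) / n_obs for g in groups]
    grand = sum(level_means) / n_levels
    ss_level = n_obs * sum((lm - grand) ** 2 for lm in level_means)
    ss_resid = sum(
        (y - lm) ** 2 for g, lm in zip(groups, level_means) for y in g
    )
    return ss_level / (n_levels - 1), ss_resid / (n_levels * (n_obs - 1))


def average_mean_squares(effects, sigma=1.0, n_obs=5, reps=2000, seed=0):
    """Average both mean squares over many simulated data sets."""
    rng = random.Random(seed)
    total_level = total_resid = 0.0
    for _ in range(reps):
        groups = [
            [a + rng.gauss(0.0, sigma) for _ in range(n_obs)] for a in effects
        ]
        ms_level, ms_resid = mean_squares(groups)
        total_level += ms_level
        total_resid += ms_resid
    return total_level / reps, total_resid / reps


# No level effect: both averages should land near sigma**2 = 1.
null_level, null_resid = average_mean_squares([0.0] * 5)
# Hypothetical level effects: the level mean square is inflated.
alt_level, alt_resid = average_mean_squares([-1.0, -0.5, 0.0, 0.5, 1.0])

print("no effect:   level MS %.2f, residual MS %.2f" % (null_level, null_resid))
print("with effect: level MS %.2f, residual MS %.2f" % (alt_level, alt_resid))
```

In both cases the residual mean square stays close to the true variance, while the level mean square rises well above it once level effects are present.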
It can be shown that, given the assumptions about the data stated below, the ratio of the level mean square to the residual mean square follows an F distribution with degrees of freedom as shown in the ANOVA table. If the \( F_{0} \) value is significant at a given significance level (greater than the cutoff value in an F table), we conclude that there is a level effect present in the data.  
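The cutoff value can also be approximated by simulation rather than looked up. The sketch below is not the way an F table is constructed. It assumes a hypothetical balanced layout of five levels with five observations each, Gaussian data with no level effect, and an arbitrary seed, and it estimates the value that the ratio exceeds only 5% of the time. The function names are illustrative.

```python
import random


def f_ratio(groups):
    """F0 = level mean square / residual mean square (balanced layout)."""
    n_levels, n_obs = len(groups), len(groups[0])
    level_means = [sum(g) / n_obs for g in groups]
    grand = sum(level_means) / n_levels
    ms_level = (
        n_obs * sum((lm - grand) ** 2 for lm in level_means) / (n_levels - 1)
    )
    ms_resid = sum(
        (y - lm) ** 2 for g, lm in zip(groups, level_means) for y in g
    ) / (n_levels * (n_obs - 1))
    return ms_level / ms_resid


def simulated_cutoff(n_levels=5, n_obs=5, alpha=0.05, reps=20000, seed=1):
    """Estimate the upper-alpha cutoff of F0 when there is no level effect."""
    rng = random.Random(seed)
    ratios = sorted(
        f_ratio(
            [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_levels)]
        )
        for _ in range(reps)
    )
    # Empirical (1 - alpha) quantile of the simulated ratios.
    return ratios[int((1 - alpha) * reps)]


cutoff = simulated_cutoff()
print("simulated 5%% cutoff for 4 and 20 degrees of freedom: %.2f" % cutoff)
```

The simulated cutoff should fall close to the tabulated F value of about 2.87 for 4 and 20 degrees of freedom at the 0.05 level, with some Monte Carlo error.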
Assumptions  For estimation purposes, we assume the data can adequately be modeled as the sum of a deterministic component and a random component. We further assume that the fixed (deterministic) component can be modeled as the sum of an overall mean and some contribution from the factor level. Finally, it is assumed that the random component can be modeled with a Gaussian distribution with fixed location and spread.  
Uses  The one-way ANOVA is useful when we want to compare the effect of multiple levels of one factor and we have multiple observations at each level. The factor can be either discrete (different machines, different plants, different shifts, etc.) or continuous (different gas flows, temperatures, etc.).  
Example  Let's extend the machining example by assuming that we have five different machines making the same part and we take five random samples from each machine to obtain the following diameter data:


Analyze  Using ANOVA software or the techniques of the value-splitting example, we summarize the data in an ANOVA table as follows:  


Test  By dividing the factor-level mean square by the residual mean square, we obtain an \( F_{0} \) value of 4.86, which is greater than the cutoff value of 2.87 from the F distribution with 4 and 20 degrees of freedom at a significance level of 0.05. Therefore, there is sufficient evidence to reject the hypothesis that the levels are all the same.  
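The degrees of freedom quoted in this test follow directly from the layout of five machines with five samples each. Writing \( I \) for the number of levels and \( J \) for the number of observations per level (symbols introduced here for illustration):

\[
\nu_{1} = I - 1 = 5 - 1 = 4, \qquad
\nu_{2} = I(J - 1) = 5 \times 4 = 20, \qquad
F_{0} = 4.86 > 2.87 \approx F_{0.05;\,4,\,20}
\]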
Conclusion  From the analysis of these data we can conclude that the factor "machine" has an effect. There is a statistically significant difference in the pin diameters across the machines on which they were manufactured. 