

Purpose: Test for Equal Means Across Groups 
One-factor analysis of variance (Snedecor and Cochran, 1989) is a special case of analysis of variance (ANOVA), for one factor of interest, and a generalization of the two-sample t-test. The two-sample t-test is used to decide whether two groups (levels) of a factor have the same mean. One-way analysis of variance generalizes this to k levels where k, the number of levels, is greater than or equal to 2.
For example, data collected on, say, five instruments have one factor (instruments) at five levels. The ANOVA tests whether instruments have a significant effect on the results. 
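The link between the two-sample t-test and one-way ANOVA can be illustrated numerically: for k = 2 levels, the ANOVA F statistic equals the square of the pooled (equal-variance) two-sample t statistic. The following is a minimal sketch using only the Python standard library; the data values and the function names `pooled_t` and `one_way_f` are hypothetical and are not taken from the handbook.

```python
from math import sqrt
from statistics import mean

# Hypothetical measurements for two levels of a factor (illustrative only).
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [5.6, 5.4, 5.8, 5.5, 5.7]


def pooled_t(a, b):
    """Two-sample t statistic using a pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))


def one_way_f(groups):
    """One-way ANOVA F statistic: factor mean square / residual mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_factor = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_resid = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_factor / (k - 1)) / (ss_resid / (n_total - k))


t_stat = pooled_t(group_a, group_b)
f_stat = one_way_f([group_a, group_b])
print(f"t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
```

The two printed values agree up to floating-point rounding, which is why the one-way ANOVA can be read as the k-level extension of the pooled t-test.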

Definition 
The Product and Process Comparisons chapter (chapter 7) contains a more extensive discussion of one-factor ANOVA, including the details for the mathematical computations of one-way analysis of variance.
The model for the analysis of variance can be stated in two mathematically equivalent ways. In the following discussion, each level of each factor is called a cell. For the oneway case, a cell and a level are equivalent since there is only one factor. In the following, the subscript i refers to the level and the subscript j refers to the observation within a level. For example, Y_{23} refers to the third observation in the second level. The first model is
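\( Y_{ij} = \mu_{i} + E_{ij} \)
This model decomposes the response into a mean for each cell and an error term. The estimated cell means are the predicted values of the model, and the residuals are the differences between the responses and the estimated cell means:
\( R_{ij} = Y_{ij} - \hat{\mu}_{i} \)
The second model is
\( Y_{ij} = \mu + \alpha_{i} + E_{ij} \)
This model decomposes the response into an overall mean, the effect of the ith factor level, and an error term. Its residuals are
\( R_{ij} = Y_{ij} - \hat{\mu} - \hat{\alpha}_{i} \)
The distinction between these models is that the second model divides the cell mean into an overall mean and the effect of the ith factor level. This second model makes the factor effect more explicit, so we will emphasize this approach.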
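To see that the two formulations are equivalent in practice, the sketch below fits both to a small balanced data set and compares the residuals. The data and all variable names are hypothetical. The code assumes equal sample sizes per level, so the overall mean is estimated by the mean of all observations and each level effect by the cell mean minus the overall mean.

```python
from statistics import mean

# Hypothetical data: three levels (i) with four observations (j) each.
data = {
    1: [10.2, 9.8, 10.1, 9.9],
    2: [10.6, 10.4, 10.7, 10.3],
    3: [9.5, 9.7, 9.4, 9.8],
}

# First model: estimate one mean per cell (level).
cell_means = {i: mean(ys) for i, ys in data.items()}
resid_cell = {i: [y - cell_means[i] for y in ys] for i, ys in data.items()}

# Second model: overall mean plus an effect for each level.
# Assumes a balanced design (equal sample sizes per level).
grand_mean = mean(y for ys in data.values() for y in ys)
effects = {i: cell_means[i] - grand_mean for i in data}
resid_effect = {
    i: [y - grand_mean - effects[i] for y in ys] for i, ys in data.items()
}

for i in data:
    print(f"level {i}: cell mean = {cell_means[i]:.4f}, effect = {effects[i]:+.4f}")
```

Because the estimated cell mean equals the estimated overall mean plus the estimated effect, both sets of residuals coincide; the second form simply makes the size and sign of each level effect visible.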

Model Validation
Note that the ANOVA model assumes that the error term, E_{ij}, follows the assumptions for a univariate measurement process. That is, after performing an analysis of variance, the model should be validated by analyzing the residuals.
One-Way ANOVA Example
A one-way analysis of variance was generated for the GEAR.DAT data set. The data set contains 10 measurements of gear diameter for each of ten different batches, for a total of 100 measurements.
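SOURCE               DEGREES OF FREEDOM   SUM OF SQUARES   MEAN SQUARE   F STATISTIC
-------------------------------------------------------------------------------------
BATCH                                 9         0.000729      0.000081        2.2969
RESIDUAL                             90         0.003174      0.000035
TOTAL (CORRECTED)                    99         0.003903      0.000039

RESIDUAL STANDARD DEVIATION = 0.00594

BATCH     N      MEAN     SD(MEAN)
----------------------------------
    1    10   0.99800      0.00188
    2    10   0.99910      0.00188
    3    10   0.99540      0.00188
    4    10   0.99820      0.00188
    5    10   0.99190      0.00188
    6    10   0.99880      0.00188
    7    10   1.00150      0.00188
    8    10   1.00040      0.00188
    9    10   0.99830      0.00188
   10    10   0.99480      0.00188

The ANOVA table decomposes the total (corrected) sum of squares into two components: the sum of squares for the factor (batch) and the residual sum of squares. Each mean square is the corresponding sum of squares divided by its degrees of freedom.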
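The arithmetic behind the table can be checked directly. The sketch below uses only the values printed in the ANOVA output for GEAR.DAT. The relation SD(MEAN) = residual standard deviation / sqrt(N per batch) is an assumption that appears consistent with the tabulated numbers; it is not stated explicitly in the output.

```python
from math import sqrt

# Values copied from the ANOVA table for GEAR.DAT.
ss_batch, df_batch = 0.000729, 9
ss_resid, df_resid = 0.003174, 90
ss_total, df_total = 0.003903, 99
n_per_batch = 10

# The factor and residual components add up to the corrected total.
assert abs((ss_batch + ss_resid) - ss_total) < 1e-9
assert df_batch + df_resid == df_total

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_batch = ss_batch / df_batch
ms_resid = ss_resid / df_resid

# Residual standard deviation, and (assumed) standard error of a batch mean.
resid_sd = sqrt(ms_resid)
sd_mean = resid_sd / sqrt(n_per_batch)

print(f"MS batch    = {ms_batch:.6f}")
print(f"MS residual = {ms_resid:.6f}")
print(f"residual SD = {resid_sd:.5f}")
print(f"SD(mean)    = {sd_mean:.5f}")
```

The printed values reproduce the mean squares, the residual standard deviation, and the SD(MEAN) column to the precision shown in the output.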
The ANOVA table provides a formal F test for the factor effect. For our example, we are testing the following hypotheses.
H_{0}: All individual batch means are equal.
H_{a}: At least one batch mean is not equal to the others.
The F statistic is the batch mean square divided by the residual mean square. Under the null hypothesis, this statistic follows an F distribution with (k-1) and (N-k) degrees of freedom, where k is the number of levels and N is the total number of observations. For our example, the critical F value (upper tail) for α = 0.05, (k-1) = 9, and (N-k) = 90 is 1.9856. Since the F statistic, 2.2969, is greater than the critical value, we conclude that there is a significant batch effect at the 0.05 level of significance.
Once we have determined that there is a significant batch effect, we might be interested in comparing individual batch means. The batch means and the standard errors of the batch means provide some information about the individual batches. However, we may want to employ multiple comparison methods for a more formal analysis. (See Box, Hunter, and Hunter for more information.)
In addition to the quantitative ANOVA output, it is recommended that any analysis of variance be complemented with model validation. At a minimum, this should include an analysis of the residuals to check the assumptions on the error term.
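The F test decision for the batch effect can be reproduced from the tabulated sums of squares. This is a minimal sketch: the critical value 1.9856 is taken from the text rather than computed, and the variable names are illustrative. Because the table entries are rounded, the recomputed F statistic may differ from the reported 2.2969 in the last digit.

```python
# Values taken from the text and the ANOVA table for GEAR.DAT.
ms_batch = 0.000729 / 9      # batch mean square
ms_resid = 0.003174 / 90     # residual mean square
f_critical = 1.9856          # quoted upper-tail critical value, alpha = 0.05

# F statistic: batch mean square divided by residual mean square.
f_stat = ms_batch / ms_resid
reject_h0 = f_stat > f_critical

print(f"F = {f_stat:.4f}, critical value = {f_critical}")
print("significant batch effect" if reject_h0 else "no significant batch effect")
```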

Question 
The analysis of variance can be used to answer the following question: Are the means the same across groups in the data?

Importance
The analysis of uncertainty depends on whether the factor significantly affects the outcome.
Related Techniques 
Two-sample t-test
Multi-factor analysis of variance
Regression
Box plot
Software
Most general-purpose statistical software programs can generate an analysis of variance. Both Dataplot code and R code can be used to generate the analyses in this section.