Purpose: Test for Equal Means Across Groups |
One factor analysis of variance
(Snedecor and Cochran,
1989) is a special case of
analysis of variance
(ANOVA), for one
factor of interest, and a generalization of the
two-sample t-test. The two-sample
t-test is used to decide whether two groups (levels) of a
factor have the same mean. One-way analysis of variance
generalizes this to k levels where k, the number of levels,
is greater than or equal to 2.
For example, data collected on, say, five instruments have one factor (instruments) at five levels. The ANOVA tests whether instruments have a significant effect on the results. |
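As a minimal sketch of the test described above, the F statistic can be computed as the ratio of the between-level mean square to the within-level mean square. The function name `one_way_anova` and the instrument readings are hypothetical, chosen only for illustration; the sketch uses the Python standard library and does not compute a p-value.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per level."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-level sum of squares: level means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-level sum of squares: observations around their own level mean.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical readings from three instruments (illustrative values only).
instruments = [
    [10.1, 9.9, 10.0, 10.2],
    [10.4, 10.6, 10.5, 10.3],
    [9.8, 10.0, 9.9, 10.1],
]
f_stat, df1, df2 = one_way_anova(instruments)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) degrees of freedom")
```

With exactly two groups, this F statistic equals the square of the pooled two-sample t statistic, which is the sense in which one-way ANOVA generalizes the t-test.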
||
| Definition |
The Product and Process
Comparisons chapter (chapter 7) contains
a more extensive discussion of
1-factor ANOVA,
including the details for the mathematical computations of
one-way analysis of variance.
The model for the analysis of variance can be stated in two mathematically equivalent ways. In the following discussion, each level of each factor is called a cell. For the one-way case, a cell and a level are equivalent since there is only one factor. In the following, the subscript i refers to the level and the subscript j refers to the observation within a level. For example, Y23 refers to the third observation in the second level. The first model is
$$ Y_{ij} = \mu_i + E_{ij} $$
where $\mu_i$ is the mean of cell i and Eij is the error term.
The second model is
$$ Y_{ij} = \mu + \alpha_i + E_{ij} $$
where $\mu$ is the overall mean and $\alpha_i$ is the effect of the ith factor level.
The distinction between these models is that the second model divides the cell mean into an overall mean and the effect of the ith factor level. This second model makes the factor effect more explicit, so we will emphasize this approach. |
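The relationship between the two model statements can be sketched numerically. The example below assumes the usual estimates (cell mean for the first model, grand mean plus level effect for the second); the function name `fit_effects_model` and the data are hypothetical.

```python
from statistics import mean

def fit_effects_model(groups):
    """Estimate the overall mean, the cell means, and the level effects."""
    all_obs = [y for g in groups for y in g]
    mu_hat = mean(all_obs)                     # overall mean (second model)
    cell_means = [mean(g) for g in groups]     # cell means (first model)
    # Effect of level i = cell mean of level i minus the overall mean.
    alphas = [m - mu_hat for m in cell_means]
    return mu_hat, cell_means, alphas

# Hypothetical balanced data: 3 levels, 3 observations per level.
levels = [
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0],
]
mu_hat, cell_means, alphas = fit_effects_model(levels)

# Y23 in the text's notation (second level, third observation) is
# levels[1][2] with Python's zero-based indexing.
y_23 = levels[1][2]
print("overall mean:", mu_hat)
print("cell means:", cell_means)
print("level effects:", alphas)
```

In a balanced design such as this one, the estimated effects sum to zero, so the second model carries the same information as the first while making each level's departure from the overall mean explicit.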
||
| Model Validation | The ANOVA model assumes that the error term, Eij, follows the assumptions for a univariate measurement process. Therefore, after performing an analysis of variance, the model should be validated by analyzing the residuals. | ||
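As a starting point for residual analysis, the residuals can be computed as each observation minus its level mean. This is a minimal sketch with hypothetical data and function names; how the residuals are then examined (plots or formal tests) is left to the analyst.

```python
from math import sqrt
from statistics import mean

def residuals_by_level(groups):
    """Return residuals (observation minus its level mean), grouped like the input."""
    return [[y - mean(g) for y in g] for g in groups]

def residual_sd(groups):
    """Residual standard deviation: sqrt(SSE / (N - k))."""
    resid = residuals_by_level(groups)
    sse = sum(e ** 2 for level in resid for e in level)
    n_total = sum(len(g) for g in groups)
    return sqrt(sse / (n_total - len(groups)))

# Hypothetical measurements at three levels (illustrative values only).
data = [
    [10.1, 9.9, 10.0, 10.2],
    [10.4, 10.6, 10.5, 10.3],
    [9.8, 10.0, 9.9, 10.1],
]
resid = residuals_by_level(data)
s = residual_sd(data)

for i, level in enumerate(resid, start=1):
    print(f"level {i} residuals:", [round(e, 3) for e in level])
print(f"residual standard deviation = {s:.4f}")
```

By construction the residuals within each level sum to zero, so any structure worth investigating shows up in their spread, ordering, or distributional shape rather than in their average.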
|
Sample Output |
Dataplot generated the following output for the one-way analysis of
variance from the GEAR.DAT data set.
NUMBER OF OBSERVATIONS = 100
NUMBER OF FACTORS = 1
NUMBER OF LEVELS FOR FACTOR 1 = 10
BALANCED CASE
RESIDUAL STANDARD DEVIATION = 0.59385783970E-02
RESIDUAL DEGREES OF FREEDOM = 90
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.59385774657E-02
REPLICATION DEGREES OF FREEDOM = 90
NUMBER OF DISTINCT CELLS = 10
*****************
* ANOVA TABLE *
*****************
SOURCE               DF   SUM OF SQUARES   MEAN SQUARE   F STATISTIC   F CDF     SIG
-------------------------------------------------------------------------------------
TOTAL (CORRECTED)    99   0.003903         0.000039
-------------------------------------------------------------------------------------
FACTOR 1              9   0.000729         0.000081      2.2969        97.734%   *
-------------------------------------------------------------------------------------
RESIDUAL             90   0.003174         0.000035
RESIDUAL STANDARD DEVIATION = 0.00593857840
RESIDUAL DEGREES OF FREEDOM = 90
REPLICATION STANDARD DEVIATION = 0.00593857747
REPLICATION DEGREES OF FREEDOM = 90
****************
* ESTIMATION *
****************
GRAND MEAN = 0.99764001369E+00
GRAND STANDARD DEVIATION = 0.62789078802E-02
LEVEL-ID                NI    MEAN       EFFECT    SD(EFFECT)
--------------------------------------------------------------------
FACTOR 1--    1.00000   10.   0.99800    0.00036   0.00178
        --    2.00000   10.   0.99910    0.00146   0.00178
        --    3.00000   10.   0.99540   -0.00224   0.00178
        --    4.00000   10.   0.99820    0.00056   0.00178
        --    5.00000   10.   0.99190   -0.00574   0.00178
        --    6.00000   10.   0.99880    0.00116   0.00178
        --    7.00000   10.   1.00150    0.00386   0.00178
        --    8.00000   10.   1.00040    0.00276   0.00178
        --    9.00000   10.   0.99830    0.00066   0.00178
        --   10.00000   10.   0.99480   -0.00284   0.00178
MODEL RESIDUAL STANDARD DEVIATION
-------------------------------------------------------
CONSTANT ONLY-- 0.0062789079
CONSTANT & FACTOR 1 ONLY-- 0.0059385784
|
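The arithmetic behind the ANOVA table and the estimation section above can be checked from the printed values. The sketch below copies the numbers from the Dataplot output as rounded in the listing, so results agree only to the printed precision. The formula used for SD(EFFECT) reproduces the printed 0.00178, but that Dataplot computes it this way is an assumption.

```python
from math import sqrt

# Values copied from the ANOVA table (rounded as printed).
ss_factor, df_factor = 0.000729, 9
ss_resid, df_resid = 0.003174, 90

ms_factor = ss_factor / df_factor   # mean square for factor 1
ms_resid = ss_resid / df_resid      # mean square for the residual
f_stat = ms_factor / ms_resid       # F statistic
resid_sd = sqrt(ms_resid)           # residual standard deviation

# Estimation section: effect = level mean minus grand mean.
grand_mean = 0.99764001369
level_means = [0.99800, 0.99910, 0.99540, 0.99820, 0.99190,
               0.99880, 1.00150, 1.00040, 0.99830, 0.99480]
effects = [m - grand_mean for m in level_means]

# Assumed formula: SD(effect) = s * sqrt(1/n_i - 1/N), with n_i = 10, N = 100.
sd_effect = resid_sd * sqrt(1 / 10 - 1 / 100)

print(f"MS factor = {ms_factor:.6f}, MS residual = {ms_resid:.6f}")
print(f"F = {f_stat:.4f}, residual SD = {resid_sd:.5f}")
print(f"SD(effect) = {sd_effect:.5f}")
print("effects:", [round(e, 5) for e in effects])
```

Small differences in the last digit relative to the listing are expected, because the printed sums of squares are themselves rounded.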
| Interpretation of Sample Output |
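The output is divided into three sections: overall summary statistics (the number of observations, factors, and levels, and the residual standard deviation), the ANOVA table, and the estimation section (the grand mean, the level means, and the level effects).
In addition to the quantitative ANOVA output, it is recommended that any analysis of variance be complemented with model validation. At a minimum, this should include an analysis of the residuals.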
|
| Question |
The analysis of variance can be used to answer the following
question: Are the means the same across the groups (levels) in the data?
|
| Importance | The analysis of uncertainty depends on whether the factor significantly affects the outcome. |
| Related Techniques |
Two-sample t-test; Multi-factor analysis of variance; Regression; Box plot |
| Software | Most general-purpose statistical software programs, including Dataplot, can generate an analysis of variance. |