NIST StRD Data Archive
Additional Information



Procedure: Analysis of Variance
Certification Method & Definitions

Data: 1 Factor
9 Treatments
2001 Replicates/Cell
18009 Observations
13 Constant Leading Digits
Higher Level of Difficulty
Generated Data

Model: 10 Parameters (mu, tau(1),...,tau(9))
y(ij) = mu + tau(i) + e(ij)

This dataset was generated to test two specific computational aspects of ANOVA software. The generated data are based on work published by Stephen Simon and James Lesage, who identified two types of error that particularly affect ANOVA computations: cancellation error and accumulation error. Software that is not written to control these errors can produce inaccurate output, sometimes inaccurate enough to change the qualitative conclusions of the data analysis.
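The effect of cancellation error can be sketched with a small Python example. This is an illustration only, not the procedure from the work cited above: it compares the one-pass "sum of squares minus squared sum" formula with a two-pass formula that subtracts the mean first. The function names ss_one_pass and ss_two_pass and the small test vector are assumptions made for this sketch, and math.fsum is used for every sum so that accumulation error stays out of the comparison.

```python
import math

def ss_one_pass(data):
    """Sum of squares about the mean via sum(y^2) - (sum y)^2 / n.

    Both terms are huge and nearly equal, so their difference
    loses essentially all significant digits (cancellation).
    """
    n = len(data)
    total = math.fsum(data)
    total_sq = math.fsum(y * y for y in data)
    return total_sq - total * total / n

def ss_two_pass(data):
    """Sum of squares about the mean, subtracting the mean first."""
    n = len(data)
    mean = math.fsum(data) / n
    return math.fsum((y - mean) ** 2 for y in data)

# Hypothetical test vector (not the NIST data file): a large constant
# offset plus small alternating deviations, as in the generating formula.
data = [1e12 + 0.1, 1e12 + 0.3] * 1000

print("one-pass:", ss_one_pass(data))
print("two-pass:", ss_two_pass(data))
```

In exact arithmetic both formulas give 20 for this vector. In double precision the inputs are already rounded at about the fourth decimal place, so the two-pass result is only close to 20, while the one-pass result is dominated by rounding in two terms of size about 2e27.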

Although these datasets focus primarily on identifying problems caused by cancellation or accumulation error, they may also expose errors in other aspects of ANOVA software, such as the computation of degrees of freedom, or errors in the ANOVA table that arise from other sources.

The formula used to generate this data is:

y(ij) = 10^12 + theta(i) + theta(j), i=1,...,9,  j=1,...,2001
where
theta(k) = 0.2  if k = 1
theta(k) = 0.1  if k = 2, 4, 6, ..., 2000
theta(k) = 0.3  if k = 3, 5, 7, ..., 2001
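The generating formula can be reproduced with a short Python sketch that uses exact rational arithmetic, so the one-way ANOVA quantities implied by the formula can be worked out without any rounding. The helper names theta, generate and anova_exact are hypothetical, and the sketch rebuilds the data from the formula rather than reading the NIST data file.

```python
from fractions import Fraction

def theta(k):
    """Offset from the generating formula, as an exact rational."""
    if k == 1:
        return Fraction(2, 10)
    return Fraction(1, 10) if k % 2 == 0 else Fraction(3, 10)

def generate(treatments=9, replicates=2001, base=10**12):
    """Return one list of exact observations y(ij) per treatment i."""
    return [[base + theta(i) + theta(j) for j in range(1, replicates + 1)]
            for i in range(1, treatments + 1)]

def anova_exact(cells):
    """One-way ANOVA sums of squares, degrees of freedom and F, exactly."""
    n = sum(len(c) for c in cells)
    grand = sum(sum(c) for c in cells) / n
    means = [sum(c) / len(c) for c in cells]
    ss_between = sum(len(c) * (m - grand) ** 2 for c, m in zip(cells, means))
    ss_within = sum((y - m) ** 2 for c, m in zip(cells, means) for y in c)
    df_between = len(cells) - 1
    df_within = n - len(cells)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

ssb, ssw, dfb, dfw, f = anova_exact(generate())
print(float(ssb), dfb)   # between-treatment sum of squares and df
print(float(ssw), dfw)   # within-treatment sum of squares and df
print(float(f))          # F statistic
```

Under this formula the exact sums of squares are 160.08 between treatments (8 degrees of freedom) and 180 within treatments (18000 degrees of freedom), giving F = 2001. The certified values published with the dataset remain the authoritative reference for testing software.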