

Purpose: Test for Distributional Adequacy 
The Anderson-Darling test
(Stephens, 1974)
is used to test if a sample of data came from a population
with a specific distribution. It is a modification of the
Kolmogorov-Smirnov (K-S) test and
gives more weight to the tails than does the K-S test. The K-S
test is distribution free in the sense that the critical values
do not depend on the specific distribution being tested (note that
this is true only for a fully specified distribution, i.e. the
parameters are known). The Anderson-Darling test makes use of the
specific distribution in calculating critical values. This has the
advantage of allowing a more sensitive test and the disadvantage
that critical values must be calculated for each distribution.
Currently, tables of critical values are available for the
normal,
uniform,
lognormal,
exponential,
Weibull,
extreme value type I, generalized Pareto,
and logistic distributions. We do not provide the tables of
critical values in this Handbook (see
Stephens 1974,
1976, 1977, and 1979) since this test is usually
applied with a statistical software program that will print
the relevant critical values.
The Anderson-Darling test is an alternative to the chi-square and Kolmogorov-Smirnov goodness-of-fit tests.
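One way to see why the test emphasizes the tails is to compare the two statistics in their usual textbook forms. These forms are not taken from this Handbook; F_n denotes the empirical distribution function of the n observations and F the hypothesized distribution function. The weight 1/[F(x)(1 - F(x))] grows as F(x) approaches 0 or 1, so discrepancies in the tails count for more than discrepancies near the center, whereas the K-S statistic applies no such weight.

```latex
% Kolmogorov-Smirnov: largest absolute gap between F_n and F
D = \sup_{x} \left| F_n(x) - F(x) \right|

% Anderson-Darling: squared gap, integrated with a weight that is largest in both tails
A^{2} = n \int_{-\infty}^{\infty}
        \frac{\left[ F_n(x) - F(x) \right]^{2}}{F(x)\,\bigl(1 - F(x)\bigr)} \, dF(x)
```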

Definition 
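The Anderson-Darling test is defined as:

H_{0}: The data follow a specified distribution.
H_{a}: The data do not follow the specified distribution.
Test statistic: A^{2} = -N - S, where
    S = \sum_{i=1}^{N} \frac{2i - 1}{N} [ \ln F(Y_{i}) + \ln(1 - F(Y_{N+1-i})) ],
    F is the cumulative distribution function of the specified distribution, and the Y_{i} are the data sorted in ascending order.
Significance level: α
Critical region: The test is one-sided. H_{0} is rejected if the test statistic is greater than the critical value, which depends on the distribution being tested.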
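The statistic can be computed directly from its standard definition, A^{2} = -N - S with S = \sum_{i=1}^{N} \frac{2i-1}{N} [\ln F(Y_{i}) + \ln(1 - F(Y_{N+1-i}))], where the Y_{i} are the ordered data. The following is a minimal sketch, not the Handbook's own code; the function name `anderson_darling_statistic` and the three-point uniform example are hypothetical, and the CDF is assumed to be fully specified (no estimated parameters, no small-sample adjustment).

```python
import math


def anderson_darling_statistic(data, cdf):
    """Return A^2 = -N - S for `data` against a fully specified CDF."""
    y = sorted(data)                      # the formula uses the ordered data
    n = len(y)
    if n == 0:
        raise ValueError("data must not be empty")
    s = 0.0
    for i in range(1, n + 1):             # i = 1..N, as in the formula
        f_low = cdf(y[i - 1])             # F(Y_i)
        f_high = cdf(y[n - i])            # F(Y_{N+1-i})
        s += (2 * i - 1) / n * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - s


def uniform_cdf(u):
    """CDF of the uniform(0, 1) distribution on its support."""
    return u


# Hypothetical example: three points tested against uniform(0, 1).
a2 = anderson_darling_statistic([0.9, 0.1, 0.5], uniform_cdf)
print(f"A^2 = {a2:.4f}")
```

Note that `math.log` raises an error if the CDF returns exactly 0 or 1 for some observation, which can happen for extreme outliers; a practical implementation would need to guard against that.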


Sample Output 
We generated 1,000 random numbers for normal,
double exponential, Cauchy, and lognormal distributions.
In all four cases, the Anderson-Darling
test was applied to test for a normal distribution.
The normal random numbers were stored in the variable Y1, the double exponential random numbers in the variable Y2, the Cauchy random numbers in the variable Y3, and the lognormal random numbers in the variable Y4.

Distribution               Mean        Standard Deviation
Normal (Y1)                0.004360     1.001816
Double Exponential (Y2)    0.020349     1.321627
Cauchy (Y3)                1.503854    35.130590
Lognormal (Y4)             1.518372     1.719969

H_{0}: the data are normally distributed
H_{a}: the data are not normally distributed

Y1 adjusted test statistic: A^{2} = 0.2576
Y2 adjusted test statistic: A^{2} = 5.8492
Y3 adjusted test statistic: A^{2} = 288.7863
Y4 adjusted test statistic: A^{2} = 83.3935

Significance level: α = 0.05
Critical value: 0.752
Critical region: Reject H_{0} if A^{2} > 0.752

When the data were generated using a normal distribution, the test statistic was small and the hypothesis of normality was not rejected. When the data were generated using the double exponential, Cauchy, and lognormal distributions, the test statistics were large, and the hypothesis of an underlying normal distribution was rejected at the 0.05 significance level.
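An experiment of this kind can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the code that produced the numbers above: the function names and the seed are hypothetical, the normal distribution is fitted with the sample mean and standard deviation, and the adjustment factor (1 + 0.75/n + 2.25/n^2) is an assumption based on the modification commonly attributed to Stephens for that case, since the text does not state which adjustment was used. Because the random numbers differ, the statistics will not match the values reported above.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

CRITICAL_VALUE_5PCT = 0.752  # critical value quoted in the sample output


def adjusted_a2_normal(data):
    """Adjusted Anderson-Darling statistic for normality, parameters estimated."""
    y = sorted(data)
    n = len(y)
    fitted = NormalDist(fmean(y), stdev(y))   # normal with sample mean and sd
    eps = 1e-12                               # numerical guard against log(0)
    p = [min(max(fitted.cdf(v), eps), 1.0 - eps) for v in y]
    s = sum((2 * i - 1) / n * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
            for i in range(1, n + 1))
    a2 = -n - s
    # Assumed small-sample adjustment (not stated in the text).
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


def rejects_normality(data, critical=CRITICAL_VALUE_5PCT):
    """True if the adjusted statistic falls in the critical region."""
    return adjusted_a2_normal(data) > critical


rng = random.Random(12345)  # arbitrary seed so the run is repeatable
n = 1000
samples = {
    "Normal (Y1)": [rng.gauss(0.0, 1.0) for _ in range(n)],
    # Difference of two unit exponentials is double exponential (Laplace).
    "Double Exponential (Y2)": [rng.expovariate(1.0) - rng.expovariate(1.0)
                                for _ in range(n)],
    # Inverse-CDF method for the standard Cauchy distribution.
    "Cauchy (Y3)": [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)],
    "Lognormal (Y4)": [rng.lognormvariate(0.0, 1.0) for _ in range(n)],
}
for name, values in samples.items():
    a2 = adjusted_a2_normal(values)
    verdict = "reject H0" if a2 > CRITICAL_VALUE_5PCT else "do not reject H0"
    print(f"{name}: adjusted A^2 = {a2:.4f} -> {verdict}")
```

The clamp on the fitted probabilities is a numerical safeguard rather than part of the test: for heavy-tailed samples the fitted normal CDF rounds to exactly 0 or 1 at the most extreme observations, and the logarithm would otherwise fail.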

Questions 
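The Anderson-Darling test can be used to answer questions of the
following form: are the data from a specified distribution, such as
the normal, lognormal, Weibull, exponential, or logistic distribution?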


Importance 
Many statistical tests and procedures are based on specific
distributional assumptions. The assumption of normality
is particularly common in classical statistical tests.
Much reliability modeling is based on the assumption
that the data follow a Weibull distribution.
There are many nonparametric and robust techniques that do not make strong distributional assumptions. However, techniques based on specific distributional assumptions are in general more powerful than nonparametric and robust techniques. Therefore, if the distributional assumptions can be validated, the techniques based on them are generally preferred.

Related Techniques 
Chi-Square Goodness-of-Fit Test
Kolmogorov-Smirnov Test
Shapiro-Wilk Normality Test
Probability Plot
Probability Plot Correlation Coefficient Plot

Case Study
Josephson junction cryothermometry case study.

Software
The Anderson-Darling goodness-of-fit test is available in some general purpose statistical software programs. Both Dataplot code and R code can be used to generate the analyses in this section.