
## Anderson-Darling Test

Purpose:
The Anderson-Darling test (Stephens, 1974) is used to test whether a sample of data came from a population with a specific distribution. It is a modification of the Kolmogorov-Smirnov (K-S) test and gives more weight to the tails than the K-S test does. The K-S test is distribution-free in the sense that the critical values do not depend on the specific distribution being tested (note that this is true only for a fully specified distribution, i.e., one whose parameters are known). The Anderson-Darling test makes use of the specific distribution in calculating critical values. This has the advantage of allowing a more sensitive test and the disadvantage that critical values must be calculated for each distribution.

Currently, tables of critical values are available for the normal, uniform, lognormal, exponential, Weibull, extreme value type I, generalized Pareto, and logistic distributions. We do not provide the tables of critical values in this Handbook (see Stephens 1974, 1976, 1977, and 1979) since this test is usually applied with a statistical software program that will print the relevant critical values.

The Anderson-Darling test is an alternative to the chi-square and Kolmogorov-Smirnov goodness-of-fit tests.

Definition:
The Anderson-Darling test is defined as:

H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.

Test Statistic:
The Anderson-Darling test statistic is defined as

$A^{2} = -N - S$

where

$S = \sum_{i=1}^{N}\frac{(2i - 1)}{N}\left[\ln{F(Y_{i})} + \ln{(1 - F(Y_{N+1-i}))}\right]$

F is the cumulative distribution function of the specified distribution and N is the sample size. Note that the $Y_{i}$ are the ordered data.

Significance Level: α

Critical Region:
The critical values for the Anderson-Darling test depend on the specific distribution that is being tested. Tabulated values and formulas have been published (Stephens, 1974, 1976, 1977, 1979) for a few specific distributions (normal, lognormal, exponential, Weibull, logistic, extreme value type I). The test is a one-sided test, and the hypothesis that the distribution is of a specific form is rejected if the test statistic, $A^{2}$, is greater than the critical value.

Note that for a given distribution, the Anderson-Darling statistic may be multiplied by a constant (which usually depends on the sample size, N). These constants are given in the various papers by Stephens. In the sample output below, the test statistic values are adjusted. Different constants (and therefore different critical values) have been published, so a set of critical values must always be used with the constant it was derived for (the needed constant is typically given with the critical values).
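The test statistic defined above can be computed directly from the ordered data. The following is a minimal Python sketch, assuming NumPy and SciPy are available and that the hypothesized distribution is fully specified (its parameters are known, not estimated). The function name `anderson_darling_statistic` and the simulated sample are illustrative assumptions, not part of the Handbook.

```python
import numpy as np
from scipy.stats import norm


def anderson_darling_statistic(data, cdf):
    """Return A^2 = -N - S for a fully specified CDF (hypothetical helper)."""
    y = np.sort(np.asarray(data, dtype=float))  # the ordered data Y_1 <= ... <= Y_N
    n = y.size
    i = np.arange(1, n + 1)
    f = cdf(y)                                  # F(Y_i) for each ordered point
    # f[::-1] holds F(Y_{N+1-i}) at position i, matching the second log term.
    # Note: if F(Y) is exactly 0 or 1 in floating point, the logs are infinite.
    s = np.sum((2 * i - 1) / n * (np.log(f) + np.log1p(-f[::-1])))
    return -n - s


# Illustrative use: simulated data tested against a standard normal CDF.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)
a2 = anderson_darling_statistic(sample, norm.cdf)
print(f"A^2 = {a2:.4f}")
```

The value returned here is the unadjusted statistic. It still has to be compared with critical values that are appropriate for the distribution being tested and for whether its parameters were known or estimated.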
Sample Output:
We generated 1,000 random numbers for normal, double exponential, Cauchy, and lognormal distributions. In all four cases, the Anderson-Darling test was applied to test for a normal distribution.

The normal random numbers were stored in the variable Y1, the double exponential random numbers were stored in the variable Y2, the Cauchy random numbers were stored in the variable Y3, and the lognormal random numbers were stored in the variable Y4.


Distribution                 Mean       Standard Deviation
------------               --------     ------------------
Normal (Y1)                0.004360          1.001816
Double Exponential (Y2)    0.020349          1.321627
Cauchy (Y3)                1.503854         35.130590
Lognormal (Y4)             1.518372          1.719969

H0:  the data are normally distributed
Ha:  the data are not normally distributed

Y1 adjusted test statistic:  A2 =   0.2576
Y2 adjusted test statistic:  A2 =   5.8492
Y3 adjusted test statistic:  A2 = 288.7863
Y4 adjusted test statistic:  A2 =  83.3935

Significance level:  α = 0.05
Critical value:  0.752
Critical region:  Reject H0 if A2 > 0.752

When the data were generated using a normal distribution, the test statistic was small and the hypothesis of normality was not rejected. When the data were generated using the double exponential, Cauchy, and lognormal distributions, the test statistics were large, and the hypothesis of an underlying normal distribution was rejected at the 0.05 significance level.
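An experiment of this kind can be sketched in Python as follows. This is not the code used to produce the output above: the seed, the helper name `adjusted_ad_normal`, and the distribution parameters are assumptions, so the numbers will differ from those shown. The sketch also assumes that the mean and standard deviation are estimated from the data and that the adjustment is the factor $(1 + 0.75/N + 2.25/N^{2})$, which is the constant usually paired with the 0.752 critical value at the 0.05 level. The Handbook does not state which constant was used.

```python
import numpy as np
from scipy.stats import norm


def adjusted_ad_normal(data):
    """Adjusted A^2 for normality with estimated mean and sd (hypothetical helper)."""
    y = np.sort(np.asarray(data, dtype=float))
    n = y.size
    z = (y - y.mean()) / y.std(ddof=1)          # standardize with sample estimates
    i = np.arange(1, n + 1)
    # logcdf/logsf avoid log(0) for extreme observations (e.g. Cauchy outliers).
    s = np.sum((2 * i - 1) / n * (norm.logcdf(z) + norm.logsf(z[::-1])))
    a2 = -n - s
    # Assumed small-sample adjustment for the estimated-parameter normal case.
    return a2 * (1.0 + 0.75 / n + 2.25 / n**2)


CRITICAL_VALUE_5PCT = 0.752  # critical value quoted in the sample output

rng = np.random.default_rng(12345)  # arbitrary seed, chosen for repeatability
samples = {
    "Normal (Y1)": rng.standard_normal(1000),
    "Double Exponential (Y2)": rng.laplace(size=1000),
    "Cauchy (Y3)": rng.standard_cauchy(1000),
    "Lognormal (Y4)": rng.lognormal(size=1000),
}

results = {name: adjusted_ad_normal(x) for name, x in samples.items()}
for name, a2 in results.items():
    decision = "reject H0" if a2 > CRITICAL_VALUE_5PCT else "do not reject H0"
    print(f"{name:<26s} A2 = {a2:10.4f}  {decision}")
```

Because the normal sample is random, it will still be rejected in roughly 5 % of runs at this significance level. The heavy-tailed and skewed samples should be rejected essentially every time at N = 1,000.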

Questions:
The Anderson-Darling test can be used to answer the following questions:
• Are the data from a normal distribution?
• Are the data from a lognormal distribution?
• Are the data from a Weibull distribution?
• Are the data from an exponential distribution?
• Are the data from a logistic distribution?

Importance:
Many statistical tests and procedures are based on specific distributional assumptions. The assumption of normality is particularly common in classical statistical tests. Much reliability modeling is based on the assumption that the data follow a Weibull distribution.

There are many non-parametric and robust techniques that do not make strong distributional assumptions. However, techniques based on specific distributional assumptions are in general more powerful than non-parametric and robust techniques. Therefore, if the distributional assumptions can be validated, the techniques based on them are generally preferred.

Related Techniques:
Chi-Square Goodness-of-Fit Test
Kolmogorov-Smirnov Test
Shapiro-Wilk Normality Test
Probability Plot
Probability Plot Correlation Coefficient Plot

Case Study:
Josephson junction cryothermometry case study.

Software:
The Anderson-Darling goodness-of-fit test is available in some general-purpose statistical software programs. Both Dataplot code and R code can be used to generate the analyses in this section.