

1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.5. Quantitative Techniques

1.3.5.15.

Chi-Square Goodness-of-Fit Test

Purpose:
Test for distributional adequacy
The chi-square test (Snedecor and Cochran, 1989) is used to test if a sample of data came from a population with a specific distribution.

An attractive feature of the chi-square goodness-of-fit test is that it can be applied to any univariate distribution for which you can calculate the cumulative distribution function. The chi-square goodness-of-fit test is applied to binned data (i.e., data put into classes). This is not actually a restriction, since for non-binned data you can simply calculate a histogram or frequency table before applying the chi-square test. However, the value of the chi-square test statistic depends on how the data are binned. Another disadvantage of the chi-square test is that it requires a sufficient sample size for the chi-square approximation to be valid.

The chi-square test is an alternative to the Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests. The chi-square goodness-of-fit test can be applied to discrete distributions such as the binomial and the Poisson. The Kolmogorov-Smirnov and Anderson-Darling tests are restricted to continuous distributions.

Additional discussion of the chi-square goodness-of-fit test is contained in the product and process comparisons chapter (chapter 7).

Definition The chi-square test is defined for the hypotheses:

H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.
Test Statistic: To compute the test statistic used in the chi-square goodness-of-fit test, one first partitions the range of possible observed values of the quantity of interest, \( Y \), into \( k \) bins determined by \( y_{1} < y_{2} < \dots < y_{k+1} \), such that bin \( i \) has lower endpoint \( y_{i} \) and upper endpoint \( y_{i+1} \) for \( i=1,\dots,k \). Note that \( y_{1} \) can be \( -\infty \) and \( y_{k+1} \) can be \( +\infty \).

The test statistic is defined as \( \begin{equation*} \chi^{2} = \sum_{i=1}^{k} (O_{i}-E_{i})^{2}/E_{i} \end{equation*} \) where \( O_{i} \) denotes the number of observations that fall in bin \( i \) (under some convention for how to count observations that fall on the boundary between two consecutive bins: for example, that an observation equal to \( y_{i+1} \) is counted as being in bin \( i+1 \)), and \( E_{i} \) denotes the number of observations expected to fall in bin \( i \) based on the probability model under test. If this model is specified in terms of its cumulative distribution function, \( F_{\theta} \), then the expected counts are computed as \( \begin{equation*} E_{i} = N (F_{\theta}(y_{i+1})-F_{\theta}(y_{i})), \end{equation*} \) where \( N \) denotes the sample size, and under the convention that \( F_{\theta}(-\infty) = 0 \) and \( F_{\theta}(+\infty) = 1 \).
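As a concrete illustration, the statistic can be computed directly from binned counts. The sketch below (in Python with NumPy and SciPy; the fully specified standard-normal model and the quantile-based bin edges are illustrative choices, not part of the definition above) computes \( O_{i} \), \( E_{i} \), and \( \chi^{2} \) for a simulated sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(size=1000)        # sample to be tested (illustrative)
N = y.size

# Interior cut points from sample quantiles; outer endpoints are -inf and +inf
interior = np.quantile(y, np.linspace(0.1, 0.9, 9))
edges = np.concatenate(([-np.inf], interior, [np.inf]))   # y_1 < ... < y_{k+1}, k = 10

# Observed counts O_i, with the convention that bin i is closed on the left
O = np.array([((y >= lo) & (y < hi)).sum()
              for lo, hi in zip(edges[:-1], edges[1:])])

# Expected counts E_i = N (F(y_{i+1}) - F(y_i)) under a fully specified N(0, 1) model
E = N * np.diff(stats.norm.cdf(edges))

chi2_stat = np.sum((O - E) ** 2 / E)
```

Because the outer endpoints are infinite, the expected counts sum exactly to \( N \), as the convention \( F(-\infty)=0 \), \( F(+\infty)=1 \) requires.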

Note that, in general, this cumulative distribution function depends on a possibly vector-valued parameter, \( \theta \), hence the notation \( F_{\theta} \) in the foregoing equation for \( E_{i} \). For example, if the probability model is Gaussian, then \( \theta = (\mu, \sigma) \), where \( \mu \) and \( \sigma \) denote the mean and standard deviation of the Gaussian distribution. However, if the probability model is Poisson, then \( \theta \) is a scalar, \( \theta = \lambda \), the corresponding mean. In any case, to be able to compute \( E_{i} \) one needs first to estimate \( \theta \).

The chi-square test assumes that the estimate of \( \theta \) is the maximum likelihood estimate derived from the observed bin counts, not from the individual observations. This estimate, which we denote \( \widetilde{\theta} \), can be computed by numerical maximization of the log-likelihood function, \( \ell \), with respect to \( \theta \): \( \begin{equation*} \ell(\theta) = \sum_{i=1}^{k} O_{i} \log (F_{\theta}(y_{i+1})-F_{\theta}(y_{i})). \end{equation*} \) The expected bin counts are then computed as \( E_{i} = N (F_{\widetilde{\theta}}(y_{i+1})- F_{\widetilde{\theta}}(y_{i})) \) for \( i=1,\dots,k \).
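The binned maximum likelihood estimate can be obtained by numerically maximizing \( \ell(\theta) \). The sketch below does this for a normal model; the bin choices, the log-sigma parameterization, and the Nelder-Mead optimizer are illustrative assumptions, not prescribed by the test:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=1000)   # illustrative sample
N = y.size

# Fixed bin edges: interior cut points from sample quantiles, outer endpoints infinite
interior = np.quantile(y, np.linspace(0.1, 0.9, 9))
edges = np.concatenate(([-np.inf], interior, [np.inf]))
O = np.array([((y >= lo) & (y < hi)).sum()
              for lo, hi in zip(edges[:-1], edges[1:])])

def neg_log_lik(theta):
    # Binned negative log-likelihood: -sum_i O_i log(F(y_{i+1}) - F(y_i))
    mu, log_sigma = theta
    p = np.diff(stats.norm.cdf(edges, loc=mu, scale=np.exp(log_sigma)))
    p = np.clip(p, 1e-300, None)    # guard against log(0)
    return -np.sum(O * np.log(p))

res = optimize.minimize(neg_log_lik, x0=[np.mean(y), np.log(np.std(y))],
                        method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Expected counts under the binned MLE
E = N * np.diff(stats.norm.cdf(edges, loc=mu_hat, scale=sigma_hat))
```

Starting the optimizer at the sample moments is a pragmatic choice; the binned MLE is typically close to, but not identical with, the estimate based on the individual observations.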

Significance Level: α
Critical Region: The test statistic follows, approximately, a chi-square distribution with (k - c) degrees of freedom, where k is the number of non-empty cells and c is the number of estimated parameters (including location, scale, and shape parameters) for the distribution plus one. For example, for a 3-parameter Weibull distribution, c = 4.

Therefore, the hypothesis that the data are from a population with the specified distribution is rejected if

    \[ \chi^2 > \chi^2_{1-\alpha, \, k-c} \]
where \(\chi^2_{1-\alpha, \, k-c}\) is the chi-square critical value with k - c degrees of freedom and significance level α.
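The critical value can be obtained from any chi-square quantile function. For example, in Python with SciPy (using k = 32 and c = 3, the values from the example that follows):

```python
from scipy import stats

alpha = 0.05
k = 32      # number of bins
c = 3       # parameters estimated for the normal (2) + 1
crit = stats.chi2.ppf(1 - alpha, k - c)   # upper-tail critical value, 29 df
print(round(crit, 3))                     # 42.557
```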
Chi-Square Test Example
We generated 1,000 random numbers for normal, double exponential, t with 3 degrees of freedom, and lognormal distributions. In all cases, a chi-square test with k = 32 bins was applied to test for normally distributed data. Because the normal distribution has two parameters, c = 2 + 1 = 3.

The normal random numbers were stored in the variable Y1, the double exponential random numbers were stored in the variable Y2, the t random numbers were stored in the variable Y3, and the lognormal random numbers were stored in the variable Y4.

H0:  the data are normally distributed
Ha:  the data are not normally distributed  

Y1 Test statistic:  \( \chi^{2} \) =   32.256
Y2 Test statistic:  \( \chi^{2} \) =   91.776
Y3 Test statistic:  \( \chi^{2} \) =  101.488
Y4 Test statistic:  \( \chi^{2} \) = 1085.104

Significance level:  α = 0.05
Degrees of freedom:  k - c = 32 - 3 = 29
Critical value:  \( \chi^{2}_{1-\alpha, \, k-c} \) = 42.557
Critical region: Reject H0 if \( \chi^{2} \) > 42.557
As we would hope, the chi-square test fails to reject the null hypothesis for the normally distributed data set and rejects the null hypothesis for the three non-normal data sets.
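The pattern in this example is straightforward to reproduce. The sketch below applies the same test, with k = 32 equal-probability bins under a fitted normal model, to a normal and a lognormal sample; for simplicity it plugs in the sample mean and standard deviation rather than the binned maximum likelihood estimates, so the statistics are illustrative, not a reproduction of the values above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
samples = {"normal": rng.normal(size=1000),
           "lognormal": rng.lognormal(size=1000)}

alpha, k, c = 0.05, 32, 3
crit = stats.chi2.ppf(1 - alpha, k - c)     # 42.557 for 29 df

results = {}
for name, y in samples.items():
    mu, sigma = np.mean(y), np.std(y)       # simple estimates (not the binned MLE)
    # Equal-probability bin edges under the fitted normal model; outer edges are +/- inf
    edges = stats.norm.ppf(np.linspace(0, 1, k + 1), loc=mu, scale=sigma)
    O = np.array([((y >= lo) & (y < hi)).sum()
                  for lo, hi in zip(edges[:-1], edges[1:])])
    E = np.full(k, y.size / k)              # each bin expects N/k observations
    results[name] = np.sum((O - E) ** 2 / E)
```

As in the worked example, the statistic for the lognormal sample vastly exceeds the critical value, while the statistic for the normal sample is far smaller.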
Application Example
This example uses the data set described in Fatigue Life of Aluminum Alloy Specimens, which comprises 101 measured values of the fatigue life (thousands of cycles until rupture) of rectangular strips of aluminum sheeting that were subjected to periodic loading until failure. To test whether these data are consistent with a 3-parameter Weibull probability model, one can employ the chi-square goodness-of-fit test after binning the measured values into \( k=10 \) bins chosen so that each is expected to include about 10 observations. The maximum likelihood estimates of the Weibull parameters based on the bin counts are slightly different from their counterparts based on the individual observations. The p-value corresponding to the test statistic is 0.15, which does not call into question the adequacy of the 3-parameter Weibull distribution as a probability model for these data. The R code mentioned below includes an implementation of the test, as just described.
Questions The chi-square test can be used to answer the following types of questions:
  • Are the data from a normal distribution?
  • Are the data from a log-normal distribution?
  • Are the data from a Weibull distribution?
  • Are the data from an exponential distribution?
  • Are the data from a logistic distribution?
  • Are the data from a binomial distribution?
Importance Many statistical tests and procedures are based on specific distributional assumptions. The assumption of normality is particularly common in classical statistical tests. Much reliability modeling is based on the assumption that the distribution of the data follows a Weibull distribution.

There are many non-parametric and robust techniques that are not based on strong distributional assumptions. By non-parametric, we mean a technique, such as the sign test, that is not based on a specific distributional assumption. By robust, we mean a statistical technique that performs well under a wide range of distributional assumptions. However, techniques based on specific distributional assumptions are in general more powerful than these non-parametric and robust techniques. By power, we mean the ability to detect a difference when that difference actually exists. Therefore, if the distributional assumption can be confirmed, the parametric techniques are generally preferred.

If you are using a technique that makes a normality (or some other type of distributional) assumption, it is important to confirm that this assumption is in fact justified. If it is, the more powerful parametric techniques can be used. If the distributional assumption is not justified, a non-parametric or robust technique may be required.

Related Techniques Anderson-Darling Goodness-of-Fit Test
Kolmogorov-Smirnov Test
Shapiro-Wilk Normality Test
Probability Plots
Probability Plot Correlation Coefficient Plot
Software Some general purpose statistical software programs provide a chi-square goodness-of-fit test for at least some of the common distributions. Both Dataplot code and R code can be used to generate the analyses in this section.