CHI SQUARE GOODNESS OF FIT TEST
Perform a chi-square goodness of fit test of whether a set of
data comes from a hypothesized distribution. Dataplot
currently supports the chi-square goodness of fit test
for 70+ distributions.
The basic idea behind the chi-square goodness of fit test
is to divide the range of the data into a number of
intervals. Then the number of points that fall into each
interval is compared to the expected number of points for that
interval if the data in fact come from the hypothesized
distribution. More formally, the chi-square goodness of fit
test can be defined as follows.
H0: The data follow the specified distribution.
Ha: The data do not follow the specified distribution.
For the chi-square goodness of fit test, the data are
divided into k bins and the test statistic is defined as
    \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
where O_i is the observed frequency for bin i
and E_i is the expected frequency for bin i. The
expected frequency is calculated by
    E_i = N \left( F(Y_u) - F(Y_l) \right)
where F is the cumulative distribution function
for the distribution being tested, Y_u is the upper
limit for class i, Y_l is the lower limit for
class i, and N is the sample size.
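As a rough illustration of the two quantities defined above, the following Python sketch computes the expected frequencies as N*(F(Yu) - F(Yl)) and then sums (Oi - Ei)**2/Ei over the bins. The function names, the four bins, and the observed counts are hypothetical and chosen only for illustration; this is not Dataplot's implementation.

```python
import math

def normal_cdf(x, loc=0.0, scale=1.0):
    """CDF of the normal distribution, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - loc) / (scale * math.sqrt(2.0))))

def expected_frequencies(cdf, lower, upper, n):
    """Expected count per bin: n * (F(upper limit) - F(lower limit))."""
    return [n * (cdf(u) - cdf(l)) for l, u in zip(lower, upper)]

def chi_square_statistic(observed, expected):
    """Sum over bins of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical toy data: 100 points in four bins, tested against N(0, 1).
lower = [-math.inf, -1.0, 0.0, 1.0]
upper = [-1.0, 0.0, 1.0, math.inf]
observed = [18, 30, 36, 16]

expected = expected_frequencies(normal_cdf, lower, upper, n=sum(observed))
statistic = chi_square_statistic(observed, expected)
print(expected, statistic)
```

Any CDF can be passed in place of `normal_cdf`, which mirrors the generality of the test: only the cumulative distribution function of the hypothesized distribution is needed.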
This test is sensitive to the choice of bins. There
is no optimal choice for the bin width (since the
optimal bin width depends on the distribution).
Most reasonable choices should produce similar, but
not identical, results. Dataplot uses 0.3*s,
where s is the sample standard deviation, for the
class width. The lower and upper class limits are at the
sample mean minus and plus 6.0*s, respectively. For
the chi-square approximation to be valid, the
expected frequency in each bin should be at least 5. This test
is not valid for small samples, and if some of the
expected counts are less than five, you may need to combine
some bins in the tails.
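The default binning rule described above can be sketched in Python as follows. This is only an approximation of the stated rule (class width 0.3*s, limits at the mean minus and plus 6*s); the function names, the sample data, and the half-open bin convention (lower <= x < upper) are assumptions, and Dataplot's handling of points that fall exactly on a class boundary may differ.

```python
import statistics

def default_bin_limits(data, width_factor=0.3, span_factor=6.0):
    """Return (lower, upper) class limits: width 0.3*s from mean - 6s to mean + 6s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    width = width_factor * s
    n_bins = round(2.0 * span_factor / width_factor)  # 12s / 0.3s = 40 classes
    start = mean - span_factor * s
    lower = [start + i * width for i in range(n_bins)]
    upper = [lo + width for lo in lower]
    return lower, upper

def bin_counts(data, lower, upper):
    """Count the points in each class, assuming lower <= x < upper."""
    counts = [0] * len(lower)
    for x in data:
        for i, (lo, hi) in enumerate(zip(lower, upper)):
            if lo <= x < hi:
                counts[i] += 1
                break
    return counts

# Hypothetical sample, used only to exercise the functions.
data = [2.1, 2.5, 3.0, 3.2, 3.8, 4.1, 4.4, 5.0]
lower, upper = default_bin_limits(data)
counts = bin_counts(data, lower, upper)
print(len(lower), counts)
```

With these defaults most of the 40 classes are empty for a small sample, which is one reason the test needs a reasonably large number of observations.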
The test statistic follows, approximately, a
chi-square distribution with (k - c) degrees of
freedom where k is the number of non-empty cells and
c = the number of parameters (including location and
scale parameters and shape parameters) for the
distribution + 1. For example, for a 3-parameter
Weibull distribution, c = 4.
Therefore, the hypothesis that the data are
from the specified distribution is rejected if
    \chi^2 > \chi^2_{1-\alpha,\, k-c}
where \chi^2_{1-\alpha,\, k-c} is the chi-square percent point function
with k - c degrees of freedom evaluated at 1 - \alpha, for a significance
level of \alpha.
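The degrees-of-freedom count and the rejection rule can be checked with a few lines of Python. The numbers below are taken from the sample output at the end of this page (20 non-empty cells, 2 parameters, test statistic 5.506083, 5% cutoff 27.58711); the function names are hypothetical, and the critical value is supplied by hand because the Python standard library has no chi-square percent point function.

```python
def degrees_of_freedom(non_empty_cells, n_parameters):
    """k - c, where c is the number of distribution parameters plus 1."""
    c = n_parameters + 1
    return non_empty_cells - c

def reject_h0(statistic, critical_value):
    """Reject H0 when the test statistic exceeds the upper critical value."""
    return statistic > critical_value

# Values from the sample output for the normal distribution.
df = degrees_of_freedom(non_empty_cells=20, n_parameters=2)
decision = reject_h0(statistic=5.506083, critical_value=27.58711)
print(df, decision)
```

For the 3-parameter Weibull example in the text, `degrees_of_freedom(k, 3)` subtracts c = 4 from the number of non-empty cells.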
The primary advantage of the chi-square goodness of fit test is that
it is quite general. It can be applied to any distribution, either
discrete or continuous, for which the cumulative distribution function
can be computed. Dataplot supports the chi-square goodness of fit
test for all distributions for which it supports a CDF function.
There are two primary disadvantages:
- The test is sensitive to how the binning of the
data is performed.
- It requires sufficient sample size so that the minimum
expected frequency is five.
In order to apply the chi-square goodness of fit test, any shape
parameters must be specified. For example,
LET GAMMA = 5.3
WEIBULL CHI-SQUARE GOODNESS OF FIT TEST Y
The names of the shape parameters for each distribution family are
given in the list below.
Location and scale parameters can be specified generically with
the following commands:
LET CHSLOC = <value>
LET CHSSCALE = <value>
If not specified, the location parameter defaults to 0 and the scale
parameter defaults to 1.
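To show how generic location and scale parameters interact with a shape parameter, here is a minimal Python sketch that wraps a standard-form CDF so that it is evaluated at (x - location)/scale. The standard Weibull form 1 - exp(-x**gamma), the helper names, and the location and scale values are assumptions made for illustration; they are not taken from Dataplot.

```python
import math

def standard_weibull_cdf(x, gamma):
    """Standard (location 0, scale 1) Weibull CDF with shape parameter gamma."""
    return 1.0 - math.exp(-(x ** gamma)) if x > 0 else 0.0

def with_location_scale(standard_cdf, loc=0.0, scale=1.0):
    """Return a CDF that evaluates standard_cdf at (x - loc) / scale."""
    def cdf(x):
        return standard_cdf((x - loc) / scale)
    return cdf

# Shape parameter 5.3 as in the example above; loc and scale are hypothetical.
weibull_cdf = with_location_scale(
    lambda z: standard_weibull_cdf(z, gamma=5.3), loc=10.0, scale=2.0
)
at_location = weibull_cdf(10.0)   # standardized value 0
at_one_scale = weibull_cdf(12.0)  # standardized value 1
print(at_location, at_one_scale)
```

Leaving `loc` and `scale` at their defaults reduces the wrapper to the standard form of the distribution.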
Dataplot supports the chi-square goodness of fit test for either
binned or unbinned data.
For unbinned data, Dataplot automatically generates binned data
using the same rule as for histograms. That is, the class width
is 0.3*s where s is the sample standard deviation. The upper and
lower limits are the mean plus or minus 6 times the sample
standard deviation (any zero frequency bins in the tails are
omitted). As with the HISTOGRAM command, you can
override these defaults using the CLASS WIDTH, CLASS UPPER,
and CLASS LOWER commands.
Pre-binned data can be specified in two ways. If your bins are
of equal size, then you specify a single X variable that contains
the mid-points of the bins. If your bins may be of unequal
size, then two X variables are given. The first contains the
lower limit of each bin and the second contains the upper limit
of each bin. Unequal bin sizes usually result from combining
classes with small (less than 5) expected frequency.
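The two pre-binned layouts can be related with a short Python sketch: equal-width bins given by mid-points are first converted to lower and upper limits, and tail classes with small expected frequency are then merged, which produces the unequal bins described above. The function names, the threshold handling, and the sample numbers are hypothetical and only illustrate the idea.

```python
def limits_from_midpoints(midpoints):
    """Lower and upper limits for equal-width bins given their mid-points."""
    width = midpoints[1] - midpoints[0]
    lower = [m - width / 2.0 for m in midpoints]
    upper = [m + width / 2.0 for m in midpoints]
    return lower, upper

def merge_small_tail_bins(lower, upper, observed, expected, minimum=5.0):
    """Merge tail bins inward until both end bins have expected >= minimum."""
    lower, upper = list(lower), list(upper)
    observed, expected = list(observed), list(expected)
    # Left tail: fold the first bin into its right-hand neighbor.
    while len(expected) > 1 and expected[0] < minimum:
        observed[1] += observed[0]
        expected[1] += expected[0]
        lower[1] = lower[0]
        for seq in (lower, upper, observed, expected):
            del seq[0]
    # Right tail: fold the last bin into its left-hand neighbor.
    while len(expected) > 1 and expected[-1] < minimum:
        observed[-2] += observed[-1]
        expected[-2] += expected[-1]
        upper[-2] = upper[-1]
        for seq in (lower, upper, observed, expected):
            del seq[-1]
    return lower, upper, observed, expected

# Hypothetical equal-width bins with sparse tails.
midpoints = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
lower, upper = limits_from_midpoints(midpoints)
observed = [1, 5, 13, 10, 2, 1]
expected = [1.5, 4.0, 12.0, 11.0, 3.0, 0.5]
x1, x2, obs, exp = merge_small_tail_bins(lower, upper, observed, expected)
print(x1, x2, obs, exp)
```

The resulting `x1` and `x2` lists play the role of the two X variables (lower and upper limits) used when the bins are of unequal size.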
<dist> CHI-SQUARE GOODNESS OF FIT TEST <y> <SUBSET/EXCEPT/FOR qualification>
where <y> is a response variable;
the <SUBSET/EXCEPT/FOR qualification> is optional;
and <dist> is one of the following distributions:
- DOUBLE EXPONENTIAL
- TUKEY LAMBDA (LAMBDA)
- LOGNORMAL (SD, optional, defaults to 1)
- T (NU)
- CHI-SQUARED (NU)
- F (NU1, NU2)
- GAMMA (GAMMA)
- BETA (ALPHA, BETA)
- WEIBULL (GAMMA)
- EXTREME VALUE TYPE 1
- EXTREME VALUE TYPE 2 (GAMMA)
- PARETO (GAMMA)
- BINOMIAL (N, P)
- GEOMETRIC (P)
- POISSON (LAMBDA)
- NEGATIVE BINOMIAL (N, K, P)
- WALD (GAMMA)
- INVERSE GAUSSIAN (GAMMA)
- RIG (GAMMA)
- FL (GAMMA)
- DISCRETE UNIFORM (N)
- NONCENTRAL BETA (ALPHA, BETA, LAMBDA)
- NONCENTRAL CHISQUARE (NU, LAMBDA)
- NONCENTRAL F (NU1, NU2, LAMBDA)
- DOUBLY NONCENTRAL F (NU1, NU2, LAMBDA1, LAMBDA2)
- NONCENTRAL T (NU, LAMBDA)
- DOUBLY NONCENTRAL T (NU, LAMBDA1, LAMBDA2)
- HYPERGEOMETRIC (K, N, M)
- VON-MISES (B)
- POWER-NORMAL (P, SD)
- POWER-LOGNORMAL (P, SD)
- ALPHA (ALPHA, BETA)
- POWER FUNCTION (C)
- CHI (NU)
- LOGARITHMIC SERIES (THETA)
- LOG LOGISTIC (DELTA)
- GENERALIZED GAMMA (GAMMA, C)
- WARING (A, C; if C is omitted, have YULE)
- HYPERBOLIC SECANT
- HALF CAUCHY
- FOLDED NORMAL (M, SD)
- TRUNCATED NORMAL (A, B, M, SD)
- TRUNCATED EXPONENTIAL (X0, M, SD)
- DOUBLE WEIBULL (GAMMA)
- LOG GAMMA (GAMMA)
- GENERALIZED EXTREME VALUE (GAMMA)
- PARETO SECOND KIND (GAMMA)
- HALF LOGISTIC (GAMMA, optional)
- EXPONENTIATED WEIBULL (GAMMA, THETA)
- GOMPERTZ (C,B)
- WRAPPED CAUCHY (C)
- BETA BINOMIAL (ALPHA, BETA)
- BRADFORD (ALPHA, BETA)
- DOUBLE GAMMA (GAMMA)
- FOLDED CAUCHY (M, SD)
- GENERALIZED EXPONENTIAL (LAMBDA1, LAMBDA2, S)
- GENERALIZED LOGISTIC (ALPHA)
- MIELKE BETA-KAPPA (BETA, THETA, K)
- EXPONENTIAL POWER (ALPHA, BETA)
This syntax is used for the case where you have unbinned data.
NORMAL CHI-SQUARE GOODNESS OF FIT TEST Y
NORMAL CHI-SQUARE GOODNESS OF FIT TEST Y SUBSET GROUP > 1
CAUCHY CHI-SQUARE GOODNESS OF FIT TEST Y
LOGNORMAL CHI-SQUARE GOODNESS OF FIT TEST X
EXTREME VALUE TYPE 1 CHI-SQUARE GOODNESS OF FIT TEST X
LET LAMBDA = 0.2
TUKEY LAMBDA CHI-SQUARE GOODNESS OF FIT TEST X
SET MINMAX = 1
LET GAMMA = 2.0
WEIBULL CHI-SQUARE GOODNESS OF FIT TEST X
LET LAMBDA = 3
POISSON CHI-SQUARE GOODNESS OF FIT TEST X
NORMAL CHI-SQUARE GOODNESS OF FIT TEST Y X
NORMAL CHI-SQUARE GOODNESS OF FIT TEST Y X1 X2
There are several approaches for estimating the parameters of a
distribution before applying the goodness of fit test. PPCC plots
combined with probability plots are an effective graphical approach
if there are zero or one shape parameters. Maximum likelihood
estimation is available for several distributions. Least squares
estimation can be applied for distributions for which maximum
likelihood estimation is not available.
The bin number, bin mid-point, observed frequency, and expected
frequency are written to the file DPST1F.DAT (dpst1f.dat under
Unix) in the current directory.
The CHI-SQUARE GOODNESS OF FIT command automatically saves the
following parameters:
STATVAL - value of the chi-square goodness of fit statistic
STATNU - degrees of freedom for the chi-square goodness of
fit test
STATCDF - cdf value for the chi-square goodness of fit test
CUTUPP90 - 90% critical value (alpha = 0.10) for the chi-square
goodness of fit test statistic
CUTUPP95 - 95% critical value (alpha = 0.05) for the chi-square
goodness of fit test statistic
CUTUPP99 - 99% critical value (alpha = 0.01) for the chi-square
goodness of fit test statistic
These parameters can be used in subsequent analysis.
Location and scale parameters default to zero and one, respectively.
Shape parameters must be explicitly specified; they have no default.
EV2 and FRECHET are synonyms for EXTREME VALUE TYPE 2.
EV1 and GUMBEL are synonyms for EXTREME VALUE TYPE 1.
FATIGUE LIFE is a synonym for FL.
RECIPROCAL INVERSE GAUSSIAN is a synonym for RIG.
IG is a synonym for INVERSE GAUSSIAN.
The word TEST is optional.
CHI-SQUARE, CHISQUARE, and CHI SQUARE can all be used.
= Perform Anderson-Darling test for goodness of fit.
= Perform Kolmogorov-Smirnov test for goodness of fit.
= Perform Wilk-Shapiro test for normality.
= Perform maximum likelihood estimation for several
distributions.
= Perform least squares fitting.
= Generates a probability plot.
HISTOGRAM = Generates a histogram.
= Generates probability plot correlation coefficient plot.
CLASS WIDTH = Specify the class width.
CLASS UPPER = Specify the upper limit for classes.
CLASS LOWER = Specify the lower limit for classes.
"Statistical Methods", Eight Edition, Snedecor and Cochran,
Iowa State, 1989, pp. 76-79.
** normal chi-square goodness of fit test y **
CHI-SQUARED GOODNESS OF FIT TEST
NULL HYPOTHESIS H0: DISTRIBUTION FITS THE DATA
ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
NUMBER OF OBSERVATIONS = 195
NUMBER OF NON-EMPTY CELLS = 20
NUMBER OF PARAMETERS USED = 2
CHI-SQUARED TEST STATISTIC = 5.506083
DEGREES OF FREEDOM = 17
CHI-SQUARED CDF VALUE = 0.004063
ALPHA LEVEL    CUTOFF      CONCLUSION
    10%        24.76903    ACCEPT H0
     5%        27.58711    ACCEPT H0
     1%        33.40867    ACCEPT H0
CELL NUMBER, BIN MIDPOINT, OBSERVED FREQUENCY, AND
EXPECTED FREQUENCY WRITTEN TO FILE DPST1F.DAT
Date created: 6/5/2001
Last updated: 4/4/2003