KOLMOGOROV SMIRNOV GOODNESS OF FIT TEST
Name:
Type:
Purpose:
Perform a Kolmogorov-Smirnov goodness of fit test that a set of
data come from a hypothesized continuous distribution. Dataplot
currently supports the Kolmogorov-Smirnov goodness of fit test
for 60+ distributions.
Description:
The Kolmogorov-Smirnov (KS) test is based on the empirical
distribution function (ECDF). Given N ordered data points
Y_{1}, Y_{2}, ..., Y_{N}, the ECDF is defined as
E_{N} = n(i)/N
where n(i) is the number of points less than or equal to Y_{i}
and the Y_{i} are ordered from smallest to largest. This is a
step function that increases by 1/N at the value of each data point.
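The step-function behavior of the ECDF can be illustrated with a short Python sketch. It is not part of Dataplot; the function name `ecdf` and the sample values are hypothetical, and the sketch assumes the usual convention that the ECDF at x counts the points less than or equal to x.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def evaluate(x):
        # bisect_right gives the number of ordered values that are <= x.
        return bisect_right(ordered, x) / n

    return evaluate


e = ecdf([2.0, 0.5, 1.0, 3.5])
# The ECDF rises by 1/4 at each of the four data points.
print([e(x) for x in (0.0, 0.5, 1.0, 2.9, 3.5)])  # [0.0, 0.25, 0.5, 0.75, 1.0]
```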
We can plot the empirical distribution function together with
the cumulative distribution function of a given distribution. The
KS test is based on the maximum vertical distance between these two curves.
An example of this plot for a sample of 100 normal random numbers
is given here.
An attractive feature of this test is that
the distribution of the KS test statistic itself does not
depend on the underlying cumulative distribution function being
tested. Another advantage is that it is an exact test (the
chi-square goodness of fit test depends on an adequate sample size
for the approximations to be valid). Despite these advantages,
the KS test has several important limitations:
 1. It only applies to continuous distributions.
 2. It tends to be more sensitive near the center of the
distribution than it is at the tails.
 3. Perhaps the most serious limitation is that the
distribution must be fully specified. That is, if
location, scale, and shape parameters are estimated
from the data, the critical region of the KS test
is no longer valid. It typically must be determined by
simulation.
Due to limitations 2 and 3 above, many analysts prefer to
use the Anderson-Darling goodness of fit test, which is
more powerful than the KS test since it makes specific use
of the underlying cumulative distribution. However, the
Anderson-Darling test is only available for a few specific
distributions.
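The remark that the critical region must be found by simulation when parameters are estimated from the data can be sketched in Python as follows. This is an illustrative Monte Carlo sketch, not Dataplot's procedure: the normal case, the function names, the replication count, and the seed are all assumptions chosen for the example.

```python
import math
import random
import statistics


def normal_cdf(x, mean=0.0, sd=1.0):
    """Normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Largest gap between the ECDF of `sample` and the fitted `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, y in enumerate(ordered, start=1):
        f = cdf(y)
        d = max(d, f - (i - 1) / n, i / n - f)
    return d


def simulated_critical_value(n, alpha=0.05, replications=2000, seed=12345):
    """Estimate the upper-alpha point of D when mean and sd are estimated."""
    rng = random.Random(seed)
    stats = []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Re-estimate the parameters from each simulated sample,
        # exactly as would be done with the real data.
        m = statistics.fmean(sample)
        s = statistics.stdev(sample)
        stats.append(ks_statistic(sample, lambda x: normal_cdf(x, m, s)))
    stats.sort()
    index = min(replications - 1, math.ceil((1.0 - alpha) * replications) - 1)
    return stats[index]


cv = simulated_critical_value(50)
print(cv, 1.36 / math.sqrt(50))  # simulated cutoff vs. large-sample tabled cutoff
```

The simulated cutoff comes out noticeably smaller than the large-sample cutoff for a fully specified distribution, which is why using the standard table with estimated parameters makes the test too conservative.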
More formally, the Kolmogorov-Smirnov goodness of fit
test statistic can be defined as follows.
H_{0}: The data follow the specified distribution.

H_{a}: The data do not follow the specified distribution.

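Test Statistic: The Kolmogorov-Smirnov goodness of fit test statistic is
defined as
D = \max_{1 \le i \le N} \left( F(Y_{i}) - \frac{i-1}{N}, \; \frac{i}{N} - F(Y_{i}) \right)
where F is the theoretical cumulative distribution function of the
distribution being tested and the Y_{i} are ordered from smallest to largest.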
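As a worked illustration of the test statistic, the following Python sketch computes the largest gap between the ECDF and a hypothesized CDF, checking both sides of each ECDF jump. The function names and the three-point sample are hypothetical, and the uniform distribution on [0, 1] is used only because its CDF makes the arithmetic easy to check by hand.

```python
def ks_statistic(sample, cdf):
    """Largest gap between the ECDF of `sample` and the hypothesized `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, y in enumerate(ordered, start=1):
        f = cdf(y)
        # The ECDF jumps from (i - 1)/n to i/n at y, so check both sides.
        d = max(d, f - (i - 1) / n, i / n - f)
    return d


def uniform_cdf(y):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, y))


# Gaps at 0.1, 0.4, 0.7 are about 0.233, 0.267 and 0.3, so D is about 0.3.
print(ks_statistic([0.7, 0.1, 0.4], uniform_cdf))
```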

Significance Level: \alpha

Critical Region: The hypothesis regarding the distributional form is
rejected if the test statistic, D, is greater than
the critical value obtained from a table. There
are several variations of these tables in the
literature that use somewhat different scalings
for the KS test statistic and critical regions.
These alternative formulations should be equivalent,
but it is necessary to ensure that the test statistic
is calculated in a way that is consistent with how
the critical values were tabulated.
Dataplot uses the critical values from
Chakravarti, Laha, and Roy (see Reference: below).

In order to apply the KS goodness of fit test, any shape
parameters must be specified before the test is run. For example,
LET GAMMA = 5.3
WEIBULL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
The names of the shape parameters for each distributional family are
given in the list of distributions under Syntax: below.
Location and scale parameters can be specified generically with
the following commands:
LET KSLOC = <value>
LET KSSCALE = <value>
The location and scale parameters default to 0 and 1 if not
specified.
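How a shape parameter combines with location and scale can be sketched in Python. This is an assumption about the usual convention (standardize the data as (y - loc)/scale, then evaluate the standard-form CDF), not a description of Dataplot internals; the function name is hypothetical and the minimum-case Weibull CDF is assumed.

```python
import math


def weibull_cdf(y, gamma, loc=0.0, scale=1.0):
    """Weibull (minimum case) CDF with shape gamma, location loc, scale scale."""
    if gamma <= 0.0 or scale <= 0.0:
        raise ValueError("gamma and scale must be positive")
    # Location and scale enter only through the standardized value z.
    z = (y - loc) / scale
    if z <= 0.0:
        return 0.0
    return 1.0 - math.exp(-(z ** gamma))


# With the defaults loc = 0 and scale = 1, y = 1 is the standardized value 1.
print(weibull_cdf(1.0, gamma=5.3))
# The same standardized value arises from y = 12 with loc = 10 and scale = 2.
print(weibull_cdf(12.0, gamma=5.3, loc=10.0, scale=2.0))
```

Both calls evaluate the standard CDF at z = 1, so they print the same probability, 1 - exp(-1).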
Syntax:
<dist> KOLMOGOROV SMIRNOV GOODNESS OF FIT TEST <y>
<SUBSET/EXCEPT/FOR qualification>
where <y> is a response variable;
<dist> is one of the following distributions:
 UNIFORM
 SEMICIRCULAR
 TRIANGULAR
 NORMAL
 LOGISTIC
 DOUBLE EXPONENTIAL
 CAUCHY
 TUKEY LAMBDA (LAMBDA)
 LOGNORMAL (SD, optional, defaults to 1)
 HALFNORMAL
 T (NU)
 CHISQUARED (NU)
 F (NU1, NU2)
 EXPONENTIAL
 GAMMA (GAMMA)
 BETA (ALPHA, BETA)
 WEIBULL (GAMMA)
 EXTREME VALUE TYPE 1
 EXTREME VALUE TYPE 2 (GAMMA)
 PARETO (GAMMA)
 WALD (GAMMA)
 INVERSE GAUSSIAN (GAMMA)
 RIG (GAMMA)
 FL (GAMMA)
 NONCENTRAL BETA (ALPHA, BETA, LAMBDA)
 NONCENTRAL CHISQUARE (NU, LAMBDA)
 NONCENTRAL F (NU1, NU2, LAMBDA)
 DOUBLY NONCENTRAL F (NU1, NU2, LAMBDA1, LAMBDA2)
 NONCENTRAL T (NU, LAMBDA)
 DOUBLY NONCENTRAL T (NU, LAMBDA1, LAMBDA2)
 HYPERGEOMETRIC (K, N, M)
 VONMISES (B)
 POWERNORMAL (P, SD)
 POWERLOGNORMAL (P, SD)
 COSINE
 ALPHA (ALPHA, BETA)
 POWER FUNCTION (C)
 CHI (NU)
 LOG LOGISTIC (DELTA)
 GENERALIZED GAMMA (GAMMA, C)
 ANGLIT
 ARCSIN
 HYPERBOLIC SECANT
 HALF CAUCHY
 FOLDED NORMAL (M, SD)
 TRUNCATED NORMAL (A, B, M, SD)
 TRUNCATED EXPONENTIAL (X0, M, SD)
 DOUBLE WEIBULL (GAMMA)
 LOG GAMMA (GAMMA)
 GENERALIZED EXTREME VALUE (GAMMA)
 PARETO SECOND KIND (GAMMA)
 HALF LOGISTIC (GAMMA, optional)
 EXPONENTIATED WEIBULL (GAMMA, THETA)
 GOMPERTZ (C,B)
 WRAPPED CAUCHY (C)
 BRADFORD (ALPHA, BETA)
 DOUBLE GAMMA (GAMMA)
 FOLDED CAUCHY (M, SD)
 GENERALIZED EXPONENTIAL (LAMBDA1, LAMBDA2, S)
 GENERALIZED LOGISTIC (ALPHA)
 MIELKE BETAKAPPA (BETA, THETA, K)
 EXPONENTIAL POWER (ALPHA, BETA)
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y SUBSET GROUP > 1
CAUCHY KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
LOGNORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST X
EXTREME VALUE TYPE 1 KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST X
LET LAMBDA = 0.2
TUKEY LAMBDA KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST X
SET MINMAX = 1
LET GAMMA = 2.0
WEIBULL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST X
LET LAMBDA = 3
POISSON KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST X
NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y X
NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y X1 X2
Note:
There are several approaches for estimating the parameters of a
distribution before applying the goodness of fit test. PPCC plots
combined with probability plots are an effective graphical approach
if there are zero or one shape parameters. Maximum likelihood
estimation is available for several distributions. Least squares
estimation can be applied for distributions for which maximum
likelihood estimation is not available.
Note:
The KOLMOGOROV-SMIRNOV GOODNESS OF FIT command automatically saves
the following parameters.
STATVAL = value of the KS goodness of fit statistic
CUTUPP90 = 90% critical value (alpha = 0.10) for the KS
goodness of fit test statistic
CUTUPP95 = 95% critical value (alpha = 0.05) for the KS
goodness of fit test statistic
CUTUPP99 = 99% critical value (alpha = 0.01) for the KS
goodness of fit test statistic
These parameters can be used in subsequent analysis.
Default:
Location and scale parameters default to zero and one. Shape
parameters must be explicitly specified. There is no default
distribution.
Synonyms:
EV2 and FRECHET are synonyms for EXTREME VALUE TYPE 2.
EV1 and GUMBEL are synonyms for EXTREME VALUE TYPE 1.
FATIGUE LIFE is a synonym for FL.
RECIPROCAL INVERSE GAUSSIAN is a synonym for RIG.
IG is a synonym for INVERSE GAUSSIAN.
Related Commands:
ANDERSON-DARLING TEST = Perform Anderson-Darling test for goodness of fit.
CHI-SQUARE TEST = Perform chi-square test for goodness of fit.
WILK-SHAPIRO TEST = Perform Wilk-Shapiro test for normality.
MAXIMUM LIKELIHOOD = Perform maximum likelihood estimation for several
distributions.
FIT = Perform least squares fitting.
PROBABILITY PLOT = Generates a probability plot.
HISTOGRAM = Generates a histogram.
PPCC PLOT = Generates probability plot correlation coefficient plot.

Reference:
"Handbook of Methods of Applied Statistics, Volume I",
Chakravarti, Laha, and Roy, John Wiley, 1967, pp. 392-394.
Applications:
Implementation Date:
Program:
******************************************************
** normal kolmogorov-smirnov goodness of fit test y **
******************************************************
KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST
NULL HYPOTHESIS H0: DISTRIBUTION FITS THE DATA
ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
DISTRIBUTION: NORMAL
NUMBER OF OBSERVATIONS = 195
TEST:
KOLMOGOROV-SMIRNOV TEST STATISTIC = 0.3249392E-01
ALPHA LEVEL    CUTOFF     CONCLUSION
        10%    0.08737    ACCEPT H0
         5%    0.09739    ACCEPT H0
         1%    0.11673    ACCEPT H0
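The cutoffs in the output above agree with the standard large-sample approximations 1.22/sqrt(N), 1.36/sqrt(N), and 1.63/sqrt(N) for a fully specified distribution. The Python sketch below checks this for N = 195, with the test statistic taken as 0.03249392. It is a hedged illustration only: the names are hypothetical, and the tabled values Dataplot uses for small samples may differ from these approximations.

```python
import math

# Large-sample coefficients for the two-sided KS test (fully specified case).
COEFFICIENTS = {0.10: 1.22, 0.05: 1.36, 0.01: 1.63}


def approximate_cutoff(n, alpha):
    """Approximate critical value of D for sample size n and level alpha."""
    return COEFFICIENTS[alpha] / math.sqrt(n)


n = 195
d = 0.03249392  # test statistic, read as 0.3249392E-01
for alpha in (0.10, 0.05, 0.01):
    cutoff = approximate_cutoff(n, alpha)
    verdict = "REJECT H0" if d > cutoff else "ACCEPT H0"
    print(f"{alpha:.0%}  {cutoff:.5f}  {verdict}")
```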
