KURTOSIS OUTLIER TEST

Name:

KURTOSIS OUTLIER TEST

Type:

Analysis Command

Purpose:

Perform the kurtosis test for univariate outliers from a normal distribution.

Description:

The test statistic is the kurtosis coefficient

$g_{2} = \frac{n (n + 1) \sum_{i = 1}^{n} (x_{i} - \bar{x})^{4}}{(n - 1) (n - 2) (n - 3) s^{4}} - \frac{3 (n - 1)^{2}}{(n - 2) (n - 3)}$

with $n$, $\bar{x}$ and $s$ denoting the sample size, the sample mean and the sample standard deviation, respectively. Note that this definition differs from the one used by Dataplot's EXCESS KURTOSIS command.
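The formula can be checked numerically. The following Python sketch evaluates $g_2$ term by term, using the $n - 1$ divisor for $s$. The function name kurtosis_g2 is hypothetical and is not a Dataplot command. Applied to the ASTM E-178 data from the example program, it gives approximately 2.529, matching the statistic that Dataplot reports.

```python
import math


def kurtosis_g2(x):
    """Kurtosis coefficient g2 as defined above (needs at least 4 values)."""
    n = len(x)
    if n < 4:
        raise ValueError("g2 is undefined for fewer than 4 observations")
    mean = sum(x) / n
    # Sample standard deviation with the n - 1 divisor
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    sum4 = sum((v - mean) ** 4 for v in x)
    term1 = n * (n + 1) * sum4 / ((n - 1) * (n - 2) * (n - 3) * s ** 4)
    term2 = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return term1 - term2


# ASTM E-178 data from the example program
y = [-1.40, -0.44, -0.30, -0.24, -0.22, -0.13, -0.05,
     0.06, 0.10, 0.18, 0.20, 0.39, 0.48, 0.63, 1.01]
print(round(kurtosis_g2(y), 3))  # 2.529
```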

The critical values are obtained via simulation. The ASTM standard provides tabulated values for n = 3 to 50 and $\alpha$ levels of 0.10, 0.05 and 0.01; linear interpolation is used for values of n not given in the table. Alternatively, you can perform a dynamic simulation to obtain the critical values.
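Linear interpolation between tabulated sample sizes can be sketched as shown below. The table values in the usage example are made-up placeholders chosen so that the arithmetic is easy to check. They are not the ASTM E-178 values, and the function name interpolate_critical_value is hypothetical.

```python
def interpolate_critical_value(table, n):
    """Linearly interpolate a critical value for sample size n.

    table maps tabulated sample sizes to critical values for one alpha level.
    """
    sizes = sorted(table)
    if n < sizes[0] or n > sizes[-1]:
        raise ValueError("n is outside the tabulated range")
    if n in table:
        return table[n]
    # Find the pair of tabulated sizes that brackets n
    for lo, hi in zip(sizes, sizes[1:]):
        if lo < n < hi:
            frac = (n - lo) / (hi - lo)
            return table[lo] + frac * (table[hi] - table[lo])
    raise ValueError("n could not be bracketed")  # not reached for valid input


# Placeholder table (NOT the ASTM values): critical values at n = 20 and n = 30
toy_table = {20: 1.0, 30: 2.0}
print(interpolate_critical_value(toy_table, 25))  # 1.5
```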

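To specify the method used to compute the critical values, enter one of the following commands (the default is ASTM):

SET KURTOSIS OUTLIER TEST CRITICAL VALUES ASTM
SET KURTOSIS OUTLIER TEST CRITICAL VALUES SIMULATION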

If n > 50, the simulation method is used regardless of this setting, because the ASTM table covers only n = 3 to 50.
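The dynamic simulation can be sketched as a plain Monte Carlo experiment: draw many normal samples of size n, compute $g_2$ for each sample, and take the empirical upper quantile. This sketch is not Dataplot's implementation. The replication count (20,000 here, compared with the 50,000 reported in the example output), the seed, and the order-statistic quantile rule are all assumptions. For n = 15, the estimate should land near the simulated 5% critical value of about 2.14 shown in the example output, apart from Monte Carlo noise.

```python
import math
import random


def kurtosis_g2(x):
    """Kurtosis coefficient g2 (same definition as in the Description)."""
    n = len(x)
    mean = sum(x) / n
    s4 = (sum((v - mean) ** 2 for v in x) / (n - 1)) ** 2  # s**4, n - 1 divisor
    sum4 = sum((v - mean) ** 4 for v in x)
    return (n * (n + 1) * sum4 / ((n - 1) * (n - 2) * (n - 3) * s4)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def simulated_critical_value(n, alpha, reps=20000, seed=1):
    """Estimate the upper (1 - alpha) point of g2 for normal samples of size n."""
    rng = random.Random(seed)
    stats = sorted(
        kurtosis_g2([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    # Empirical quantile: order statistic at position ceil((1 - alpha) * reps)
    k = math.ceil((1.0 - alpha) * reps)
    return stats[k - 1]


cv = simulated_critical_value(15, 0.05)
print(f"estimated 5% critical value for n = 15: {cv:.3f}")
```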

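Syntax 1:

KURTOSIS OUTLIER TEST <y>

where <y> is the response variable being tested.

Syntax 2:

MULTIPLE KURTOSIS OUTLIER TEST <y1> ... <yk>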

This syntax performs the kurtosis outlier test on <y1>, then on <y2>, and so on. Up to 30 response variables can be specified.

Note that the syntax

MULTIPLE KURTOSIS OUTLIER TEST Y1 TO Y4

is supported. This is equivalent to

MULTIPLE KURTOSIS OUTLIER TEST Y1 Y2 Y3 Y4

Syntax 3:

REPLICATED KURTOSIS OUTLIER TEST <y> <x1> ... <xk>

This syntax cross-tabulates <x1> ... <xk> and performs a kurtosis outlier test on <y> for each unique combination of cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, a total of 6 kurtosis outlier tests are performed.
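The cross-tabulation works by grouping the response on the tuple of group-id values and running one test per observed combination. The sketch below performs only the grouping step. The helper name group_by_levels and the sample data are hypothetical. With all 3 levels of X1 and both levels of X2 present, it yields the 6 groups described in the example.

```python
from collections import defaultdict


def group_by_levels(y, *group_ids):
    """Split y into subsets keyed by each observed combination of group-id values."""
    groups = defaultdict(list)
    for i, value in enumerate(y):
        key = tuple(g[i] for g in group_ids)
        groups[key].append(value)
    return dict(groups)


# Hypothetical data: X1 has 3 levels and X2 has 2 levels
x1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1, 2, 3]
x2 = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
y = [float(i) for i in range(len(x1))]

groups = group_by_levels(y, x1, x2)
print(len(groups))  # 6
for key in sorted(groups):
    # A kurtosis outlier test would be run on each subset groups[key]
    print(key, groups[key])
```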

Up to six group-id variables can be specified.

Note that the syntax

REPLICATED KURTOSIS OUTLIER TEST Y X1 TO X4

is supported. This is equivalent to

REPLICATED KURTOSIS OUTLIER TEST Y X1 X2 X3 X4

Examples:

Note:

Tests for outliers depend on knowing the distribution of the data. The kurtosis outlier test assumes that the data come from an approximately normal distribution. For this reason, it is strongly recommended that the kurtosis outlier test be complemented with a normality test or a normal probability plot. If the data are not approximately normally distributed, the kurtosis outlier test may be detecting the non-normality of the data rather than the presence of an outlier.

Note:

The number of decimal places shown in the output can be set with the command

SET WRITE DECIMALS <value>

Note:

The following values are saved as internal parameters:

STATVAL  = the value of the test statistic
STATCDF  = the CDF value of the test statistic
PVALUE   = the p-value of the test statistic
CUTOFF80 = the 80 percent point of the reference distribution
CUTOFF90 = the 90 percent point of the reference distribution
CUTOFF95 = the 95 percent point of the reference distribution
CUTOF975 = the 97.5 percent point of the reference distribution
CUTOFF99 = the 99 percent point of the reference distribution

STATCDF and PVALUE are saved only when the simulation method is used to obtain the critical values. If the ASTM method is used, CUTOFF80 and CUTOF975 are not saved.

If the MULTIPLE or REPLICATED option is used, these values will be written to the file "dpst1f.dat" instead.

Note:

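The following statistics are also supported via the LET command:

LET A = KURTOSIS OUTLIER TEST Y
LET A = KURTOSIS OUTLIER TEST CDF Y
LET A = KURTOSIS OUTLIER TEST PVALUE Y
LET A = KURTOSIS OUTLIER TEST INDEX Y
LET A = KURTOSIS OUTLIER TEST CRITICAL VALUE Y

KURTOSIS OUTLIER TEST, KURTOSIS OUTLIER TEST CDF, and KURTOSIS OUTLIER TEST PVALUE return the value of the test statistic, the CDF of the test statistic and the p-value of the test statistic, respectively. KURTOSIS OUTLIER TEST CDF and KURTOSIS OUTLIER TEST PVALUE always use the simulation method. KURTOSIS OUTLIER TEST CRITICAL VALUE uses the method specified by the SET KURTOSIS OUTLIER TEST CRITICAL VALUES command.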
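Given simulated values of the statistic under normality, the CDF value is the fraction of simulated statistics at or below the observed statistic. Because this is an upper one-tailed test, the p-value is the complement of that fraction. The sketch below shows one way to compute these values and is not Dataplot's code; the replication count and seed are arbitrary choices. For the ASTM E-178 data, the estimates should fall close to the CDF of roughly 0.965 to 0.967 and the p-value of roughly 0.035 to 0.037 reported in the example output.

```python
import random


def kurtosis_g2(x):
    """Kurtosis coefficient g2 (same definition as in the Description)."""
    n = len(x)
    mean = sum(x) / n
    s4 = (sum((v - mean) ** 2 for v in x) / (n - 1)) ** 2  # s**4, n - 1 divisor
    sum4 = sum((v - mean) ** 4 for v in x)
    return (n * (n + 1) * sum4 / ((n - 1) * (n - 2) * (n - 3) * s4)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def simulated_cdf_and_pvalue(stat, n, reps=20000, seed=2):
    """Monte Carlo CDF value and upper-tail p-value of g2 under normality."""
    rng = random.Random(seed)
    below = sum(
        1 for _ in range(reps)
        if kurtosis_g2([rng.gauss(0.0, 1.0) for _ in range(n)]) <= stat
    )
    cdf = below / reps
    return cdf, 1.0 - cdf


# ASTM E-178 data from the example program
y = [-1.40, -0.44, -0.30, -0.24, -0.22, -0.13, -0.05,
     0.06, 0.10, 0.18, 0.20, 0.39, 0.48, 0.63, 1.01]
stat = kurtosis_g2(y)
cdf, pval = simulated_cdf_and_pvalue(stat, len(y))
print(f"statistic = {stat:.3f}, CDF = {cdf:.3f}, p-value = {pval:.3f}")
```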

KURTOSIS OUTLIER TEST INDEX returns the row index of the most extreme value in the response variable, where the most extreme value is the value furthest from the mean.
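The index rule can be written in Python as shown below. The function name is hypothetical. Returning the first index when two values are equally far from the mean is an assumption, because this page does not say how ties are broken. For the ASTM E-178 data, the first observation (-1.40) is furthest from the mean of 0.018, which agrees with IINDX = 1 in the example output.

```python
def most_extreme_index(x):
    """1-based row index of the value furthest from the sample mean."""
    mean = sum(x) / len(x)
    # max() keeps the first index when several values tie (assumed tie rule)
    best = max(range(len(x)), key=lambda i: abs(x[i] - mean))
    return best + 1


# ASTM E-178 data from the example program
y = [-1.40, -0.44, -0.30, -0.24, -0.22, -0.13, -0.05,
     0.06, 0.10, 0.18, 0.20, 0.39, 0.48, 0.63, 1.01]
print(most_extreme_index(y))  # 1
```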

KURTOSIS OUTLIER TEST CRITICAL VALUE returns the critical value for the specified value of ALPHA. If ALPHA is not specified, it defaults to 0.05. If the ASTM method is specified for the critical values, only ALPHA values of 0.01, 0.05 and 0.10 are supported.
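For this upper one-tailed test, H0 is rejected at level alpha when the statistic exceeds the critical value. The sketch below applies that rule to the ASTM-based critical values for n = 15 taken from the example output (1.422, 2.145 and 3.887 for alpha = 0.10, 0.05 and 0.01), and it reproduces the conclusions shown there. Limiting the lookup to these three alpha levels and defaulting to 0.05 follows the description above. The function name is hypothetical, and rejecting only on a strict inequality is an assumption.

```python
# Critical values for n = 15 taken from the ASTM-based example output
ASTM_N15 = {0.10: 1.422, 0.05: 2.145, 0.01: 3.887}


def astm_conclusion(stat, alpha=0.05, table=ASTM_N15):
    """Upper one-tailed decision using a table that supports only a few alpha levels."""
    if alpha not in table:
        raise ValueError("ASTM critical values exist only for alpha = 0.01, 0.05, 0.10")
    cv = table[alpha]
    return cv, ("Reject H0" if stat > cv else "Accept H0")


for a in (0.10, 0.05, 0.01):
    print(a, astm_conclusion(2.529, a))
```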

In addition to the above LET commands, built-in statistics are supported for 30+ different commands (enter HELP STATISTICS for details).

Default:

The ASTM method is used to obtain critical values.

Synonyms:

None

Related Commands:

SKEWNESS OUTLIER TEST            = Perform the skewness outlier test.
DAVID TEST                       = Perform the David outlier test.
GRUBBS TEST                      = Perform the Grubbs outlier test.
TIETJEN-MOORE TEST               = Perform a Tietjen-Moore outlier test.
EXTREME STUDENTIZED DEVIATE TEST = Perform an extreme studentized deviate outlier test.
DIXON TEST                       = Perform a Dixon outlier test.
GOODNESS OF FIT TEST             = Perform a goodness of fit test (Anderson-Darling, Kolmogorov-Smirnov, chi-square, PPCC).
WILKS SHAPIRO NORMALITY TEST     = Perform a Wilks Shapiro normality test.
HISTOGRAM                        = Generate a histogram.
PROBABILITY PLOT                 = Generate a probability plot.
BOX PLOT                         = Generate a box plot.

Reference:

ASTM International, ASTM E178, "Standard Practice for Dealing With Outlying Observations," West Conshohocken, PA.

Ferguson, T.S. (1961), "On the Rejection of Outliers," Fourth Berkeley Symposium on Mathematical Statistics and Probability, edited by Jerzy Neyman, University of California Press, Berkeley and Los Angeles, CA.

Ferguson, T.S. (1961), "Rules for Rejection of Outliers," Revue Inst. Int. de Stat., RINSA, Vol. 29, No. 3, pp. 29-43.

Applications:

Outlier Detection

Implementation Date:

2019/10

Program:

 
. Step 1:   Read the data (from ASTM E-178 document)
.
read y
-1.40
-0.44
-0.30
-0.24
-0.22
-0.13
-0.05
0.06
0.10
0.18
0.20
0.39
0.48
0.63
1.01
end of data
.
. Step 2:   Compute the statistics
.
let stat = kurtosis outlier test y
set kurtosis outlier test critical values astm
let cv1 = kurtosis outlier test critical value y
set kurtosis outlier test critical values simulation
let cv2 = kurtosis outlier test critical value y
.
let pval = kurtosis outlier test pvalue y
let statcdf = kurtosis outlier test cdf y
let iindx = kurtosis outlier test index y
.
set write decimals 3
print stat cv1 cv2 pval statcdf iindx
.
set kurtosis outlier test critical values astm
kurtosis outlier test y
set kurtosis outlier test critical values simulation
kurtosis outlier test y

 PARAMETERS AND CONSTANTS--

    STAT    --          2.529
    CV1     --          2.145
    CV2     --          2.150
    PVAL    --          0.037
    STATCDF --          0.967
    IINDX   --          1.000
 
THE FORTRAN COMMON CHARACTER VARIABLE KURTOUTL HAS JUST BEEN SET TO ASTM
 
            Kurtosis Test for Outliers
             (Assumption: Normality)
 
Response Variable: Y
 
H0: The most extreme point is not
    an outlier
Ha: The most extreme point is
    an outlier
Potential outlier value tested:                  -1.400
ID for potential outlier:                             1
 
Summary Statistics:
Number of Observations:                              15
Sample Minimum:                                  -1.400
Sample Maximum:                                   1.010
Sample Mean:                                      0.018
Sample SD:                                        0.551
Sample Kurtosis:                                  2.529
 
Kurtosis Outlier Test Statistic Value:            2.529
 
 
Conclusions (Upper 1-Tailed Test)
-------------------------------------------------------------
  Alpha    CDF      Statistic   Critical Value     Conclusion
-------------------------------------------------------------
    10%    90%          2.529            1.422      Reject H0
     5%    95%          2.529            2.145      Reject H0
     1%    99%          2.529            3.887      Accept H0
 
 
 
Critical Values Based on ASTM E-178 Tables
 
 
THE FORTRAN COMMON CHARACTER VARIABLE KURTOUTL HAS JUST BEEN SET TO SIMU
 
            Kurtosis Test for Outliers
             (Assumption: Normality)
 
Response Variable: Y
 
H0: The most extreme point is not
    an outlier
Ha: The most extreme point is
    an outlier
Potential outlier value tested:                  -1.400
ID for potential outlier:                             1
 
Summary Statistics:
Number of Observations:                              15
Sample Minimum:                                  -1.400
Sample Maximum:                                   1.010
Sample Mean:                                      0.018
Sample SD:                                        0.551
Sample Kurtosis:                                  2.529
 
Kurtosis Outlier Test Statistic Value:            2.529
CDF Value:                                        0.965
P-Value                                           0.035
 
 
 
Conclusions (Upper 1-Tailed Test)
-------------------------------------------------------------
  Alpha    CDF      Statistic   Critical Value     Conclusion
-------------------------------------------------------------
    20%    80%          2.529            0.709      Reject H0
    10%    90%          2.529            1.414      Reject H0
     5%    95%          2.529            2.138      Reject H0
   2.5%  97.5%          2.529            2.886      Accept H0
     1%    99%          2.529            3.969      Accept H0
   0.5%  99.5%          2.529            4.683      Accept H0
 
 
 
Critical Values Based on 50,000 Simulations