
SKEWNESS OUTLIER TEST

Name:
    SKEWNESS OUTLIER TEST
Type:
    Analysis Command
Purpose:
    Perform the skewness test for univariate outliers from a normal distribution.
Description:
    The ASTM E178-16a standard for detecting outliers from a univariate normal distribution includes the skewness outlier test.

    The test statistic is the adjusted Fisher-Pearson skewness coefficient

      \[ g_1 = \frac{n \sum_{i=1}^{n}{(x_{i} - \bar{x})^3}} {(n-1) (n-2) s^3} \]

    with n, \( \bar{x} \) and s denoting the sample size, the sample mean and the sample standard deviation, respectively.
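    The formula can be checked numerically. The following Python sketch is not Dataplot code, and the function name adjusted_skewness is hypothetical. It computes g1 for the ten ASTM E178 values used in the Program section and reproduces the value of -0.969 reported there.

```python
import math

def adjusted_skewness(x):
    """Adjusted Fisher-Pearson skewness g1 = n*sum((x-mean)^3) / ((n-1)(n-2) s^3)."""
    n = len(x)
    if n < 3:
        raise ValueError("g1 needs at least 3 observations")
    mean = sum(x) / n
    # sample standard deviation with n - 1 in the denominator
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    m3 = sum((v - mean) ** 3 for v in x)
    return n * m3 / ((n - 1) * (n - 2) * s ** 3)

# data from the Program section (ASTM E178 example)
y = [3.73, 3.59, 3.94, 4.13, 3.04, 2.22, 3.23, 4.05, 4.11, 2.02]
print(round(adjusted_skewness(y), 3))  # -0.969
```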

    The critical values are obtained via simulation. The ASTM standard provides table values for n = 3 to 50 and \( \alpha \) levels of 0.10, 0.05 and 0.01. Linear interpolation is used for values of n not given in the table. Alternatively, you can perform a dynamic simulation to obtain the critical values.
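    The following is a minimal Monte Carlo sketch of how such critical values can be simulated. It assumes that the reference distribution is the distribution of g1 for samples of size n drawn from a normal distribution. The helper names, the number of simulations and the seed are illustrative choices and are not taken from Dataplot; the example output reports 50,000 simulations. Under normality g1 is symmetric about zero, so a low potential outlier uses the lower alpha quantile and a high one uses the upper 1 - alpha quantile. For n = 10 and alpha = 0.05, the result should land near the ASTM table value of -1.131.

```python
import math
import random

def adjusted_skewness(x):
    """Adjusted Fisher-Pearson skewness g1 (sample SD uses n - 1)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return n * sum((v - mean) ** 3 for v in x) / ((n - 1) * (n - 2) * s ** 3)

def simulated_critical_value(n, alpha, nsim=20000, seed=12345, lower=True):
    """Monte Carlo critical value of g1 for normal samples of size n.

    lower=True returns the alpha quantile (low potential outlier, negative g1);
    lower=False returns the 1 - alpha quantile (high potential outlier).
    """
    rng = random.Random(seed)
    sims = sorted(adjusted_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
                  for _ in range(nsim))
    p = alpha if lower else 1.0 - alpha
    k = min(nsim, max(1, math.ceil(p * nsim)))  # rank of the empirical quantile
    return sims[k - 1]

print(round(simulated_critical_value(10, 0.05), 3))
```

    The exact value depends on the seed and on nsim, which is why the example output shows slightly different simulated and tabulated values (-1.133 versus -1.131).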

    To specify the method used to compute the critical values, enter one of the following commands (the default is ASTM):

      SET SKEW OUTLIER TEST CRITICAL VALUES ASTM
      SET SKEW OUTLIER TEST CRITICAL VALUES SIMULATION

    If n > 50, the simulation method is used, since the ASTM table covers only n = 3 to 50.
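    The table lookup with linear interpolation in n, plus the fallback to simulation above n = 50, can be sketched as follows. The functions interpolate_table and critical_value are hypothetical. The simulation step is passed in as a callback so the block stands alone. The two-entry table below is made up for illustration and does NOT contain ASTM values.

```python
def interpolate_table(table, n):
    """Linear interpolation in n within a {sample size: critical value} table."""
    if n in table:
        return table[n]
    sizes = sorted(table)
    if n < sizes[0] or n > sizes[-1]:
        raise ValueError("n lies outside the tabulated range")
    for n_lo, n_hi in zip(sizes, sizes[1:]):
        if n_lo < n < n_hi:
            frac = (n - n_lo) / (n_hi - n_lo)
            return table[n_lo] + frac * (table[n_hi] - table[n_lo])
    raise ValueError("no bracketing entries found")  # not reached for valid n

def critical_value(n, table, simulate, max_table_n=50):
    """Table lookup for n <= max_table_n; otherwise call the simulation routine."""
    if n > max_table_n:
        return simulate(n)
    return interpolate_table(table, n)

# made-up two-entry table, NOT the ASTM values
toy = {10: -1.0, 12: -2.0}
print(interpolate_table(toy, 11))  # -1.5
```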

Syntax 1:
    SKEWNESS OUTLIER TEST <y>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable being tested;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
    MULTIPLE SKEWNESS OUTLIER TEST <y1> ... <yk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> ... <yk> is a list of up to k response variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs the skewness outlier test on <y1>, then on <y2>, and so on. Up to 30 response variables can be specified.

    Note that the syntax

      MULTIPLE SKEWNESS OUTLIER TEST Y1 TO Y4

    is supported. This is equivalent to

      MULTIPLE SKEWNESS OUTLIER TEST Y1 Y2 Y3 Y4
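    The equivalence between "Y1 TO Y4" and "Y1 Y2 Y3 Y4" can be illustrated with a small Python helper. This is a hypothetical sketch and not Dataplot's parser. It assumes that names share an alphabetic prefix and end in an integer suffix. The same expansion applies to the group-id list in the REPLICATED syntax.

```python
import re

NAME = re.compile(r"([A-Za-z]+)(\d+)")

def expand_to(tokens):
    """Expand 'A1 TO A4' shorthand into 'A1 A2 A3 A4'; other tokens pass through."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1].upper() == "TO":
            m1 = NAME.fullmatch(tokens[i])
            m2 = NAME.fullmatch(tokens[i + 2])
            if m1 and m2 and m1.group(1).upper() == m2.group(1).upper():
                lo, hi = int(m1.group(2)), int(m2.group(2))
                out.extend(f"{m1.group(1)}{k}" for k in range(lo, hi + 1))
                i += 3  # skip 'first TO last'
                continue
        out.append(tokens[i])
        i += 1
    return out

print(expand_to("Y1 TO Y4".split()))  # ['Y1', 'Y2', 'Y3', 'Y4']
```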
Syntax 3:
    REPLICATED SKEWNESS OUTLIER TEST <y> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <x1> ... <xk> is a list of up to k group-id variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs a cross-tabulation of <x1> ... <xk> and performs a skewness outlier test for each unique combination of cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, there will be a total of 6 skewness outlier tests performed.

    Up to six group-id variables can be specified.

    Note that the syntax

      REPLICATED SKEWNESS OUTLIER TEST Y X1 TO X4

    is supported. This is equivalent to

      REPLICATED SKEWNESS OUTLIER TEST Y X1 X2 X3 X4
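    The cross-tabulation step can be sketched in Python by grouping the response on the tuple of group-id values and computing g1 within each cell. The function name replicated_skewness and the data are hypothetical. Skipping cells with fewer than 3 points is an assumption, since g1 is undefined for n < 3. With X1 at 3 levels and X2 at 2 levels, as in the example above, the sketch yields 6 statistics.

```python
import math
from collections import defaultdict

def adjusted_skewness(x):
    """Adjusted Fisher-Pearson skewness g1 (sample SD uses n - 1)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return n * sum((v - mean) ** 3 for v in x) / ((n - 1) * (n - 2) * s ** 3)

def replicated_skewness(y, *group_ids):
    """Compute g1 separately for each unique combination of group-id values."""
    cells = defaultdict(list)
    for row, value in enumerate(y):
        cells[tuple(g[row] for g in group_ids)].append(value)
    # assumption: cells with fewer than 3 points are skipped (g1 needs n >= 3)
    return {key: adjusted_skewness(vals)
            for key, vals in sorted(cells.items()) if len(vals) >= 3}

# hypothetical data: X1 has 3 levels, X2 has 2 levels -> 6 cells of 4 points
x1 = [i % 3 for i in range(24)]
x2 = [(i // 3) % 2 for i in range(24)]
y = [float(i % 5) + 0.1 * i for i in range(24)]
print(len(replicated_skewness(y, x1, x2)))  # 6
```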
Examples:
    SKEWNESS OUTLIER TEST Y1
    MULTIPLE SKEWNESS OUTLIER TEST Y1 Y2 Y3
    REPLICATED SKEWNESS OUTLIER TEST Y X1 X2
    SKEWNESS OUTLIER TEST Y1 SUBSET TAG > 2
Note:
    Tests for outliers are dependent on knowing the distribution of the data. The skewness outlier test assumes that the data come from an approximately normal distribution. For this reason, it is strongly recommended that the skewness outlier test be complemented with a normal probability test. If the data are not approximately normally distributed, then the skewness outlier test may be detecting the non-normality of the data rather than the presence of an outlier.
Note:
    You can specify the number of digits in the skewness outlier test output with the command

      SET WRITE DECIMALS <value>
Note:
    The SKEWNESS OUTLIER TEST command automatically saves the following parameters:

      STATVAL = the value of the test statistic
      STATCDF = the CDF value of the test statistic
      PVALUE = the p-value of the test statistic
      CUTOFF80 = the 80 percent point of the reference distribution
      CUTOFF90 = the 90 percent point of the reference distribution
      CUTOFF95 = the 95 percent point of the reference distribution
      CUTOF975 = the 97.5 percent point of the reference distribution
      CUTOFF99 = the 99 percent point of the reference distribution

    STATCDF and PVALUE are saved only when the simulation method is used to obtain the critical values. If the ASTM method is used, CUTOFF80 and CUTOF975 are not saved.

    If the MULTIPLE or REPLICATED option is used, these values will be written to the file "dpst1f.dat" instead.
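    The following Python sketch shows one way STATCDF- and PVALUE-style quantities can be estimated from a simulated normal reference distribution. Two assumptions are made here and are not confirmed by this page. First, the tail is chosen by the side of the mean on which the most extreme point lies. Second, CDF = 1 - p-value, which matches the example output (0.923 + 0.077 = 1). For the ASTM data, the p-value should fall between 0.05 and 0.10, close to the 0.077 to 0.079 reported in the Program section. The exact value depends on the seed.

```python
import math
import random

def adjusted_skewness(x):
    """Adjusted Fisher-Pearson skewness g1 (sample SD uses n - 1)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return n * sum((v - mean) ** 3 for v in x) / ((n - 1) * (n - 2) * s ** 3)

def simulated_cdf_pvalue(x, nsim=20000, seed=2020):
    """Return (g1, cdf, pvalue) using a simulated normal reference distribution."""
    n = len(x)
    g1 = adjusted_skewness(x)
    mean = sum(x) / n
    extreme = max(x, key=lambda v: abs(v - mean))  # most extreme point
    rng = random.Random(seed)
    sims = [adjusted_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(nsim)]
    if extreme < mean:
        # low potential outlier: more negative g1 is stronger evidence
        pvalue = sum(s <= g1 for s in sims) / nsim
    else:
        # high potential outlier: more positive g1 is stronger evidence
        pvalue = sum(s >= g1 for s in sims) / nsim
    return g1, 1.0 - pvalue, pvalue

y = [3.73, 3.59, 3.94, 4.13, 3.04, 2.22, 3.23, 4.05, 4.11, 2.02]
g1, cdf, pvalue = simulated_cdf_pvalue(y)
print(round(g1, 3), round(cdf, 3), round(pvalue, 3))
```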

Note:
    In addition to the SKEWNESS OUTLIER TEST command, the following commands can also be used:

      LET A = SKEWNESS OUTLIER TEST Y
      LET A = SKEWNESS OUTLIER TEST CDF Y
      LET A = SKEWNESS OUTLIER TEST PVALUE Y
      LET A = SKEWNESS OUTLIER TEST INDEX Y

      LET ALPHA = <value>
      LET A = SKEWNESS OUTLIER TEST CRITICAL VALUE Y

    The SKEWNESS OUTLIER TEST, SKEWNESS OUTLIER TEST CDF, and SKEWNESS OUTLIER TEST PVALUE commands return the value of the test statistic, the CDF of the test statistic and the p-value of the test statistic, respectively. The CDF and p-value are always computed with the simulation method. The SKEWNESS OUTLIER TEST CRITICAL VALUE command uses the method specified by the SET SKEW OUTLIER TEST CRITICAL VALUES command.

    The SKEWNESS OUTLIER TEST INDEX command returns the row index of the most extreme value in the response variable, where the most extreme value is the value farthest from the mean.
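    The index rule can be sketched in a few lines of Python. The function name is hypothetical, and rows are numbered from 1 as in Dataplot. This page does not document how ties are resolved; Python's max() keeps the first index. For the ASTM data the sketch returns 10, matching IINDX in the example output.

```python
def most_extreme_index(x):
    """1-based row index of the value farthest from the sample mean."""
    mean = sum(x) / len(x)
    # max() returns the first index on ties; the tie rule is an assumption
    return 1 + max(range(len(x)), key=lambda i: abs(x[i] - mean))

y = [3.73, 3.59, 3.94, 4.13, 3.04, 2.22, 3.23, 4.05, 4.11, 2.02]
print(most_extreme_index(y))  # 10
```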

    The SKEWNESS OUTLIER TEST CRITICAL VALUE command returns the critical value for the specified value of ALPHA. If ALPHA is not specified, it is set to 0.05. Note that if the ASTM method is specified for the critical values, only the alpha values 0.10, 0.05 and 0.01 are supported.
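    The ALPHA rules above can be expressed as a small validation step. In this hypothetical Python sketch, raising an error for an unsupported alpha under the ASTM method is an assumption; Dataplot may report the problem differently.

```python
import math

ASTM_ALPHAS = (0.10, 0.05, 0.01)  # alpha levels tabulated in ASTM E178

def resolve_alpha(alpha=None, method="ASTM"):
    """Apply the documented ALPHA rules before looking up a critical value."""
    if alpha is None:
        alpha = 0.05  # default when ALPHA is not specified
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if method.upper() == "ASTM" and not any(math.isclose(alpha, a) for a in ASTM_ALPHAS):
        # assumption: unsupported levels are rejected rather than interpolated
        raise ValueError("ASTM critical values exist only for alpha = 0.10, 0.05, 0.01")
    return alpha

print(resolve_alpha())  # 0.05
```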

    In addition to the LET commands above, built-in statistics are supported for 30+ different commands (enter HELP STATISTICS for details).

Default:
    The ASTM method is used to obtain critical values
Synonyms:
    None
Reference:
    ASTM E178-16a (2016), "Standard Practice for Dealing with Outlying Observations", ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, USA.

    Ferguson, T.S. (1961), "On the Rejection of Outliers," Fourth Berkeley Symposium on Mathematical Statistics and Probability, edited by Jerzy Neyman, University of California Press, Berkeley and Los Angeles, CA.

    Ferguson, T.S. (1961), "Rules for Rejection of Outliers," Revue Inst. Int. de Stat., RINSA, Vol. 29, No. 3, pp. 29-43.

Applications:
    Outlier Detection
Implementation Date:
    2019/10
Program:
     
    . Step 1:   Read the data (from the ASTM E178 standard)
    .
    read y
    3.73
    3.59
    3.94
    4.13
    3.04
    2.22
    3.23
    4.05
    4.11
    2.02
    end of data
    set write decimals 3
    .
    . Step 2:   Compute the statistics
    .
    let stat = skew outlier test y
    set skew outlier test critical values astm
    let cv1 = skew outlier test critical value y
    set skew outlier test critical values simulation
    let cv2 = skew outlier test critical value y
    .
    let pval = skew outlier test pvalue y
    let statcdf = skew outlier test cdf y
    let iindx = skew outlier test index y
    .
    print stat cv1 cv2 pval statcdf iindx
    .
    set skew outlier test critical values astm
    skewness outlier test y
    set skew outlier test critical values simulation
    skewness outlier test y
        
    The following output is generated:
     PARAMETERS AND CONSTANTS--
    
        STAT    --         -0.969
        CV1     --         -1.131
        CV2     --         -1.139
        PVAL    --          0.079
        STATCDF --          0.922
        IINDX   --         10.000
     
    THE FORTRAN COMMON CHARACTER VARIABLE SKEWOUTL HAS JUST BEEN SET TO ASTM
     
                Skewness Test for Outliers
                 (Assumption: Normality)
     
    Response Variable: Y
     
    H0: The most extreme point is not
        an outlier
    Ha: The most extreme point is
        an outlier
    Potential outlier value tested:                   2.020
    ID for potential outlier:                            10
     
    Summary Statistics:
    Number of Observations:                              10
    Sample Minimum:                                   2.020
    Sample Maximum:                                   4.130
    Sample Mean:                                      3.406
    Sample SD:                                        0.771
    Sample Adjusted Skewness:                        -0.969
     
    Skewness Outlier Test Statistic Value:           -0.969
     
     
    Conclusions (Upper 1-Tailed Test)
    -------------------------------------------------------------
      Alpha    CDF      Statistic   Critical Value     Conclusion
    -------------------------------------------------------------
        10%    90%         -0.969           -0.862      Accept H0
         5%    95%         -0.969           -1.131      Reject H0
         1%    99%         -0.969           -1.668      Reject H0
     
     
     
    Critical Values Based on ASTM E-178 Tables
     
     
    THE FORTRAN COMMON CHARACTER VARIABLE SKEWOUTL HAS JUST BEEN SET TO SIMU
     
                Skewness Test for Outliers
                 (Assumption: Normality)
     
    Response Variable: Y
     
    H0: The most extreme point is not
        an outlier
    Ha: The most extreme point is
        an outlier
    Potential outlier value tested:                   2.020
    ID for potential outlier:                            10
     
    Summary Statistics:
    Number of Observations:                              10
    Sample Minimum:                                   2.020
    Sample Maximum:                                   4.130
    Sample Mean:                                      3.406
    Sample SD:                                        0.771
    Sample Adjusted Skewness:                        -0.969
     
    Skewness Outlier Test Statistic Value:           -0.969
    CDF Value:                                        0.923
    P-Value                                           0.077
     
     
     
    Conclusions (Upper 1-Tailed Test)
    -------------------------------------------------------------
      Alpha    CDF      Statistic   Critical Value     Conclusion
    -------------------------------------------------------------
        20%    80%         -0.969           -0.557      Accept H0
        10%    90%         -0.969           -0.862      Accept H0
         5%    95%         -0.969           -1.133      Reject H0
       2.5%  97.5%         -0.969           -1.385      Reject H0
         1%    99%         -0.969           -1.671      Reject H0
       0.5%  99.5%         -0.969           -1.864      Reject H0
     
     
     
    Critical Values Based on 50,000 Simulations
     
        
Date created: 01/22/2020
Last updated: 12/11/2023

Please email comments on this WWW page to alan.heckert@nist.gov.