Dataplot Vol 1 Auxiliary Chapter

GRUBBS TEST

Name:
    GRUBBS TEST
Type:
    Analysis Command
Purpose:
    Perform a Grubbs test for outliers.
Description:
    The Grubbs test, also known as the maximum normalized residual test, can be used to test for outliers in a univariate data set. Note that this test assumes normality, so you should test the data for normality before applying the Grubbs test.

    The Grubbs test detects one outlier at a time. To check for multiple outliers, delete the single outlier detected and rerun the Grubbs test on the remaining data. Repeat this process until no outliers are detected.
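
    The delete-and-repeat procedure might be sketched in Python as follows. This is an illustration only, not Dataplot's implementation: the names grubbs_g and strip_outliers are hypothetical, and critical_value is an assumed caller-supplied function that returns the critical value of G for a sample of size n at the chosen significance level.

```python
import math


def grubbs_g(data):
    """Return (G, index of the most extreme point) for a list of numbers.

    G is the largest absolute deviation from the sample mean divided by
    the sample standard deviation (n - 1 divisor, an assumption here).
    """
    n = len(data)
    ybar = sum(data) / n
    s = math.sqrt(sum((v - ybar) ** 2 for v in data) / (n - 1))
    if s == 0:
        # All values identical: nothing can be flagged as an outlier.
        return 0.0, 0
    idx = max(range(n), key=lambda i: abs(data[i] - ybar))
    return abs(data[idx] - ybar) / s, idx


def strip_outliers(y, critical_value):
    """Repeat the test, deleting one flagged point per pass.

    critical_value: hypothetical callable, n -> critical G for sample size n.
    Returns (remaining data, removed points in order of removal).
    """
    data = list(y)
    removed = []
    # The critical value uses a t distribution with n - 2 degrees of
    # freedom, so stop once fewer than 3 observations remain.
    while len(data) >= 3:
        g, idx = grubbs_g(data)
        if g <= critical_value(len(data)):
            break  # no outlier detected on this pass
        removed.append(data.pop(idx))
    return data, removed
```

    Passing the critical value in as a function keeps the loop independent of how the critical region is evaluated. Note that each pass is tested at the same nominal significance level, so the overall error rate of the repeated procedure is not controlled by this sketch.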

    More formally, the Grubbs test can be defined as follows.
    H0: There are no outliers in the data.
    Ha: There is at least one outlier in the data.
    Test Statistic: G = MAX(ABS(Y(i) - YBAR))/s

    where YBAR and s are the sample mean and standard deviation of the data. That is, the Grubbs test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.
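
    As a minimal sketch of the test statistic, the following Python function computes G directly from the definition. The function name grubbs_statistic is hypothetical, and the sample standard deviation is assumed to use the N - 1 divisor.

```python
import math


def grubbs_statistic(y):
    """Return G = MAX(ABS(Y(i) - YBAR))/s for a sequence of numbers."""
    n = len(y)
    if n < 3:
        raise ValueError("the Grubbs test needs at least 3 observations")
    ybar = sum(y) / n
    # Sample standard deviation with the n - 1 divisor (assumed).
    s = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    if s == 0:
        raise ValueError("standard deviation is zero; G is undefined")
    return max(abs(v - ybar) for v in y) / s


if __name__ == "__main__":
    print(grubbs_statistic([2, 4, 6, 8, 30]))
```

    The same arithmetic can be checked against the summary values in the sample output under Program: (231.0000 - 185.7895)/18.59549 is approximately 2.4313, the reported test statistic.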

    Significance Level: alpha
    Critical Region: The hypothesis of no outliers is rejected if

    G > [(N-1)/SQRT(N)]*SQRT[t(1-alpha/(2*N),N-2)**2/(N - 2 + t(1-alpha/(2*N),N-2)**2)]

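    where N is the number of observations and t(p,v) is the critical value (percent point) of the t distribution with v degrees of freedom at cumulative probability p.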
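
    The two-sided critical value can be evaluated numerically. The sketch below uses only the Python standard library, so the t percent point is approximated by Simpson's rule plus bisection rather than by Dataplot's TPPF function. All function names (t_pdf, t_cdf, t_ppf, grubbs_critical) are hypothetical, and the step counts are arbitrary choices.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Cumulative probability at x >= 0 via composite Simpson's rule."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    """Approximate percent point for 0.5 < p < 1 by bisection."""
    if not 0.5 < p < 1:
        raise ValueError("this sketch only handles 0.5 < p < 1")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical(n, alpha=0.05):
    """Two-sided critical value of G for n observations."""
    t = t_ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


if __name__ == "__main__":
    print(grubbs_critical(38, 0.05))
```

    For N = 38 and alpha = 0.05 this should come out near 3.014, in agreement with the 95 % point shown in the sample output under Program.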

    Note that the above is actually a combination of the following two tests:

    1. the test that the minimum value is an outlier.

    2. the test that the maximum value is an outlier.

    To generate these one-sided tests, the test statistic is

      G = (YBAR - Ymin)/s

    or

      G = (Ymax - YBAR)/s
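
    To illustrate how the two one-sided statistics relate to the two-sided one, the following Python sketch computes both. The function name one_sided_grubbs is hypothetical, and the N - 1 divisor for s is an assumption.

```python
import math


def one_sided_grubbs(y):
    """Return (G_min, G_max): the statistics for the minimum and maximum."""
    n = len(y)
    ybar = sum(y) / n
    # Sample standard deviation with the n - 1 divisor (assumed).
    s = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    g_min = (ybar - min(y)) / s  # tests whether the minimum is an outlier
    g_max = (max(y) - ybar) / s  # tests whether the maximum is an outlier
    return g_min, g_max


if __name__ == "__main__":
    g_min, g_max = one_sided_grubbs([2, 4, 6, 8, 30])
    # The two-sided statistic is the larger of the two one-sided values.
    print(g_min, g_max, max(g_min, g_max))
```

    With the summary values from the sample output under Program, G for the minimum is (185.7895 - 147.0000)/18.59549, about 2.086, and G for the maximum is about 2.431. The larger of the two is the two-sided statistic that is reported.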

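    For the one-sided tests, the significance level passed to the TPPF function (the t percent point function) needs to be doubled; that is, alpha/(2*N) in the critical region above is replaced by alpha/N.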

    You can request that one of the one-sided tests be performed (see the Syntax section).

    Generally, graphical methods such as the box plot or histogram are used to detect outliers. However, the Grubbs test can be used if you prefer a more formal test.

Syntax 1:
    GRUBBS TEST <y>             <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable being tested;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs the two-sided test.

Syntax 2:
    GRUBBS MINIMUM TEST <y>             <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable being tested;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs the one-sided test for the minimum value.

Syntax 3:
    GRUBBS MAXIMUM TEST <y>             <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable being tested;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs the one-sided test for the maximum value.

Examples:
    GRUBBS TEST Y1
    GRUBBS TEST Y1 SUBSET TAG > 2
    GRUBBS MINIMUM TEST Y1
    GRUBBS MAXIMUM TEST Y1
Default:
    None
Synonyms:
    None
Related Commands:
    HISTOGRAM = Generate a histogram.
    PROBABILITY PLOT = Generate a probability plot.
    BOX PLOT = Generate a box plot.
    WILK SHAPIRO TEST = Compute the Wilk-Shapiro test for normality.
    ANDERSON DARLING TEST = Compute the Anderson-Darling test for normality.
Reference:
    "Statistical Methods", Eighth Edition, Snedecor and Cochran, Iowa State University Press, 1989, pp. 278-280.
Applications:
    Outlier Detection
Implementation Date:
    1998/5  
    2005/5: Corrected the significance levels for the two-sided case (the previous version was actually using the significance level for the one-sided case)
    2005/5: Added support for the one-sided tests
    2006/3: Replaced 2005/5 update with Syntax 2 and Syntax 3
Program:
    SKIP 25
    READ VANGEL31.DAT Y
    GRUBBS TEST Y

    The following output is generated:

           *************************
           **      GRUBBS TEST Y  **
           *************************
      
                   GRUBB TEST FOR OUTLIERS
                   (ASSUMPTIION: NORMALITY)
      
     1. STATISTICS:
           NUMBER OF OBSERVATIONS      =       38
           MINIMUM                     =    147.0000
           MEAN                        =    185.7895
           MAXIMUM                     =    231.0000
           STANDARD DEVIATION          =    18.59549
      
           GRUBB TEST STATISTIC       =    2.431263
      
     2. PERCENT POINTS OF THE REFERENCE DISTRIBUTION
        FOR GRUBB TEST STATISTIC
           0          % POINT    =    .0000000
           50         % POINT    =    2.393112
           75         % POINT    =    2.600730
           90         % POINT    =    2.846334
           95         % POINT    =    3.014101
           99         % POINT    =    3.356029
      
              55.75001      % POINT:        2.431263
      
     3. CONCLUSION (AT THE 5% LEVEL):
           THERE ARE NO OUTLIERS.
        

Date created: 6/5/2001
Last updated: 4/17/2006
Please email comments on this WWW page to alan.heckert@nist.gov.