Dataplot

# GRUBBS TEST

Name:
GRUBBS TEST
Type:
Analysis Command
Purpose:
Perform a Grubbs test for outliers.
Description:
The Grubbs test, also known as the maximum normalized residual test, can be used to test for outliers in a univariate data set. Note that this test assumes normality, so you should test the data for normality before applying the Grubbs test.

The Grubbs test detects one outlier at a time. For multiple outliers, delete the single outlier detected and run the Grubbs test again. Repeat this process until no outliers are detected.

More formally, the Grubbs test can be defined as follows.

H0: There are no outliers in the data.

Ha: There is at least one outlier in the data.

Test Statistic: $$G = \frac{\max |Y_i - \bar{Y}|} {s}$$ where $$\bar{Y}$$ and s are the sample mean and standard deviation of the data. That is, the Grubbs test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.

Significance Level: $$\alpha$$

Critical Region: The hypothesis of no outliers is rejected if $$G > \frac{N - 1} {\sqrt{N}} \sqrt{\frac{t^2_{(\alpha/(2N),N-2)}} {N - 2 + t^2_{(\alpha/(2N),N-2)}}}$$ where N is the sample size and $$t_{(\alpha/(2N),N-2)}$$ is the upper critical value of the t distribution with N - 2 degrees of freedom at a significance level of $$\alpha/(2N)$$, obtained from the percent point function of the t distribution.
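As an illustration outside Dataplot, the two-sided statistic and critical value defined above can be sketched in Python using only the standard library. The function names (`t_upper_tail`, `t_critical`, `grubbs_statistic`, `grubbs_critical`) and the sample data are hypothetical. The t critical value is obtained here by numerical integration and bisection, which is an assumption of this sketch and not Dataplot's method, so small differences from Dataplot's output are possible.

```python
import math
import statistics


def t_upper_tail(x, df, steps=2000):
    """Approximate P(T > x) for x >= 0 by Simpson's rule on [0, x]."""
    norm = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    norm /= math.sqrt(df * math.pi)

    def pdf(u):
        return norm * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def t_critical(p, df):
    """Find t such that P(T > t) = p, for 0 < p < 0.5, by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > p:  # widen the bracket until it contains t
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_statistic(data):
    """Largest absolute deviation from the mean, in sample-SD units."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return max(abs(y - mean) for y in data) / sd


def grubbs_critical(n, alpha=0.05, two_sided=True):
    """Critical value of G for a sample of size n (n >= 3)."""
    p = alpha / (2 * n) if two_sided else alpha / n
    t = t_critical(p, n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample, not from the manual
g = grubbs_statistic(data)
crit = grubbs_critical(len(data), alpha=0.05)
print(f"G = {g:.4f}, critical value = {crit:.4f}, reject H0: {g > crit}")
```

H0 is rejected only when G exceeds the critical value for the chosen significance level.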

Note that the above is actually a combination of the following two tests:

1. the test that the minimum value is an outlier.

2. the test that the maximum value is an outlier.

To generate these one-sided tests, the test statistic is

$$G = \frac{\bar{Y} - Y_{min}} {s}$$

or

$$G = \frac{Y_{max} - \bar{Y}} {s}$$

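For the one-sided tests, the significance level passed to the t percent point function (TPPF) is doubled. That is, $$\alpha/(2N)$$ in the critical region above is replaced by $$\alpha/N$$.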
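The relationship between the one-sided and two-sided forms can be sketched in Python as follows. This is an illustration only; the function names and the sample data are hypothetical and are not part of Dataplot.

```python
import math
import statistics


def grubbs_one_sided(data):
    """Return (G for the minimum, G for the maximum)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    g_min = (mean - min(data)) / sd
    g_max = (max(data) - mean) / sd
    return g_min, g_max


def t_tail_probability(alpha, n, two_sided=True):
    """Significance level used for the t critical value."""
    return alpha / (2 * n) if two_sided else alpha / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
g_min, g_max = grubbs_one_sided(data)
g_two_sided = max(g_min, g_max)  # the two-sided statistic is the larger one
print(f"G(min) = {g_min:.4f}, G(max) = {g_max:.4f}, G = {g_two_sided:.4f}")
print(t_tail_probability(0.05, len(data), two_sided=False))
```

The two-sided statistic is the larger of the two one-sided statistics, and the one-sided tail probability is exactly twice the two-sided one.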

You can request that one of the one-sided tests be performed (see the Syntax section).

Generally, graphical methods such as the box plot or histogram are used to detect outliers. However, the Grubbs test can be used if you prefer a more formal test.

Syntax 1:
GRUBBS TEST <y>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable being tested;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax performs the two-sided test.

Syntax 2:
GRUBBS MINIMUM TEST <y>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable being tested;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax performs the one-sided test for the minimum value.

Syntax 3:
GRUBBS MAXIMUM TEST <y>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable being tested;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax performs the one-sided test for the maximum value.

Syntax 4:
GRUBBS TEST <y> <labid>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable being tested;
<labid> is a variable containing the lab-id corresponding to each value of the response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax can also be used with the MINIMUM and MAXIMUM versions of the tests. The <labid> variable is used to identify the lab-id of the minimum and maximum points. However, it is not used in the computation of the statistic.

Syntax 5:
GRUBBS MULTIPLE TEST <y1> ... <yk>
<SUBSET/EXCEPT/FOR qualification>
where <y1> ... <yk> is a list of up to k response variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax can also be used with the MINIMUM and MAXIMUM versions of the tests. This syntax performs a Grubbs test on <y1>, then on <y2>, and so on. Up to 30 response variables can be specified.

Note that the syntax

GRUBBS MULTIPLE TEST Y1 TO Y4

is supported. This is equivalent to

GRUBBS MULTIPLE TEST Y1 Y2 Y3 Y4

Syntax 6:
GRUBBS REPLICATED TEST <y> <x1> ... <xk>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> ... <xk> is a list of up to k group-id variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax can also be used with the MINIMUM and MAXIMUM versions of the tests. This syntax performs a cross-tabulation of <x1> ... <xk> and performs a Grubbs test for each unique combination of cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, there will be a total of 6 Grubbs tests performed.
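The cross-tabulation can be pictured with a short Python sketch. It only illustrates the grouping idea and is not Dataplot's implementation; the function names and the data are hypothetical, and the sketch assumes every cell has at least two distinct values so that the standard deviation is positive.

```python
import statistics
from collections import defaultdict


def grubbs_statistic(values):
    """Largest absolute deviation from the mean, in sample-SD units."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return max(abs(v - mean) for v in values) / sd


def replicated_grubbs(y, *group_vars):
    """Return {group key: G} for each combination of group-id values present."""
    cells = defaultdict(list)
    for row in zip(y, *group_vars):
        cells[row[1:]].append(row[0])  # key is the tuple of group-id values
    return {key: grubbs_statistic(vals) for key, vals in sorted(cells.items())}


# Made-up data: X1 has 3 levels, X2 has 2 levels, 3 observations per cell.
x1 = [1] * 6 + [2] * 6 + [3] * 6
x2 = ([1] * 3 + [2] * 3) * 3
y = [10.1, 9.8, 10.4, 11.0, 10.7, 12.9,
     9.5, 9.9, 10.2, 10.0, 10.3, 13.5,
     9.7, 10.6, 10.1, 10.8, 9.4, 10.0]

results = replicated_grubbs(y, x1, x2)
for key, g in results.items():
    print(key, round(g, 4))
print(len(results), "tests performed")
```

With 3 levels of X1 and 2 levels of X2, the dictionary holds one statistic for each of the 6 cells.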

Up to six group-id variables can be specified.

Note that the syntax

GRUBBS REPLICATED TEST Y X1 TO X4

is supported. This is equivalent to

GRUBBS REPLICATED TEST Y X1 X2 X3 X4
Examples:
GRUBBS TEST Y1
GRUBBS TEST Y1 LABID
GRUBBS MULTIPLE TEST Y1 Y2 Y3
GRUBBS REPLICATED TEST Y X1 X2
GRUBBS TEST Y1 SUBSET TAG > 2
GRUBBS MINIMUM TEST Y1
GRUBBS MAXIMUM TEST Y1
Note:
Masking and swamping are two issues that can affect outlier tests.

Masking can occur when we specify too few outliers in the test. For example, if we are testing for a single outlier when there are in fact two (or more) outliers, these additional outliers may influence the value of the test statistic enough so that no points are declared as outliers.

On the other hand, swamping can occur when we specify too many outliers in the test. For example, if we are testing for two outliers when there is in fact only a single outlier, both points may be declared outliers.

The possibility of masking and swamping is an important reason why it is useful to complement formal outlier tests with graphical methods. Graphics can often help identify cases where masking or swamping may be an issue.

Also, masking is one reason that trying to apply a single outlier test sequentially can fail. If there are multiple outliers, masking may cause the outlier test for the first outlier to return a conclusion of no outliers (and so the testing for any additional outliers is not done).
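A small numerical experiment shows the mechanism behind masking. The Python sketch below uses made-up data and a hypothetical helper name: a second extreme value inflates the sample standard deviation, so the statistic for the most extreme point shrinks.

```python
import statistics


def grubbs_statistic(data):
    """Largest absolute deviation from the mean, in sample-SD units."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return max(abs(y - mean) for y in data) / sd


clean = [9, 10, 10, 10, 11, 11, 12, 9, 10]  # made-up baseline values
one_outlier = clean + [30]
two_outliers = clean + [30, 30]

g_one = grubbs_statistic(one_outlier)
g_two = grubbs_statistic(two_outliers)
print(f"one extreme value:  G = {g_one:.3f}")
print(f"two extreme values: G = {g_two:.3f}")
```

Here G falls from about 2.82 with one extreme value to about 2.01 with two, even though the second data set is more contaminated. Whether the smaller value still exceeds the critical value depends on N and the significance level.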

The Grubbs test is used to check for a single outlier. If there are in fact multiple outliers, the results of the Grubbs test can be distorted.

If multiple outliers are suspected, then the Tietjen-Moore or the generalized extreme studentized deviate tests may be preferred. The Tietjen-Moore test is a generalization of the Grubbs test for the case where multiple outliers may be present. The Tietjen-Moore test requires that the number of suspected outliers be specified exactly while the generalized extreme studentized deviate test only requires that an upper bound on the suspected number of outliers be specified.

Note:
Tests for outliers are dependent on knowing the distribution of the data. The Grubbs test assumes that the data come from an approximately normal distribution. For this reason, it is strongly recommended that the Grubbs test be complemented with a normal probability plot. If the data are not approximately normally distributed, then the Grubbs test may be detecting the non-normality of the data rather than the presence of an outlier.
Note:
You can specify the number of digits in the Grubbs output with the command

SET WRITE DECIMALS <value>
Note:
The GRUBBS TEST command automatically saves the following parameters:

STATVAL = the value of the test statistic
CUTOFF0 = the 0 percent point of the reference distribution
CUTOFF50 = the 50 percent point of the reference distribution
CUTOFF75 = the 75 percent point of the reference distribution
CUTOFF90 = the 90 percent point of the reference distribution
CUTOFF95 = the 95 percent point of the reference distribution
CUTOFF975 = the 97.5 percent point of the reference distribution
CUTOFF99 = the 99 percent point of the reference distribution

If the MULTIPLE or REPLICATED option is used, these values will be written to the file "dpst1f.dat" instead.

Note:
In addition to the GRUBBS TEST command, the following commands can also be used:

LET A = GRUBBS CDF Y
LET A = GRUBBS DIRECTION Y
LET A = GRUBBS INDEX Y
LET A = GRUBBS Y

The GRUBBS INDEX returns the row index of the most extreme point and GRUBBS DIRECTION specifies whether the most extreme point is in the minimum direction (a -1 is returned) or the maximum direction (a +1 is returned).
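The behavior described for GRUBBS INDEX and GRUBBS DIRECTION can be mimicked in Python as a rough sketch. The function name is hypothetical, the 1-based row index is an assumption, and ties are resolved in favor of the first occurrence, which may not match Dataplot.

```python
import statistics


def grubbs_index_and_direction(data):
    """Return (1-based index of the most extreme point, +1 or -1)."""
    mean = statistics.mean(data)
    deviations = [y - mean for y in data]
    pos = max(range(len(data)), key=lambda i: abs(deviations[i]))
    direction = 1 if deviations[pos] > 0 else -1
    return pos + 1, direction  # assumed 1-based, like a row number


print(grubbs_index_and_direction([10, 11, 9, 30, 10]))    # extreme point is a maximum
print(grubbs_index_and_direction([10, 11, -20, 12, 10]))  # extreme point is a minimum
```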

In addition to the above LET commands, built-in statistics are supported for more than 20 different commands (enter HELP STATISTICS for details).

Default:
None
Synonyms:
MULTIPLE GRUBBS TEST is a synonym for GRUBBS MULTIPLE TEST
REPLICATED GRUBBS TEST is a synonym for GRUBBS REPLICATED TEST
Related Commands:
TIETJEN-MOORE TEST = Perform a Tietjen-Moore outlier test.
EXTREME STUDENTIZED DEVIATE TEST = Perform an extreme studentized deviate outlier test.
DIXON TEST = Perform a Dixon outlier test.
GOODNESS OF FIT TEST = Perform a goodness of fit test (Anderson-Darling, Kolmogorov-Smirnov, chi-square, PPCC).
WILKS SHAPIRO NORMALITY TEST = Perform a Wilks Shapiro normality test.
HISTOGRAM = Generate a histogram.
PROBABILITY PLOT = Generate a probability plot.
BOX PLOT = Generate a box plot.
Reference:
Grubbs, F. E., "Procedures for Detecting Outlying Observations in Samples", Technometrics, Vol. 11, No. 1, February, 1969, pp. 1-21.

Stefansky, W., "Rejecting Outliers in Factorial Designs", Technometrics, Vol. 14, 1972, pp. 469-479.

Applications:
Outlier Detection
Implementation Date:
1998/5
2005/5: Corrected the significance levels for the two-sided case (previous version was actually using the significance level for the one-sided case)
2005/5: Added support for the one-sided tests
2006/3: Replaced 2005/5 update with Syntax 2 and Syntax 3
2009/10: Significantly modified the output format
2009/10: Added support for Syntax 4, Syntax 5, and Syntax 6
Program:

SKIP 25
SET WRITE DECIMALS 4
GRUBBS TEST Y

The following output is generated:
            Grubb Test for Outliers: Test for Minimum and Maximum
(Assumption: Normality)

Response Variable: Y

H0: There are no outliers
Ha: The extreme point is an outlier

Summary Statistics:
Number of Observations:                              38
Sample Minimum:                                147.0000
ID for Sample Minimum:                                1
Sample Maximum:                                231.0000
ID for Sample Maximum:                               38
Sample Mean:                                   185.7894
Sample SD:                                      18.5954

Grubbs Test Statistic Value:                     2.4312

Percent Points of the Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
0.0    =          0.000
50.0    =          2.392
75.0    =          2.601
90.0    =          2.846
95.0    =          3.013
97.5    =          3.169
99.0    =          3.355
100.0    =          6.001

Conclusions (Upper 1-Tailed Test)
----------------------------------------------
Alpha    CDF   Critical Value     Conclusion
----------------------------------------------
10%    90%            2.846      Accept H0
5%    95%            3.013      Accept H0
2.5%  97.5%            3.169      Accept H0
1%    99%            3.355      Accept H0
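The reported statistic can be reproduced from the summary statistics in the output above. The following Python snippet is only an arithmetic check; it uses the rounded values as printed, so the last digit may differ slightly from Dataplot's value.

```python
# Summary statistics copied from the output above (rounded as printed).
sample_min, sample_max = 147.0, 231.0
mean, sd = 185.7894, 18.5954

g_min = (mean - sample_min) / sd  # deviation of the minimum in SD units
g_max = (sample_max - mean) / sd  # deviation of the maximum in SD units
g = max(g_min, g_max)             # two-sided Grubbs statistic

critical_95 = 3.013               # 95 percent point from the output above
print(f"G(min) = {g_min:.4f}, G(max) = {g_max:.4f}, G = {g:.4f}")
print("reject H0 at the 5% level:", g > critical_95)
```

The maximum (231) is the more extreme point, and its statistic of about 2.431 is below the 95 percent point of 3.013, which agrees with the "Accept H0" conclusion in the table.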


NIST is an agency of the U.S. Commerce Department.

Date created: 6/5/2001
Last updated: 10/30/2015