GRUBBS TEST

Name:

GRUBBS TEST

Type:

Analysis Command

Purpose:

Perform a Grubbs test for outliers.

Description:

The Grubbs test detects one outlier at a time. For multiple outliers, delete the single outlier detected and rerun the Grubbs test on the remaining data. Repeat this process until no outliers are detected.

More formally, the Grubbs test can be defined as follows.

H₀: There are no outliers in the data.

Ha: There is at least one outlier in the data.

Test Statistic: \( G = \frac{\max |Y_i - \bar{Y}|} {s} \)
where \( \bar{Y} \) and s are the sample mean and standard deviation of the data. That is, the Grubbs test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.
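As a minimal sketch of the statistic defined above, the following Python function computes G from a list of values. The function name `grubbs_statistic` and the sample data are illustrative assumptions and are not part of Dataplot.

```python
import statistics

def grubbs_statistic(y):
    """Largest absolute deviation from the sample mean, in units of s."""
    ybar = statistics.mean(y)
    s = statistics.stdev(y)  # sample standard deviation (N - 1 divisor)
    return max(abs(v - ybar) for v in y) / s

# Hypothetical data: eight readings near 10 plus one suspect value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 15.0]
print(round(grubbs_statistic(sample), 3))
```

The value printed is the statistic only. Whether it indicates an outlier depends on the critical region for the chosen significance level.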

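Significance Level: \( \alpha \)

Critical Region: The hypothesis of no outliers is rejected if
\( G > \frac{N - 1} {\sqrt{N}} \sqrt{\frac{t^2_{(\alpha/(2N),N-2)}} {N - 2 + t^2_{(\alpha/(2N),N-2)}}} \)
where N is the number of observations and \( t_{(\alpha/(2N),N-2)} \) is the upper critical value of the t distribution with N - 2 degrees of freedom at a significance level of \( \alpha/(2N) \), obtained from the percent point function of the t distribution.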

Note that the above is actually a combination of the following two tests:

the test that the minimum value is an outlier.
the test that the maximum value is an outlier.

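To generate these one-sided tests, the test statistic is

\( G = \frac{\bar{Y} - Y_{min}} {s} \)

for the minimum value and

\( G = \frac{Y_{max} - \bar{Y}} {s} \)

for the maximum value.

The significance level in the TPPF function needs to be doubled for the one-sided tests. That is, \( \alpha/(2N) \) is replaced with \( \alpha/N \) in the critical region.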

You can request that one of the one-sided tests be performed (see the Syntax section).
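The critical region can be sketched in Python as follows. This is an illustration under stated assumptions, not Dataplot's implementation: the helper names (`t_upper_tail`, `t_upper_quantile`, `grubbs_critical`) and the `two_sided` flag are hypothetical, and the t percent point is approximated in pure Python with Simpson's rule and bisection instead of a library routine such as TPPF.

```python
import math

def t_upper_tail(theta, df, steps=2000):
    """P(T > sqrt(df) * tan(theta)) for Student's t, by Simpson's rule.

    Substituting t = sqrt(df) * tan(theta) turns the t density into
    k * cos(theta) ** (df - 1) on the finite interval [0, pi/2).
    """
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = theta / steps
    total = 1.0 + math.cos(theta) ** (df - 1)  # end points of Simpson's rule
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    return 0.5 - k * total * h / 3

def t_upper_quantile(p, df):
    """Value t with P(T > t) = p, found by bisection on theta."""
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return math.sqrt(df) * math.tan((lo + hi) / 2)

def grubbs_critical(n, alpha=0.05, two_sided=True):
    """Critical value for G; the one-sided tests use alpha/N instead of alpha/(2N)."""
    p = alpha / (2 * n) if two_sided else alpha / n
    t = t_upper_quantile(p, n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))

print(round(grubbs_critical(38, 0.05), 3))
print(round(grubbs_critical(38, 0.05, two_sided=False), 3))
```

For N = 38, the two-sided value at the 5% level should come out close to the 95 percent point (3.013) in the sample output of the Program section, and the one-sided value at the 5% level close to the 90 percent point (2.846), since a one-sided test at level \( \alpha \) uses the same t percent point as a two-sided test at level \( 2\alpha \).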

Generally, graphical methods such as the box plot or histogram are used to detect outliers. However, the Grubbs test can be used if you prefer a more formal test.

Syntax 1:

This syntax performs the two-sided test.

Syntax 2:

This syntax performs the one-sided test for the minimum value.

Syntax 3:

This syntax performs the one-sided test for the maximum value.

Syntax 4:

This syntax can also be used with the MINIMUM and MAXIMUM version of the tests. The <labid> variable is used to identify the lab-id of the minimum and maximum points. However, it is not used in the computation of the statistic.

Syntax 5:

This syntax can also be used with the MINIMUM and MAXIMUM version of the tests. This syntax performs a Grubbs test on <y1> then on <y2> and so on. Up to 30 response variables can be specified.

Note that the syntax

GRUBB MULTIPLE TEST Y1 TO Y4

is supported. This is equivalent to

GRUBB MULTIPLE TEST Y1 Y2 Y3 Y4

Syntax 6:

This syntax can also be used with the MINIMUM and MAXIMUM version of the tests. This syntax performs a cross-tabulation of <x1> ... <xk> and performs a Grubbs test for each unique combination of cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, there will be a total of 6 Grubbs tests performed.
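The cross-tabulation can be pictured with a short Python sketch. It only illustrates the grouping idea and is not how Dataplot implements the REPLICATED option; the function names and the data are hypothetical.

```python
import statistics
from collections import defaultdict

def grubbs_statistic(y):
    """Largest absolute deviation from the sample mean, in units of s."""
    ybar = statistics.mean(y)
    s = statistics.stdev(y)
    return max(abs(v - ybar) for v in y) / s

def grubbs_by_group(y, *group_vars):
    """Return {(x1, ..., xk): G} for each unique combination of group values."""
    cells = defaultdict(list)
    for row in zip(y, *group_vars):
        cells[row[1:]].append(row[0])  # key is the tuple of group-id values
    return {key: grubbs_statistic(vals) for key, vals in sorted(cells.items())}

# Hypothetical data: X1 has 3 levels, X2 has 2 levels, 3 observations per cell.
x1 = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3]
x2 = [1, 1, 1, 2, 2, 2] * 3
y = [5.1, 4.9, 5.3, 6.0, 6.2, 7.5, 5.0, 5.2, 4.8, 6.1, 5.9, 6.3,
     4.7, 5.0, 5.4, 6.4, 6.0, 6.2]

results = grubbs_by_group(y, x1, x2)
print(len(results))  # one statistic per cell: 3 levels x 2 levels = 6
```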

Up to six group-id variables can be specified.

Note that the syntax

GRUBB REPLICATED TEST Y X1 TO X4

is supported. This is equivalent to

GRUBB REPLICATED TEST Y X1 X2 X3 X4

Examples:

Note:

Masking can occur when we specify too few outliers in the test. For example, if we are testing for a single outlier when there are in fact two (or more) outliers, these additional outliers may influence the value of the test statistic enough so that no points are declared as outliers.

On the other hand, swamping can occur when we specify too many outliers in the test. For example, if we are testing for two outliers when there is in fact only a single outlier, both points may be declared outliers.

The possibility of masking and swamping is an important reason why it is useful to complement formal outlier tests with graphical methods. Graphics can often help identify cases where masking or swamping may be an issue.

Also, masking is one reason that trying to apply a single outlier test sequentially can fail. If there are multiple outliers, masking may cause the outlier test for the first outlier to return a conclusion of no outliers (and so the testing for any additional outliers is not done).
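A small numerical illustration of masking, using hypothetical data: adding a second extreme value shifts the sample mean and inflates the sample standard deviation, so the test statistic for the most extreme point drops even though the data now contain more outliers. The function name and the values are assumptions made for this sketch.

```python
import statistics

def grubbs_statistic(y):
    """Largest absolute deviation from the sample mean, in units of s."""
    ybar = statistics.mean(y)
    s = statistics.stdev(y)
    return max(abs(v - ybar) for v in y) / s

base = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
one_outlier = base + [15.0]
two_outliers = base + [15.0, 15.2]

g_one = grubbs_statistic(one_outlier)
g_two = grubbs_statistic(two_outliers)
print(round(g_one, 2), round(g_two, 2))  # g_two is the smaller of the two
```

Because the statistic is smaller with two extreme values than with one, a single-outlier test is less likely to reject in exactly the situation where more points are suspect.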

The Grubbs test is used to check for a single outlier. If there are in fact multiple outliers, the results of the Grubbs test can be distorted.

If multiple outliers are suspected, then the Tietjen-Moore or the generalized extreme studentized deviate tests may be preferred. The Tietjen-Moore test is a generalization of the Grubbs test for the case where multiple outliers may be present. The Tietjen-Moore test requires that the number of suspected outliers be specified exactly while the generalized extreme studentized deviate test only requires that an upper bound on the suspected number of outliers be specified.

Note:

Tests for outliers are dependent on knowing the distribution of the data. The Grubbs test assumes that the data come from an approximately normal distribution. For this reason, it is strongly recommended that the Grubbs test be complemented with a normal probability test. If the data are not approximately normally distributed, then the Grubbs test may be detecting the non-normality of the data rather than the presence of an outlier.

Note:

SET WRITE DECIMALS <value>

Note:

STATVAL	=	the value of the test statistic
CUTOFF0	=	the 0 percent point of the reference distribution
CUTOFF50	=	the 50 percent point of the reference distribution
CUTOFF75	=	the 75 percent point of the reference distribution
CUTOFF90	=	the 90 percent point of the reference distribution
CUTOFF95	=	the 95 percent point of the reference distribution
CUTOFF975	=	the 97.5 percent point of the reference distribution
CUTOFF99	=	the 99 percent point of the reference distribution

If the MULTIPLE or REPLICATED option is used, these values will be written to the file "dpst1f.dat" instead.

Note:

The GRUBBS INDEX returns the row index of the most extreme point and GRUBBS DIRECTION specifies whether the most extreme point is in the minimum direction (a -1 is returned) or the maximum direction (a +1 is returned).

In addition to the above LET command, built-in statistics are supported for about 20+ different commands (enter HELP STATISTICS for details).
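The meaning of the index and direction values can be sketched in Python. The function name is hypothetical, and returning a 1-based row index is an assumption made to mirror row numbering; this is not Dataplot code.

```python
import statistics

def grubbs_index_direction(y):
    """Return (1-based row index, direction) of the most extreme point.

    direction is -1 if the point lies below the sample mean and +1 if above.
    """
    ybar = statistics.mean(y)
    i = max(range(len(y)), key=lambda j: abs(y[j] - ybar))
    direction = 1 if y[i] > ybar else -1
    return i + 1, direction

# Hypothetical data: the third value is far below the others.
y = [12.0, 11.5, 3.0, 12.4, 11.9]
print(grubbs_index_direction(y))  # (3, -1)
```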

Note:

Alternatively, the population standard deviation may be considered to be known accurately (usually based on extensive historical data).

In either of these cases, the critical values for the Grubbs test are modified.

To support these options, enter the commands

SET GRUBB STANDARD DEVIATION <value>
SET GRUBB DEGREES OF FREEDOM <value>

If the specified standard deviation is positive, Dataplot uses the formulas based on the independent estimate of the standard deviation. If the degrees of freedom are not specified, a value of 10,000 will be used. Essentially, any value greater than 120 is effectively treated as a "known" population standard deviation.

To compute the critical values using simulation, enter the command

SET GRUBB TEST CRITICAL VALUES SIMULATION

To reset the default of basing the critical values on a formula, enter

SET GRUBB TEST CRITICAL VALUES FORMULA

The formula from the E178 standard is

\[ T_{n}(\alpha) = t_{\alpha/n,\nu} \sqrt{1 - (1/n)} \]

where t is the percent point function of the t distribution and \( \nu \) is the degrees of freedom. For the "known" standard deviation case, the t distribution is replaced with a normal distribution.
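For the "known" standard deviation case, the formula can be sketched with the normal percent point function from the Python standard library. This assumes that the subscript \( \alpha/n \) denotes an upper-tail probability; the function name is hypothetical, and the sketch is not taken from the E178 standard or from Dataplot.

```python
import math
from statistics import NormalDist

def e178_critical_known_sd(n, alpha=0.05):
    """T_n(alpha) with the t percent point replaced by the normal one.

    Assumption: alpha / n is an upper-tail area, so the percent point is
    evaluated at 1 - alpha / n.
    """
    z = NormalDist().inv_cdf(1 - alpha / n)
    return z * math.sqrt(1 - 1 / n)

print(round(e178_critical_known_sd(10, 0.05), 3))
```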

Default:

None

Synonyms:

Related Commands:

TIETJEN-MOORE TEST	=	Perform a Tietjen-Moore outlier test.
EXTREME STUDENTIZED DEVIATE TEST	=	Perform an extreme studentized deviate outlier test.
DIXON TEST	=	Perform a Dixon outlier test.
DAVID TEST	=	Perform the David, Hartley and Pearson outlier test.
SKEWNESS OUTLIER TEST	=	Perform the skewness outlier test.
KURTOSIS OUTLIER TEST	=	Perform the kurtosis outlier test.
GOODNESS OF FIT TEST	=	Perform a goodness of fit test (Anderson-Darling, Kolmogorov-Smirnov, chi-square, PPCC)
WILKS SHAPIRO NORMALITY TEST	=	Perform a Wilks Shapiro normality test.
HISTOGRAM	=	Generate a histogram.
PROBABILITY PLOT	=	Generate a probability plot.
BOX PLOT	=	Generate a box plot.

Reference:

Technometrics

Stefansky, W., "Rejecting Outliers in Factorial Designs", Technometrics, Vol. 14, 1972, pp. 469-479.

E178 - 16A (2016), "Standard Practice for Dealing with Outlying Observations", ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, USA.

Applications:

Outlier Detection

Implementation Date:

1998/05
2005/5:	Corrected the significance levels for the two-sided case (previous version was actually using the significance level for the one-sided case)
2005/5:	Added support for the one-sided tests
2006/3:	Replaced 2005/5 update with Syntax 2 and Syntax 3
2009/10:	Significantly modified the output format
2009/10:	Added support for Syntax 4, Syntax 5, and Syntax 6
2019/10:	Added support for an independent estimate of the standard deviation

Program:

 
SKIP 25
READ VANGEL31.DAT Y
SET WRITE DECIMALS 4
GRUBBS TEST Y

            Grubb Test for Outliers: Test for Minimum and Maximum
                           (Assumption: Normality)
 
Response Variable: Y
 
H0: There are no outliers
Ha: The extreme point is an outlier
 
Summary Statistics:
Number of Observations:                              38
Sample Minimum:                                147.0000
ID for Sample Minimum:                                1
Sample Maximum:                                231.0000
ID for Sample Maximum:                               38
Sample Mean:                                   185.7894
Sample SD:                                      18.5954
 
Grubbs Test Statistic Value:                     2.4312
 
 
Percent Points of the Reference Distribution
-----------------------------------
  Percent Point               Value
-----------------------------------
            0.0    =          0.000
           50.0    =          2.392
           75.0    =          2.601
           90.0    =          2.846
           95.0    =          3.013
           97.5    =          3.169
           99.0    =          3.355
          100.0    =          6.001
 
Conclusions (Upper 1-Tailed Test)
----------------------------------------------
  Alpha    CDF   Critical Value     Conclusion
----------------------------------------------
    10%    90%            2.846      Accept H0
     5%    95%            3.013      Accept H0
   2.5%  97.5%            3.169      Accept H0
     1%    99%            3.355      Accept H0
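As a check on the output above, the test statistic can be reproduced from the printed summary statistics. The short Python sketch below uses only values shown in the output; small differences in the last digit are expected because the printed mean and standard deviation are rounded.

```python
# Summary statistics copied from the output above.
ymin, ymax = 147.0, 231.0
ybar, s = 185.7894, 18.5954

g_min = (ybar - ymin) / s  # one-sided statistic for the minimum
g_max = (ymax - ybar) / s  # one-sided statistic for the maximum
g = max(g_min, g_max)      # two-sided statistic

print(round(g_min, 3), round(g_max, 3))
print(g < 2.846)  # compare with the 10% critical value from the table
```

The maximum is the more extreme point here, and the two-sided statistic falls below even the 10% critical value, which agrees with the "Accept H0" conclusions in the table.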