

Purpose: Detection of Outliers 
The Tietjen-Moore test (Tietjen-Moore 1972) is used to detect multiple outliers in a univariate data set that follows an approximately normal distribution.

The Tietjen-Moore test is a generalization of Grubbs' test to the case of multiple outliers; when testing for a single outlier, it is equivalent to Grubbs' test. Note that the Tietjen-Moore test requires the suspected number of outliers to be specified exactly. If this number is not known, the generalized extreme studentized deviate (ESD) test is recommended instead, because it requires only an upper bound on the number of suspected outliers.

Definition 
The Tietjen-Moore test is defined for the hypothesis:

H_0: there are no outliers in the data set
H_a: there are exactly k outliers in the data set

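A sketch of the two-tailed test statistic, written as it is usually stated (treat this form as an assumption; it does reproduce the E_k = 0.292 reported in the sample output below). The observations are sorted by their absolute residuals |y_i - ȳ| in ascending order and relabelled z_1, ..., z_n, so the k points with the largest residuals are the last k. Setting k = 1 shows why the test reduces to Grubbs' test for a single outlier.

```latex
% z_1, ..., z_n: the observations sorted by |y_i - \bar{y}| in ascending order
E_k = \frac{\sum_{i=1}^{n-k} \left(z_i - \bar{z}_k\right)^2}
           {\sum_{i=1}^{n} \left(z_i - \bar{z}\right)^2},
\qquad
\bar{z}_k = \frac{1}{n-k}\sum_{i=1}^{n-k} z_i,
\qquad
\bar{z} = \bar{y}

% Single outlier (k = 1): removing the point with deviation d = z_n - \bar{y}
% lowers the sum of squares by n d^2 / (n - 1). With s the sample standard
% deviation, \sum (y_i - \bar{y})^2 = (n - 1) s^2 and G = |d| / s (Grubbs):
E_1 = 1 - \frac{n\, d^2}{(n-1)\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
    = 1 - \frac{n}{(n-1)^2}\, G^2
```

Because E_1 decreases monotonically as G grows, rejecting H_0 for small E_1 is the same as rejecting it for large G. More generally, E_k always lies between 0 and 1 and moves toward 0 when the removed points are outliers, which is why the test is lower-tailed.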

Sample Output 
The Tietjen-Moore paper gives the following 15 observations of vertical semi-diameters of the planet Venus (this example originally appeared in Grubbs' 1950 paper):

-1.40  -0.44  -0.30  -0.24  -0.22  -0.13  -0.05  0.06  0.10  0.18  0.20  0.39  0.48  0.63  1.01
A normal probability plot of these data indicates that the normality assumption is reasonable. The minimum value appears to be an outlier and, to a lesser extent, the maximum value may also be an outlier. The Tietjen-Moore test of the two most extreme points (-1.40 and 1.01) is shown below.

H_0: there are no outliers in the data
H_a: the two most extreme points are outliers

Test statistic: E_k = 0.292
Significance level: α = 0.05
Critical value for lower tail: 0.317
Critical region: Reject H_0 if E_k < 0.317

The Tietjen-Moore test is a lower, one-tailed test, so the null hypothesis that there are no outliers is rejected when the value of the test statistic is less than the critical value. For this example, the null hypothesis is rejected at the 0.05 level of significance, and we conclude that the two most extreme points are outliers.
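The calculation can be reproduced with a short NumPy sketch. The data values are assumed to match the TIETMOO1.DAT file (they reproduce the reported E_k = 0.292). The function names `tietjen_moore_ek` and `simulated_critical_value` are hypothetical, and estimating the lower-tail critical value as the α quantile of E_k over simulated standard normal samples is an assumption about how a value such as 0.317 can be obtained, not a description of the published tables.

```python
import numpy as np


def tietjen_moore_ek(samples, k):
    """Two-tailed Tietjen-Moore statistic E_k for each row of `samples`.

    A 1-D input is treated as a single row; returns one E_k per row.
    """
    y = np.atleast_2d(np.asarray(samples, dtype=float))
    n = y.shape[1]
    mean = y.mean(axis=1, keepdims=True)
    # Sort each row by absolute residual from its full-sample mean (ascending).
    order = np.argsort(np.abs(y - mean), axis=1)
    z = np.take_along_axis(y, order, axis=1)
    # Keep the n - k points with the smallest residuals.
    kept = z[:, : n - k]
    ss_full = ((y - mean) ** 2).sum(axis=1)
    ss_kept = ((kept - kept.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    return ss_kept / ss_full


def simulated_critical_value(n, k, alpha=0.05, n_sim=10_000, seed=0):
    """Lower-tail critical value of E_k, estimated from standard normal samples."""
    rng = np.random.default_rng(seed)
    null_stats = tietjen_moore_ek(rng.standard_normal((n_sim, n)), k)
    return float(np.quantile(null_stats, alpha))


# Venus semi-diameter data (assumed to match TIETMOO1.DAT).
venus = [-1.40, -0.44, -0.30, -0.24, -0.22, -0.13, -0.05,
         0.06, 0.10, 0.18, 0.20, 0.39, 0.48, 0.63, 1.01]
k = 2  # test the two most extreme points

e_k = float(tietjen_moore_ek(venus, k)[0])
crit = simulated_critical_value(len(venus), k)
reject = e_k < crit  # lower, one-tailed test
print(f"E_k = {e_k:.3f}, simulated critical value = {crit:.3f}, reject H_0: {reject}")
```

This prints E_k = 0.292. The simulated critical value varies slightly with the seed and the number of simulations, but it should land close to the tabulated 0.317, giving the same decision to reject H_0.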

Questions 
The Tietjen-Moore test can be used to answer the following question:

Does the data set contain k outliers?


Importance 
Many statistical techniques are sensitive to the presence of outliers. For example, simple calculations of the mean and standard deviation may be distorted by a single grossly inaccurate data point.

Checking for outliers should be a routine part of any data analysis. Potential outliers should be examined to see whether they are possibly erroneous. If a data point is in error, it should be corrected if possible and deleted otherwise. If there is no reason to believe that the outlying point is in error, it should not be deleted without careful consideration. However, the use of more robust techniques may be warranted. Robust techniques will often down-weight the effect of outlying points without deleting them.
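A small illustration of the distortion described above, assuming a hypothetical recording error in the Venus data (the value 0.63 mistyped as 63.0). The median and the scaled median absolute deviation (MAD) stand in here as one simple example of robust summaries; they are an illustrative choice, not a technique this section prescribes, and `mad` is a hypothetical helper.

```python
import numpy as np

# Venus semi-diameter data (assumed to match TIETMOO1.DAT).
clean = np.array([-1.40, -0.44, -0.30, -0.24, -0.22, -0.13, -0.05,
                  0.06, 0.10, 0.18, 0.20, 0.39, 0.48, 0.63, 1.01])

# Hypothetical gross error: the value 0.63 (index 13) mistyped as 63.0.
corrupted = clean.copy()
corrupted[13] = 63.0


def mad(x):
    """Median absolute deviation, scaled by 1.4826 to estimate sigma under normality."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))


for label, x in [("clean", clean), ("corrupted", corrupted)]:
    print(f"{label:>9}: mean = {x.mean():7.3f}  sd = {x.std(ddof=1):7.3f}  "
          f"median = {np.median(x):5.2f}  MAD = {mad(x):5.3f}")
```

The single bad value moves the mean from about 0.018 to about 4.18 and inflates the standard deviation roughly thirtyfold, while the median (0.06) and the MAD are unchanged.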

Related Techniques
Several graphical techniques can, and should, be used to help detect outliers. A simple normal probability plot, run sequence plot, box plot, or histogram should show any obviously outlying points. In addition to showing potential outliers, several of these graphics also help assess whether the data follow an approximately normal distribution.
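As a rough numerical companion to the box plot, the sketch below applies Tukey's common 1.5 × IQR fence convention to the Venus data (data values assumed to match TIETMOO1.DAT). The fence rule is a widely used box-plot convention rather than part of the Tietjen-Moore test, and quartile definitions differ between software packages; NumPy's default linear interpolation is assumed here.

```python
import numpy as np

# Venus semi-diameter data (assumed to match TIETMOO1.DAT).
venus = np.array([-1.40, -0.44, -0.30, -0.24, -0.22, -0.13, -0.05,
                  0.06, 0.10, 0.18, 0.20, 0.39, 0.48, 0.63, 1.01])

# Quartiles via NumPy's default linear interpolation; other software may differ.
q1, q3 = np.percentile(venus, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are the ones a box plot would draw individually.
flagged = venus[(venus < lower_fence) | (venus > upper_fence)].tolist()
print(f"fences: [{lower_fence:.4f}, {upper_fence:.4f}], flagged: {flagged}")
```

Only the minimum, -1.40, falls outside the fences; the maximum, 1.01, sits just inside the upper fence. This is consistent with the sample output's remark that the maximum is the weaker outlier candidate, and it shows why a formal test such as Tietjen-Moore is useful when the graphics are ambiguous.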
Software
Some general-purpose statistical software programs support the Tietjen-Moore test. Both Dataplot code and R code can be used to generate the analyses in this section. These scripts use the TIETMOO1.DAT data file.