Name: DIXON TEST
The Dixon test is based on comparing the distance of one end observation from its neighbors with the range of all the observations (or all but one or two observations). This is in contrast to the Grubbs test (and its generalizations, the Tietjen-Moore and extreme studentized deviate tests), which is based on the number of standard deviations that the extreme observations lie from the mean. Specifically, the Dixon test statistic is computed from the set of ordered observations Y1, Y2, ..., YN.
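The ratio described above can be sketched in Python. This is a minimal sketch, not Dataplot's code: it assumes the ratio definitions (r10, r11, r21, r22) and sample-size ranges commonly tabulated for the ASTM E178 formulation, and the function name `dixon_statistic` and its `tail` argument are illustrative only. For the eight-observation sample used in the program example on this page, it gives 12/26, which agrees with the reported statistic of 0.46153 to four decimal places.

```python
def dixon_statistic(values, tail="max"):
    """Dixon ratio for the largest (tail="max") or smallest (tail="min") value.

    Assumed choice of ratio by sample size (ASTM E178 convention):
      N = 3..7   -> r10     N = 8..10  -> r11
      N = 11..13 -> r21     N = 14..30 -> r22
    """
    if tail not in ("min", "max"):
        raise ValueError("tail must be 'min' or 'max'")
    n = len(values)
    if not 3 <= n <= 30:
        raise ValueError("this sketch covers 3 <= N <= 30 only")

    if tail == "min":
        # Mirror the data so the minimum case reuses the maximum-case formulas.
        y = sorted(-v for v in values)
    else:
        y = sorted(values)

    # i: how far below the suspect value the numerator gap reaches
    # j: how many low-end values are excluded from the denominator range
    if n <= 7:
        i, j = 1, 0      # r10
    elif n <= 10:
        i, j = 1, 1      # r11
    elif n <= 13:
        i, j = 2, 1      # r21
    else:
        i, j = 2, 2      # r22

    numerator = y[-1] - y[-1 - i]
    denominator = y[-1] - y[j]
    if denominator == 0:
        raise ValueError("range is zero; the ratio is undefined")
    return numerator / denominator


# Breaking-strength data from the program example on this page (N = 8, so r11).
wire = [568, 570, 570, 570, 572, 578, 584, 596]
print(dixon_statistic(wire, "max"))   # (596 - 584) / (596 - 570) = 12/26
print(dixon_statistic(wire, "min"))   # (570 - 568) / (584 - 568) = 2/16
```

Mirroring the data for the minimum case keeps a single set of formulas, at the cost of one extra sort.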
The critical values are obtained via simulation: standard normal random samples are generated and the Dixon test statistic is computed for each. The critical values are dynamically generated using 25,000 random samples. The null hypothesis of no outliers is rejected if the test statistic is greater than the critical value.

There are a number of variants of the Dixon test (e.g., it can be adapted to handle more than one outlier). Dataplot uses the formulation of the Dixon test given in the ASTM E178 standard (this is taken from Dixon's Biometrics paper).

Dixon's test is generally limited to small samples. One reason is that it is quite sensitive to the number of outliers being tested for, and this can be difficult to determine for larger samples. It also assumes that the underlying data distribution (with the exception of the outlier) is normal. For this reason, it is recommended that a Dixon test be preceded by a normal probability plot. The normal probability plot can be used to determine whether the assumptions of normality and of at most one outlier are in fact reasonable.
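The simulation of critical values described above can be sketched as follows. This is an illustration under stated assumptions rather than Dataplot's actual algorithm: the ratio definitions follow the commonly tabulated ASTM E178 size ranges, only the maximum case is covered, the quantile rule is a simple order statistic, and the names `dixon_max` and `simulate_critical_values` are hypothetical.

```python
import math
import random


def dixon_max(sample):
    """Dixon ratio for the largest value (assumed ASTM E178 size ranges)."""
    y = sorted(sample)
    n = len(y)
    if n <= 7:
        i, j = 1, 0      # r10
    elif n <= 10:
        i, j = 1, 1      # r11
    elif n <= 13:
        i, j = 2, 1      # r21
    else:
        i, j = 2, 2      # r22
    return (y[-1] - y[-1 - i]) / (y[-1] - y[j])


def simulate_critical_values(n, levels=(0.90, 0.95, 0.975, 0.99),
                             nsim=25000, seed=12345):
    """Estimate upper percent points of the statistic under normality."""
    rng = random.Random(seed)
    stats = sorted(
        dixon_max([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(nsim)
    )
    # Empirical quantile: the order statistic at position ceil(p * nsim).
    return {p: stats[min(nsim - 1, math.ceil(p * nsim) - 1)] for p in levels}


critical = simulate_critical_values(8)
for level, value in critical.items():
    print(f"{100 * level:5.1f}%  {value:.3f}")
```

Because the values are simulated, they vary slightly with the seed and the number of samples; for N = 8 they should fall close to the percent points listed in the sample output on this page.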
<SUBSET/EXCEPT/FOR qualification>
where <MINIMUM/MAXIMUM> is an optional keyword that specifies whether the minimum or maximum value is tested as an outlier; <y> is the response variable being tested; and where the <SUBSET/EXCEPT/FOR qualification> is optional.

If neither MINIMUM nor MAXIMUM is given, both the minimum and maximum points will be tested (the more extreme of the two values will be used).
<SUBSET/EXCEPT/FOR qualification>
where <MINIMUM/MAXIMUM> is an optional keyword that specifies whether the minimum or maximum value is tested as an outlier; <y> is the response variable being tested; <labid> is an id-variable; and where the <SUBSET/EXCEPT/FOR qualification> is optional.

The <labid> variable is only used to identify the point being tested as an outlier. It does not affect the computations.
<SUBSET/EXCEPT/FOR qualification>
where <MINIMUM/MAXIMUM> is an optional keyword that specifies whether the minimum or maximum value is tested as an outlier; <y1> ... <yk> is a list of 1 to 30 response variables; and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax performs a Dixon test on <y1>, then on <y2>, and so on. Note that the syntax
is supported. This is equivalent to
<SUBSET/EXCEPT/FOR qualification>
where <MINIMUM/MAXIMUM> is an optional keyword that specifies whether the minimum or maximum value is tested as an outlier; <y> is the response variable; <x1> ... <xk> is a list of 1 to 6 group-id variables; and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax performs a cross-tabulation of <x1> ... <xk> and performs a Dixon test for each unique combination of cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, a total of 6 Dixon tests will be performed. Note that the syntax
is supported. This is equivalent to
If either the first or last replication variable has all unique elements, this variable will be interpreted as a lab-id variable rather than a replication variable.
    DIXON TEST Y1 LABID
    DIXON MULTIPLE TEST Y1 Y2 Y3
    DIXON REPLICATED TEST Y X1 X2
    DIXON TEST Y1 SUBSET TAG > 2
    DIXON MINIMUM TEST Y1
    DIXON MAXIMUM TEST Y1
Masking can occur when we specify too few outliers in the test. For example, if we are testing for a single outlier when there are in fact two (or more) outliers, these additional outliers may influence the value of the test statistic enough that no points are declared outliers. On the other hand, swamping can occur when we specify too many outliers in the test. For example, if we are testing for two outliers when there is in fact only a single outlier, both points may be declared outliers.

The possibility of masking and swamping is an important reason why it is useful to complement formal outlier tests with graphical methods. Graphics can often help identify cases where masking or swamping may be an issue.

Masking is also one reason that applying a single outlier test sequentially can fail. If there are multiple outliers, masking may cause the test for the first outlier to return a conclusion of no outliers (and so the testing for any additional outliers is not done).

The Dixon and Grubbs tests are used to check for a single outlier. If there are in fact multiple outliers, the results of these tests can be distorted. If multiple outliers are suspected, then the Tietjen-Moore or the generalized extreme studentized deviate tests may be preferred. The Tietjen-Moore test is a generalization of the Grubbs test for the case where multiple outliers may be present. The Tietjen-Moore test requires that the number of suspected outliers be specified exactly, while the generalized extreme studentized deviate test only requires that an upper bound on the suspected number of outliers be specified.
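The masking effect described above can be illustrated numerically. The sketch below uses two hypothetical eight-point data sets (invented for illustration, not taken from any reference) and assumes the r11 form of the ratio for N = 8 from the ASTM E178 formulation. The 5% critical value is the 95% point for N = 8 shown in the sample output on this page.

```python
def r11_max(values):
    """Dixon r11 ratio for the largest of 8 to 10 observations (assumed form)."""
    y = sorted(values)
    return (y[-1] - y[-2]) / (y[-1] - y[1])


# 95% point for N = 8, taken from the sample output on this page.
CRITICAL_5PCT_N8 = 0.552

# Hypothetical data: one high outlier versus two high outliers.
one_outlier = [1, 2, 3, 4, 5, 6, 7, 30]
two_outliers = [1, 2, 3, 4, 5, 6, 29, 30]

for name, data in (("one outlier", one_outlier), ("two outliers", two_outliers)):
    stat = r11_max(data)
    verdict = "reject H0" if stat > CRITICAL_5PCT_N8 else "accept H0"
    print(f"{name:13s} statistic = {stat:.3f}  ->  {verdict}")
```

With a single outlier the gap between the two largest values dominates the ratio (23/28) and the test rejects. With two outliers the second one closes that gap (1/28), so the single-outlier test accepts H0 even though the data are more contaminated.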
If you perform a formal goodness of fit test for assessing normality, it is recommended you omit the potential outlier from the test (i.e., we want to distinguish between an outlier and non-normality and the potential outlier may distort the normality test).
If the MULTIPLE or REPLICATED option is used, these values will be written to the file "dpst1f.dat" instead.
In addition to the above LET command, built-in statistics are supported for about 17 different commands (enter HELP STATISTICS for details).
REPLICATED DIXON TEST is a synonym for DIXON REPLICATED TEST
Dixon and Massey (1957), "Introduction to Statistical Analysis," Second Edition, McGraw-Hill, pp. 275-278.

ASTM E 178 - 08, "Standard Practice for Dealing with Outlying Observations," ASTM International, 100 Barr Harbor Drive, PO BOX C700, West Conshohocken, PA 19428-2959, USA.

Iglewicz and Hoaglin (1993), "Volume 16: How To Detect and Handle Outliers," The ASQC Basic Reference in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.
    . Following uses example 1 from ASTM E 178 - 08 standard.
    .
    . Response variable is breaking strength (in pounds) of
    . 0.104-in hard-drawn copper wire.
    .
    let y = data 568 570 570 570 572 578 584 596
    .
    let a = dixon maximum test y
    .
    set write decimals 5
    dixon maximum test y

The following output is generated.

            Dixon Test for a Single Outlier: Maximum Case
                       (Assumption: Normality)

    Response Variable: Y

    H0: There are no outliers
    Ha: The maximum point is an outlier

    Summary Statistics:
    Number of Observations:                   8
    Sample Minimum:                   568.00000
    ID for Sample Minimum:                    0
    Sample Maximum:                   596.00000
    ID for Sample Maximum:                    0
    Sample Mean:                      576.00000
    Sample SD:                          9.68061
    Sample Range:                      28.00000

    Dixon Test Statistic Value:         0.46153
    CDF Value:                          0.88704
    P-Value                             0.11295

    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               25.0    =          0.101
               50.0    =          0.210
               75.0    =          0.349
               80.0    =          0.384
               90.0    =          0.478
               95.0    =          0.552
               97.5    =          0.615
               99.0    =          0.684
               99.5    =          0.724
              100.0    =          0.904

    Conclusions (Upper 1-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            0.478      Accept H0
         5%    95%            0.552      Accept H0
       2.5%   97.5            0.615      Accept H0
         1%    99%            0.684      Accept H0

    *Critical Values Based on 25000 Monte Carlo Simulations
Last updated: 10/13/2015
Last updated: 12/11/2023
Please email comments on this WWW page to alan.heckert@nist.gov.