Many statistics have one of these properties. However, it can be difficult to find statistics that are both resistant and have robustness of efficiency.
The most common estimate of scale, the standard deviation, is the most efficient estimate of scale if the data come from a normal distribution. However, the standard deviation is not robust in the sense that changing even one value can dramatically change the computed value of the standard deviation (i.e., poor resistance). In addition, it does not have robustness of efficiency for non-normal data.
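A quick illustration of the poor resistance of the standard deviation, written with Python's standard `statistics` module on a small made-up data set (the values are hypothetical, chosen only for illustration). Changing one of ten observations to a gross outlier inflates the sample standard deviation by more than two orders of magnitude.

```python
import statistics

# Hypothetical data: ten observations clustered around 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]

# The same data with a single value changed to a gross outlier.
contaminated = clean[:-1] + [100.0]

sd_clean = statistics.stdev(clean)                # sample SD (n - 1 divisor)
sd_contaminated = statistics.stdev(contaminated)

print(f"SD, original data:     {sd_clean:.3f}")
print(f"SD, one value changed: {sd_contaminated:.3f}")
```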
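The median absolute deviation (MAD) and the interquartile range are the two most commonly used robust alternatives to the standard deviation. The MAD in particular is a very robust scale estimator. However, the MAD has the following limitations: it has relatively low efficiency for normal data (about 37%), and it implicitly assumes symmetry, since it measures dispersion as the distance of each observation from a single measure of central tendency (the median).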
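A minimal Python sketch of the MAD, assuming the common convention of multiplying by 1.4826 (approximately 1/Phi^{-1}(3/4)) so that the result estimates sigma for normal data; that factor is an assumption, not something stated in the text. Every deviation is measured from one center, the sample median, which is where the implicit symmetry assumption comes from. On a small hypothetical data set, one gross outlier moves the raw MAD only from 0.1 to 0.15.

```python
import statistics


def mad(values, scale=1.4826):
    """Median absolute deviation about the sample median.

    scale=1.4826 (about 1 / Phi^{-1}(3/4)) is a common convention that makes
    the result estimate sigma for normal data; use scale=1.0 for the raw MAD.
    """
    center = statistics.median(values)          # the single center the MAD relies on
    deviations = [abs(x - center) for x in values]
    return scale * statistics.median(deviations)


# Hypothetical data with and without one gross outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]
contaminated = clean[:-1] + [100.0]

print(f"raw MAD, original data:     {mad(clean, scale=1.0):.3f}")
print(f"raw MAD, one value changed: {mad(contaminated, scale=1.0):.3f}")
```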
Rousseeuw and Croux proposed the Qn estimate of scale as an alternative to the MAD. It shares the desirable robustness properties of the MAD (a 50% breakdown point and a bounded influence function). In addition, it has significantly better normal efficiency (82%), and it does not depend on symmetry.
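The Qn scale estimate is motivated by the Hodges-Lehmann estimate of location, the median of all pairwise averages:

$$\mathrm{HL} = \operatorname{med}\left\{ \frac{x_i + x_j}{2} ;\ i < j \right\}$$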
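A direct transcription of the Hodges-Lehmann definition as a Python sketch (the function name `hodges_lehmann` is ours, not Dataplot's). It enumerates all n(n-1)/2 pairs with i < j, so it is quadratic in n; some texts also include the pairs with i = j.

```python
import statistics
from itertools import combinations


def hodges_lehmann(values):
    """Median of the pairwise averages (x_i + x_j) / 2 over all pairs i < j."""
    averages = [(a + b) / 2 for a, b in combinations(values, 2)]
    return statistics.median(averages)


data = [1, 2, 3, 4, 100]
print(hodges_lehmann(data))     # 3.25: robust location estimate
print(statistics.mean(data))    # 22: pulled toward the outlier
```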
An analogous scale estimate can be obtained by replacing the pairwise averages with pairwise distances:

$$\operatorname{med}\left\{ |x_i - x_j| ;\ i < j \right\}$$
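The scale analogue needs only one change to the pairwise construction: distances instead of averages. Again a naive Python sketch with a hypothetical function name. No consistency factor is applied here, so the value is not yet on the sigma scale, and no center of the data is ever estimated.

```python
import statistics
from itertools import combinations


def median_pairwise_distance(values):
    """Median of the pairwise distances |x_i - x_j| over all pairs i < j.

    No consistency factor is applied, so this is not yet an estimate of sigma.
    """
    distances = [abs(a - b) for a, b in combinations(values, 2)]
    return statistics.median(distances)


print(median_pairwise_distance([1, 2, 3, 4, 100]))  # 2.5
```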
This estimate has high efficiency for normal data (86%), but a breakdown point of only 29%. Rousseeuw and Croux proposed the following variation of this statistic:

$$Q_n = d \, \left\{ |x_i - x_j| ;\ i < j \right\}_{(k)}$$
where d is a constant factor and $k = \binom{h}{2}$, which is approximately $\binom{n}{2}/4$. The value of h is $\lfloor n/2 \rfloor + 1$ (i.e., roughly half the number of observations). In words, we take the kth order statistic of the $\binom{n}{2}$ interpoint distances. The value of d is chosen to make Qn a consistent estimator of scale. We use d = 2.2219, since this is the value that makes Qn a consistent estimator for normal data.
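A naive Python sketch of Qn as defined above (the function and constant names are ours): sort all n(n-1)/2 distances and take the kth smallest, with h = floor(n/2) + 1 and k = C(h, 2). This costs O(n^2 log n); faster algorithms exist but are not shown. The sketch also checks two claims from the text. First, for normal data X - Y is N(0, 2 sigma^2), and since k is about a quarter of all pairs, the population analogue of the kth order statistic is the first quartile of |X - Y|, namely sqrt(2) sigma Phi^{-1}(5/8); the consistency factor is therefore 1/(sqrt(2) Phi^{-1}(5/8)), which evaluates to about 2.2191, close to the 2.2219 quoted above (the code itself uses the documented 2.2219). Second, with n = 100 points of which m are replaced by hypothetical far-apart outliers, the plain median of pairwise distances breaks down at m = 30, while Qn stays bounded up to m = 49.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import NormalDist, median

D_NORMAL = 2.2219  # consistency factor for normal data, as quoted in the text


def qn_scale(values, d=D_NORMAL):
    """Qn = d * (kth smallest |x_i - x_j| over i < j), k = C(h, 2), h = n//2 + 1."""
    n = len(values)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = comb(h, 2)
    distances = sorted(abs(a - b) for a, b in combinations(values, 2))
    return d * distances[k - 1]  # kth order statistic (k is 1-based)


def consistency_factor():
    """1 / (sqrt(2) * Phi^{-1}(5/8)): makes Qn estimate sigma for normal data."""
    return 1.0 / (sqrt(2.0) * NormalDist().inv_cdf(5 / 8))


def median_pairwise_distance(values):
    """The 29%-breakdown statistic that Qn modifies: median |x_i - x_j|, i < j."""
    return median(abs(a - b) for a, b in combinations(values, 2))


def contaminate(n, m):
    """Hypothetical data: n - m clean points 0..n-m-1 plus m far-apart outliers."""
    return list(range(n - m)) + [1e9 * (i + 1) for i in range(m)]


if __name__ == "__main__":
    print(f"Qn of 1, 2, 3, 4, 100: {qn_scale([1, 2, 3, 4, 100]):.4f}")
    print(f"derived consistency factor: {consistency_factor():.4f}")
    for m in (29, 30, 49, 50):
        y = contaminate(100, m)
        print(f"m = {m}: median distance = {median_pairwise_distance(y):.4g}, "
              f"Qn = {qn_scale(y):.4g}")
```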
The Rousseeuw and Croux article (see the Reference section below) discusses the properties of the Qn estimate in detail.

Syntax:
LET <par> = QN SCALE <y> <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<par> is a parameter where the computed Qn estimate is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Example:
LET A = QN SCALE Y1 SUBSET TAG > 2
The QN SCALE statistic can also be used in the following commands:
CROSS TABULATE QN SCALE PLOT Y X1 X2
BOOTSTRAP QN SCALE PLOT Y
JACKNIFE QN SCALE PLOT Y
DEX QN SCALE PLOT Y X1 ... XK
QN SCALE BLOCK PLOT Y X1 ... XK
QN SCALE INFLUENCE CURVE Y
QN SCALE INTERACTION PLOT Y X1 X2
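TABULATE QN SCALE Y X

Reference:
"Alternatives to the Median Absolute Deviation", Peter J. Rousseeuw and Christophe Croux, Journal of the American Statistical Association, Vol. 88, No. 424, 1993, pp. 1273-1283.
"Data Analysis and Regression: A Second Course in Statistics", Mosteller and Tukey, Addison-Wesley, 1977, pp. 203-209.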
Program:
MULTIPLOT 2 2
MULTIPLOT CORNER COORDINATES 0 0 100 100
MULTIPLOT SCALE FACTOR 2
X1LABEL DISPLACEMENT 12
. Generate 200 normal and 200 lognormal random numbers
LET Y1 = NORMAL RANDOM NUMBERS FOR I = 1 1 200
LET SIGMA = 1
LET Y2 = LOGNORMAL RANDOM NUMBERS FOR I = 1 1 200
. Bootstrap Qn for the normal sample, then plot a histogram of the bootstrap values
BOOTSTRAP SAMPLES 500
BOOTSTRAP QN SCALE PLOT Y1
X1LABEL B025 = ^B025, B975=^B975
HISTOGRAM YPLOT
X1LABEL
. Repeat for the lognormal sample
BOOTSTRAP QN SCALE PLOT Y2
X1LABEL B025 = ^B025, B975=^B975
HISTOGRAM YPLOT
.
END OF MULTIPLOT
. Title the two rows of plots
JUSTIFICATION CENTER
MOVE 50 96
TEXT QN SCALE BOOTSTRAP: NORMAL
MOVE 50 46
TEXT QN SCALE BOOTSTRAP: LOGNORMAL
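The Dataplot program above draws bootstrap plots and histograms of Qn for a normal and a lognormal sample. As a rough cross-check outside Dataplot, here is a Python sketch of the same idea under stated assumptions: resamples are drawn with replacement, the interval is taken from the 2.5th and 97.5th percentiles of the bootstrap estimates (standing in for B025 and B975; Dataplot's exact percentile convention may differ), and Qn is the naive definition with d = 2.2219. Random numbers come from Python's `random` module, so the values will not match Dataplot's. The function names are ours.

```python
import random
import statistics
from itertools import combinations
from math import comb

D = 2.2219  # normal-consistency factor quoted in the text


def qn_scale(values, d=D):
    """Naive O(n^2 log n) Qn: d times the kth smallest |x_i - x_j|, i < j."""
    h = len(values) // 2 + 1
    k = comb(h, 2)
    distances = sorted(abs(a - b) for a, b in combinations(values, 2))
    return d * distances[k - 1]


def bootstrap_qn(values, n_boot=500, seed=0):
    """Return the (2.5th, 97.5th) percentiles of Qn over bootstrap resamples."""
    rng = random.Random(seed)
    estimates = [qn_scale(rng.choices(values, k=len(values)))
                 for _ in range(n_boot)]
    cuts = statistics.quantiles(estimates, n=40)  # cut points at 1/40, ..., 39/40
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    rng = random.Random(12345)
    y1 = [rng.gauss(0.0, 1.0) for _ in range(200)]            # normal sample
    y2 = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]   # lognormal, sigma = 1
    for name, y in (("NORMAL", y1), ("LOGNORMAL", y2)):
        b025, b975 = bootstrap_qn(y, n_boot=500)
        print(f"QN SCALE BOOTSTRAP: {name}  B025 = {b025:.3f}, B975 = {b975:.3f}")
```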
Date created: 5/5/2003