
# SN SCALE

Name:
SN SCALE (LET)
Type:
Let Subcommand
Purpose:
Compute the Sn scale estimate for a variable.
Description:
Mosteller and Tukey (see Reference section below) define two types of robustness:

1. Resistance means that changing a small part of the data, even by a large amount, does not cause a large change in the estimate.

2. Robustness of efficiency means that the statistic has high efficiency in a variety of situations rather than in any one situation. Efficiency means that the estimate is close to the optimal estimate given that we know what distribution the data come from. A useful measure of efficiency is:

Efficiency = (lowest variance feasible)/ (actual variance)
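
As a worked instance of this ratio (a standard asymptotic result for location estimates, not something specific to Dataplot): for normal data with variance σ², the sample mean attains the lowest feasible variance, σ²/n, while the sample median has asymptotic variance πσ²/(2n). The efficiency of the median is therefore approximately:

$$\mbox{Efficiency(median)} = \frac{\sigma^{2}/n}{\pi \sigma^{2}/(2n)} = \frac{2}{\pi} \approx 0.64$$

This agrees with the 64% normal efficiency quoted for the median in this section.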

Many statistics have one of these properties. However, it can be difficult to find statistics that are both resistant and have robustness of efficiency.

The most common estimate of scale, the standard deviation, is the most efficient estimate of scale if the data come from a normal distribution. However, the standard deviation is not robust in the sense that changing even one value can dramatically change the computed value of the standard deviation (i.e., poor resistance). In addition, it does not have robustness of efficiency for non-normal data.
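
The poor resistance of the standard deviation can be illustrated with a small sketch. The example below is written in Python rather than Dataplot, and the data values are made up for illustration; it only shows how replacing a single observation changes the computed value.

```python
import statistics

# Hypothetical data set (illustrative values, not taken from Dataplot).
clean = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Replace one value by a gross error.
contaminated = clean[:-1] + [900.0]

sd_clean = statistics.stdev(clean)
sd_contaminated = statistics.stdev(contaminated)

print(f"standard deviation, clean data:        {sd_clean:.3f}")
print(f"standard deviation, one value changed: {sd_contaminated:.3f}")
```

Changing one value out of eight inflates the standard deviation by more than an order of magnitude, which is the lack of resistance described above.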

The median absolute deviation (MAD) and interquartile range are the two most commonly used robust alternatives to the standard deviation. The MAD in particular is a very robust scale estimator. However, the MAD has the following limitations:

1. It does not have particularly high efficiency for data that is in fact normal (37%). In comparison, the median has 64% efficiency for normal data.

2. The MAD statistic also has an implicit assumption of symmetry. That is, it measures the distance from a measure of central location (the median).

Rousseeuw and Croux proposed the Sn estimate of scale as an alternative to the MAD. It shares desirable robustness properties with MAD (50% breakdown point, bounded influence function). In addition, it has significantly better normal efficiency (58%) and it does not depend on symmetry.

The Sn scale estimate is defined as:

$$S_{n} = c \, \mbox{median}_{i} \{ \mbox{median}_{j} |x_{i} - x_{j}| \}$$

That is, for each i we compute the median of {|xi - xj|; j = 1, ..., n}. This yields n numbers, and their median, multiplied by c, is the Sn estimate. The constant c is chosen to make Sn a consistent estimator. The value used is 1.1926, which is the value needed to make Sn consistent for normal data.

The Sn statistic measures typical distances between values, in contrast to the MAD and the standard deviation, which measure the distance from a central location. This is why Sn is appropriate for asymmetric distributions as well as symmetric distributions.
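
The definition above can be sketched directly in Python. This is a naive illustration that follows the formula as stated here, using the ordinary median and on the order of n² distance computations. It is not the Rousseeuw and Croux code that Dataplot uses; the published estimator uses low and high medians together with finite-sample correction factors, which this sketch omits. The function name `sn_scale` and the data values are assumptions made for the example.

```python
import statistics

C_NORMAL = 1.1926  # consistency constant for normal data, as given above


def sn_scale(values):
    """Naive Sn estimate: c * median_i( median_j |x_i - x_j| )."""
    # For each x_i, take the median of its distances to all n values.
    inner_medians = [
        statistics.median(abs(xi - xj) for xj in values) for xi in values
    ]
    # The median of those n inner medians, scaled by c, is the estimate.
    return C_NORMAL * statistics.median(inner_medians)


print(sn_scale([1, 2, 3, 4, 5]))    # 1.1926
print(sn_scale([1, 2, 3, 4, 500]))  # 2.3852
```

In the second call one value has been replaced by a gross outlier, yet the estimate only moves from 1.1926 to 2.3852, which reflects the resistance of Sn. No central location is computed at any step; only pairwise distances are used.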

The Rousseeuw and Croux article (see the Reference section below) discusses the properties of the Sn estimate in detail.

Syntax:
LET <par> = SN SCALE <y>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<par> is a parameter where the computed Sn scale statistic is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
LET A = SN SCALE Y1
LET A = SN SCALE Y1 SUBSET TAG > 2
Note:
Dataplot computes the Sn estimate with the code provided by Rousseeuw and Croux, which implements an efficient computational method.
Note:
The Rousseeuw and Croux article also proposes the Qn scale estimate. The article discusses the properties of both estimators in detail.
Note:
Dataplot statistics can be used in 20+ commands. For details, enter

Default:
None
Synonyms:
None
Related Commands:
QN SCALE = Compute the Qn scale estimate of a variable.
MEDIAN ABSOLUTE DEVIATION = Compute the median absolute deviation of a variable.
INTERQUARTILE RANGE = Compute the interquartile range of a variable.
STANDARD DEVIATION = Compute the standard deviation of a variable.
DIFFERENCE OF SN = Compute the difference of the Sn scale estimates between two variables.
STATISTIC PLOT = Generate a statistic versus subset plot.
BOOTSTRAP PLOT = Generate a bootstrap plot for a statistic.
Reference:
Peter J. Rousseeuw and Christophe Croux (1993), "Alternatives to the Median Absolute Deviation", Journal of the American Statistical Association, Vol. 88, No. 424, pp. 1273-1283.

Mosteller and Tukey (1977), "Data Analysis and Regression: A Second Course in Statistics", Addison-Wesley, pp. 203-209.

Applications:
Data Analysis
Implementation Date:
2003/04
Program:
MULTIPLOT 2 2
MULTIPLOT CORNER COORDINATES 0 0 100 100
MULTIPLOT SCALE FACTOR 2
X1LABEL DISPLACEMENT 12
.
LET Y1 = NORMAL RANDOM NUMBERS FOR I = 1 1 200
LET SIGMA = 1
LET Y2 = LOGNORMAL RANDOM NUMBERS FOR I = 1 1 200
.
BOOTSTRAP SAMPLES 500
BOOTSTRAP SN SCALE PLOT Y1
X1LABEL B025 = ^B025, B975=^B975
HISTOGRAM YPLOT
X1LABEL
.
BOOTSTRAP SN SCALE PLOT Y2
X1LABEL B025 = ^B025, B975=^B975
HISTOGRAM YPLOT
.
END OF MULTIPLOT
JUSTIFICATION CENTER
MOVE 50 96
TEXT SN SCALE BOOTSTRAP: NORMAL
MOVE 50 46
TEXT SN SCALE BOOTSTRAP: LOGNORMAL
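
For readers without Dataplot at hand, the idea behind the program above can be sketched in Python: draw a normal and a lognormal sample, bootstrap the Sn estimate for each, and report the 2.5 and 97.5 percentiles of the bootstrap values. This is a rough analogue under stated assumptions, not a reproduction of Dataplot's output. It uses a naive Sn (ordinary medians, no finite-sample correction), a simple order-statistic rule for the percentiles, and smaller sample and resample sizes than the program above so that the pure-Python version runs quickly. The names `sn_scale` and `bootstrap_sn_interval` are hypothetical.

```python
import random
import statistics

C_NORMAL = 1.1926  # consistency constant for normal data


def sn_scale(values):
    """Naive Sn estimate: c * median_i( median_j |x_i - x_j| )."""
    inner = [statistics.median(abs(xi - xj) for xj in values) for xi in values]
    return C_NORMAL * statistics.median(inner)


def bootstrap_sn_interval(values, n_resamples, rng):
    """Return approximate 2.5 and 97.5 percentiles of Sn over resamples."""
    estimates = sorted(
        sn_scale(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(n_resamples)
    )
    lo = estimates[int(0.025 * (n_resamples - 1))]
    hi = estimates[int(0.975 * (n_resamples - 1))]
    return lo, hi


rng = random.Random(12345)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(60)]
lognormal_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(60)]

for label, sample in (("normal", normal_sample), ("lognormal", lognormal_sample)):
    lo, hi = bootstrap_sn_interval(sample, n_resamples=200, rng=rng)
    print(f"{label}: Sn = {sn_scale(sample):.3f}, B025 = {lo:.3f}, B975 = {hi:.3f}")
```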



Date created: 05/05/2003
Last updated: 10/07/2016