
# PERCENTAGE BEND CORRELATION

Name:
PERCENTAGE BEND CORRELATION (LET)
Type:
Let Subcommand
Purpose:
Compute the percentage bend correlation between two variables.
Description:
Mosteller and Tukey (see the References section below) define two types of robustness:

1. Resistance means that changing a small part of the data, even by a large amount, does not cause a large change in the estimate.

2. Robustness of efficiency means that the statistic has high efficiency in a variety of situations rather than in any one situation. Efficiency means that the estimate is close to the optimal estimate for the distribution from which the data come. A useful measure of efficiency is:

Efficiency = (lowest variance feasible)/ (actual variance)

Many statistics have one of these properties. However, it can be difficult to find statistics that are both resistant and have robustness of efficiency.
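For example, the sample median is resistant but does not have high efficiency for Gaussian data. Because the sample mean attains the lowest feasible variance for the center of a Gaussian distribution, the ratio var(mean)/var(median) is the median's efficiency; its large-sample value is 2/π ≈ 0.64. The Monte Carlo sketch below estimates this ratio in Python. The function name, sample size, number of replications, and seed are illustrative assumptions, not part of Dataplot.

```python
import random
import statistics


def efficiency_of_median(n=21, reps=4000, seed=1):
    """Monte Carlo estimate of var(mean) / var(median) for Gaussian samples.

    For Gaussian data the sample mean has the lowest feasible variance, so
    it supplies the numerator of the efficiency ratio.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)


# Finite-sample estimate; it approaches 2/pi as n grows.
print(round(efficiency_of_median(), 2))
```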

The Pearson correlation coefficient is an optimal estimator for Gaussian data. However, it is not resistant and it does not have robustness of efficiency.
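A minimal Python illustration of the lack of resistance, using a hand-written `pearson` helper (hypothetical, not a Dataplot command): for 20 perfectly linear pairs the coefficient is 1, and corrupting a single y value drops it to about 0.40.

```python
import math


def pearson(x, y):
    """Ordinary Pearson correlation coefficient of paired samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = list(range(1, 21))
y_clean = list(x)        # perfectly linear relationship
y_bad = list(x)
y_bad[-1] = 1000         # a single gross error in one observation

print(pearson(x, y_clean))          # 1.0
print(round(pearson(x, y_bad), 3))  # 0.401
```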

The percentage bend correlation estimator, discussed by Shoemaker and Hettmansperger and also by Wilcox, is both resistant and robust in efficiency. The rationale and derivation for this estimator are given in these references.

The percentage bend correlation between two variables X and Y is computed as follows:

1. Set $$m = \lfloor (1-\beta)n + 0.5 \rfloor$$; that is, compute $$(1-\beta)n + 0.5$$ and round down to the nearest integer.

2. Let $$W_{i} = |X_{i} - M_{x}|$$ for i = 1, ..., n, where $$M_{x}$$ is the median of X.

3. Sort the $$W_{i}$$ in ascending order.

4. Set $$\hat{W}_{x} = W_{(m)}$$, the m-th order statistic of the sorted $$W_{i}$$. $$W_{(m)}$$ is an estimate of the (1-$$\beta$$) quantile of W.

5. Sort the X values to obtain the order statistics $$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$$. Let i1 be the number of values of $$(X_{i} - M_{x})/\hat{W}_{x}$$ that are less than -1, and let i2 be the number that are greater than +1. Then compute

$$S_{x} = \sum_{i=i1+1}^{n-i2}{X_{(i)}}$$
$$\hat{\phi}_{x} = \frac{\hat{W}_{x}(i2 - i1) + S_{x}}{n - i1 - i2}$$
$$U_{i} = \frac{X_{i} - \hat{\phi}_{x}}{\hat{W}_{x}}$$

6. Repeat the above calculations on the Y variable, storing the corresponding quantities in $$\hat{W}_{y}$$, $$\hat{\phi}_{y}$$, and $$V_{i}$$.

7. Define the function

$$\Psi(x) = \max[-1, \min(1,x)]$$

8. Compute

$$A_{i} = \Psi(U_{i})$$
$$B_{i} = \Psi(V_{i})$$

9. Compute the percentage bend correlation

$$\rho_{pb} = \frac{\sum_{i=1}^{n}{A_{i}B_{i}}} {\sqrt{\sum_{i=1}^{n}{A_{i}^2}\sum_{i=1}^{n}{B_{i}^2}}}$$
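The nine steps above can be sketched in Python as follows. This is a direct, unoptimized transcription of the steps as written on this page, with the default $$\beta$$ = 0.1; the function names are hypothetical and this is not Dataplot's source code. In the usage example one y value is replaced by 1000; that point is bent to +1 in both variables, so the estimate is still 1.

```python
import math
import statistics


def _bend_location_scale(values, beta):
    """Steps 1-5 for one variable: return (W_hat, phi_hat)."""
    n = len(values)
    m = math.floor((1.0 - beta) * n + 0.5)       # step 1
    med = statistics.median(values)              # step 2: M_x
    w = sorted(abs(v - med) for v in values)     # steps 2-3: sorted W_i
    w_hat = w[m - 1]                             # step 4: m-th order statistic
    if w_hat == 0:
        raise ValueError("W_hat is zero; too many values equal the median")
    # Step 5: count standardized values below -1 (i1) and above +1 (i2).
    i1 = sum(1 for v in values if (v - med) / w_hat < -1)
    i2 = sum(1 for v in values if (v - med) / w_hat > 1)
    xs = sorted(values)
    s = sum(xs[i1:n - i2])                       # S_x = X_(i1+1) + ... + X_(n-i2)
    phi_hat = (w_hat * (i2 - i1) + s) / (n - i1 - i2)
    return w_hat, phi_hat


def psi(t):
    """Step 7: Psi(t) = max(-1, min(1, t))."""
    return max(-1.0, min(1.0, t))


def percentage_bend_correlation(x, y, beta=0.1):
    """Percentage bend correlation of paired samples x and y (steps 1-9)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if not 0 < beta <= 0.5:
        raise ValueError("beta must satisfy 0 < beta <= 0.5")
    wx, phix = _bend_location_scale(x, beta)
    wy, phiy = _bend_location_scale(y, beta)     # step 6
    a = [psi((xi - phix) / wx) for xi in x]      # step 8: A_i = Psi(U_i)
    b = [psi((yi - phiy) / wy) for yi in y]      # step 8: B_i = Psi(V_i)
    num = sum(ai * bi for ai, bi in zip(a, b))   # step 9
    den = math.sqrt(sum(ai * ai for ai in a) * sum(bi * bi for bi in b))
    return num / den


x = list(range(1, 21))
y = list(x)
y[-1] = 1000  # a single gross error
print(round(percentage_bend_correlation(x, y), 6))  # 1.0
```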

The value of $$\beta$$ is selected between 0 and 0.5. Higher values of $$\beta$$ result in a higher breakdown point at the expense of lower efficiency.
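To see how $$\beta$$ enters the calculation, the Python sketch below (a hypothetical helper, `bend_summary`) reports m, $$\hat{W}_{x}$$, and i1 + i2 from steps 1 to 5 for the values 1, ..., 20. A larger $$\beta$$ gives a smaller m and hence a smaller $$\hat{W}_{x}$$, so more observations fall outside ±1 in step 5 and are treated as extreme.

```python
import math
import statistics


def bend_summary(values, beta):
    """Return (m, W_hat, i1 + i2) for one variable, following steps 1-5."""
    n = len(values)
    m = math.floor((1.0 - beta) * n + 0.5)         # step 1
    med = statistics.median(values)                # step 2
    w = sorted(abs(v - med) for v in values)       # step 3
    w_hat = w[m - 1]                               # step 4
    # i1 + i2: values more than W_hat away from the median (step 5)
    outside = sum(1 for v in values if abs(v - med) / w_hat > 1)
    return m, w_hat, outside


data = list(range(1, 21))
for beta in (0.1, 0.2, 0.5):
    print(beta, bend_summary(data, beta))
# 0.1 (18, 8.5, 2)
# 0.2 (16, 7.5, 4)
# 0.5 (10, 4.5, 10)
```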

Syntax:
LET <par> = PERCENTAGE BEND CORRELATION <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed percentage bend correlation is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
LET A = PERCENTAGE BEND CORRELATION Y1 Y2
LET A = PERCENTAGE BEND CORRELATION Y1 Y2 SUBSET TAG > 2
Note:
To set the value of $$\beta$$, enter the command

LET BETA = <value>

where <value> is greater than 0 and less than or equal to 0.5. The default value for $$\beta$$ is 0.1.

Note:
Dataplot statistics can be used in a number of commands. For details, enter

Default:
None
Synonyms:
None
Related Commands:
PERCENTAGE BEND MIDVARIANCE = Compute the percentage bend midvariance of a variable.
BIWEIGHT MIDCORRELATION = Compute the biweight midcorrelation between two variables.
WINSORIZED CORRELATION = Compute the Winsorized correlation between two variables.
CORRELATION = Compute the correlation between two variables.
RANK CORRELATION = Compute the rank correlation between two variables.
VARIANCE = Compute the variance of a variable.
STATISTIC PLOT = Generate a statistic versus group plot for a given statistic.
References:
Shoemaker and Hettmansperger (1982), "Robust Estimates of and Tests for the One- and Two-Sample Scale Models", Biometrika 69, pp. 47-54.

Rand Wilcox (1997), "Introduction to Robust Estimation and Hypothesis Testing", Academic Press.

Mosteller and Tukey (1977), "Data Analysis and Regression: A Second Course in Statistics", Addison-Wesley, pp. 203-209.

Applications:
Robust Data Analysis
Implementation Date:
2002/07
Program 1:
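. Skip the 25-line file header, then read the data
SKIP 25
READ IRIS.DAT Y1 Y2 Y3 Y4 X
. Collect the four response variables into matrix M
LET M = CREATE MATRIX Y1 Y2 Y3 Y4
. Use the percentage bend estimator for matrix correlations
SET CORRELATION TYPE PERCENTAGE BEND
LET B = CORRELATION MATRIX M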

Program 2:

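SKIP 25
READ IRIS.DAT Y1 Y2 Y3 Y4 X
.
. Two plots stacked on one page, leaving room at the top for a title
MULTIPLOT CORNER COORDINATES 0 0 100 95
MULTIPLOT SCALE FACTOR 2
MULTIPLOT 2 1
. 500 bootstrap samples of the percentage bend correlation of Y1 and Y2
BOOTSTRAP SAMPLES 500
BOOTSTRAP PERCENTAGE BEND CORRELATION PLOT Y1 Y2
. Show the saved B025 and B975 bootstrap percentiles in the x-axis label
X1LABEL DISPLACEMENT 12
X1LABEL B025 = ^B025, B975=^B975
. Histogram of the plotted bootstrap values
HISTOGRAM YPLOT
END OF MULTIPLOT
. Overall title
MOVE 50 96
JUSTIFICATION CENTER
TEXT PERCENTAGE BEND CORRELATION BOOTSTRAP: IRIS DATA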
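Program 2 relies on Dataplot's built-in bootstrap. As a rough Python analogue, the sketch below resamples the (Y1, Y2) pairs with replacement, evaluates a statistic on each resample, and returns the 2.5% and 97.5% order statistics. It assumes that B025 and B975 are percentile-type bootstrap limits and uses a simple order-statistic percentile rule, which may differ from the rule Dataplot uses; the function name, the `stat` argument, and the `seed` parameter are hypothetical.

```python
import random


def bootstrap_percentile_interval(x, y, stat, n_boot=500, seed=12345):
    """Percentile bootstrap interval for a paired statistic stat(x, y).

    Pairs (x_i, y_i) are resampled together with replacement, stat is
    evaluated on each resample, and the empirical 2.5% and 97.5% points
    of the n_boot values are returned (analogous to B025 and B975).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = random.Random(seed)
    n = len(x)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample row indices
        values.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    values.sort()
    # Simple order-statistic percentiles; other percentile rules can give
    # slightly different limits.
    lo = values[int(0.025 * (n_boot - 1))]
    hi = values[int(0.975 * (n_boot - 1))]
    return lo, hi
```

Passing a percentage bend correlation routine as `stat`, with `n_boot=500`, corresponds to the `BOOTSTRAP SAMPLES 500` setting in Program 2.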

Date created: 08/12/2002
Last updated: 10/07/2016