
# CONDITION INDICES

Name:
CONDITION INDICES (LET)
Type:
Let Subcommand
Purpose:
Compute condition indices of a regression design matrix.
Description:
Condition indices are a measure of the multi-colinearity in a regression design matrix X (i.e., the matrix of independent variables).

Multi-colinearity results when the columns of X have significant interdependence (i.e., one or more columns of X are close to a linear combination of the other columns). Multi-colinearity can result in numerically unstable estimates of the regression coefficients: small changes in X can produce large changes in the estimated coefficients.
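The instability can be illustrated with a small numerical sketch. The Python below is not Dataplot code; the data values and the helper name `fit_two_columns` are made up for the illustration. It fits y on two nearly identical columns by solving the 2x2 normal equations, then changes a single entry of one column by 0.02 and refits.

```python
def fit_two_columns(x1, x2, y):
    """Least squares for y ~ b1*x1 + b2*x2 (no intercept) via the normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12          # close to zero when x1 and x2 are nearly collinear
    b1 = (t1 * s22 - t2 * s12) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2

# Hypothetical data: x2 is almost a copy of x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.01, 1.99, 3.02, 3.98, 5.01]
y = [2.0, 4.1, 5.9, 8.1, 10.0]

before = fit_two_columns(x1, x2, y)

x2_perturbed = list(x2)
x2_perturbed[2] += 0.02                  # a small change in one entry of X
after = fit_two_columns(x1, x2_perturbed, y)

print("coefficients before:", before)
print("coefficients after: ", after)
print("sum of coefficients:", sum(before), sum(after))
```

In this example the individual coefficients shift substantially while their sum stays close to 2, which is typical of near-collinear columns: the data determine the combined effect well but not how it is split between the two columns.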

Pairwise colinearity can be detected by examining the correlation matrix of the independent variables. However, a correlation matrix will not reveal higher-order colinearity, in which a column is close to a linear combination of several other columns.
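A short Python sketch (not Dataplot code; the data and the helper name `correlation` are invented for the example) shows how this can happen. Three mutually uncorrelated columns are summed to form a fourth, so the four columns are exactly linearly dependent, yet no pairwise correlation looks alarming.

```python
from itertools import combinations
from math import sqrt

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / sqrt(saa * sbb)

# Hypothetical design: x1, x2, x3 form a 2^3 factorial pattern (mutually uncorrelated).
columns = {
    "x1": [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0],
    "x2": [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0],
    "x3": [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0],
}
# x4 is an exact linear combination of the other three columns.
columns["x4"] = [a + b + c for a, b, c in
                 zip(columns["x1"], columns["x2"], columns["x3"])]

pairwise = {(p, q): correlation(columns[p], columns[q])
            for p, q in combinations(columns, 2)}
largest = max(abs(r) for r in pairwise.values())

# What is left of x4 after subtracting x1 + x2 + x3.
residual = [d - (a + b + c) for a, b, c, d in
            zip(columns["x1"], columns["x2"], columns["x3"], columns["x4"])]

for pair, r in pairwise.items():
    print(pair, round(r, 3))
print("largest |correlation|:", round(largest, 3))
print("residual of x4 on x1 + x2 + x3:", residual)
```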

There are a number of approaches to dealing with multi-colinearity. Some of these include:

1. Delete one or more of the independent variables from the fit.
2. Perform a principal components regression.
3. Compute the regression using a singular value decomposition approach. Note that Dataplot uses a modified Gram-Schmidt method for the fit (Dataplot can perform a singular value decomposition, but this has not been incorporated into the fit).
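For readers unfamiliar with the modified Gram-Schmidt method mentioned in item 3, the following Python sketch shows the general technique for a least squares solve. It is not Dataplot's code and makes no claim about how Dataplot implements its fit; the function name `mgs_least_squares` and the sample data are invented for the example.

```python
def mgs_least_squares(columns, y):
    """Minimize ||X b - y|| where X is given as a list of its columns.

    Factor X = Q R by modified Gram-Schmidt, then solve R b = Q'y.
    Assumes the columns are linearly independent.
    """
    p = len(columns)
    q = [list(col) for col in columns]     # working copies; become orthonormal
    r = [[0.0] * p for _ in range(p)]      # upper-triangular factor
    for j in range(p):
        norm = sum(v * v for v in q[j]) ** 0.5
        r[j][j] = norm
        q[j] = [v / norm for v in q[j]]
        # "Modified" step: remove the q[j] component from every later
        # column right away, using the already updated columns.
        for k in range(j + 1, p):
            r[j][k] = sum(a * b for a, b in zip(q[j], q[k]))
            q[k] = [b - r[j][k] * a for a, b in zip(q[j], q[k])]
    # Project y onto the orthonormal columns, then back-substitute.
    qty = [sum(a * c for a, c in zip(q[j], y)) for j in range(p)]
    b = [0.0] * p
    for j in reversed(range(p)):
        s = qty[j] - sum(r[j][k] * b[k] for k in range(j + 1, p))
        b[j] = s / r[j][j]
    return b

# Hypothetical example: y = 1 + 2*x exactly, with an intercept column.
intercept = [1.0, 1.0, 1.0, 1.0]
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
coefficients = mgs_least_squares([intercept, x], y)
print(coefficients)
```

When a diagonal entry of R is very small relative to the others, the corresponding column is nearly a linear combination of the earlier ones, which is another symptom of the colinearity discussed here.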

Condition indices are one measure that can be used to detect multi-colinearity (variance inflation factors are another). The condition indices are calculated as follows:

1. Scale the columns of the X matrix to have unit sums of squares.
2. Calculate the singular values of the scaled X matrix and square them.

Condition indices between 30 and 100 indicate moderate to strong colinearity.
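The calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not Dataplot's implementation. It assumes the common convention of Belsley, Kuh, and Welsch, in which each condition index is the ratio of the largest singular value of the column-scaled matrix to one of its singular values; Dataplot's exact normalization may differ. The singular values are found with a one-sided Jacobi iteration so that the example needs only the standard library; the function names and the sample data are invented for the example.

```python
from math import copysign, sqrt

def singular_values(columns, sweeps=30, tol=1e-12):
    """Singular values of a matrix given as a list of columns (one-sided Jacobi)."""
    a = [list(c) for c in columns]
    p = len(a)
    for _ in range(sweeps):
        rotated = False
        for i in range(p - 1):
            for j in range(i + 1, p):
                alpha = sum(v * v for v in a[i])
                beta = sum(v * v for v in a[j])
                gamma = sum(u * v for u, v in zip(a[i], a[j]))
                if abs(gamma) <= tol * sqrt(alpha * beta):
                    continue                      # columns already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = copysign(1.0, zeta) / (abs(zeta) + sqrt(1.0 + zeta * zeta))
                c = 1.0 / sqrt(1.0 + t * t)
                s = c * t
                # Plane rotation that makes columns i and j orthogonal.
                a[i], a[j] = ([c * u - s * v for u, v in zip(a[i], a[j])],
                              [s * u + c * v for u, v in zip(a[i], a[j])])
        if not rotated:
            break
    # Once the columns are mutually orthogonal, their norms are the singular values.
    return sorted((sqrt(sum(v * v for v in col)) for col in a), reverse=True)

def condition_indices(columns):
    """Assumed convention: largest singular value divided by each singular value."""
    scaled = []
    for col in columns:
        norm = sqrt(sum(v * v for v in col))      # step 1: unit sum of squares
        scaled.append([v / norm for v in col])
    sv = singular_values(scaled)                  # step 2: singular values
    return [sv[0] / s if s > 0.0 else float("inf") for s in sv]

# Hypothetical design matrix: an intercept column, x1, and x2 roughly equal to 2*x1.
design = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
]
for index in condition_indices(design):
    print(round(index, 2))
```

Under this convention the first index is always 1, and squaring the indices gives the corresponding ratios of eigenvalues of the scaled X'X matrix.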

Syntax:
LET <y1> = CONDITION INDICES <mat1> <SUBSET/EXCEPT/FOR qualification>
where <mat1> is the design matrix for which the condition indices are to be computed;
<y1> is a vector where the resulting condition indices are saved;
and where the <SUBSET/EXCEPT/FOR qualification> is optional (and rarely used in this context).
Examples:
LET Y = CONDITION INDICES X
Note:
Matrices are created with the READ MATRIX, CREATE MATRIX, or MATRIX DEFINITION command. Enter HELP MATRIX DEFINITION, HELP CREATE MATRIX, and HELP READ MATRIX for details.
Note:
The columns of a matrix are accessible as variables by appending an index to the matrix name. For example, the 4x4 matrix C has columns C1, C2, C3, and C4. These columns can be operated on like any other DATAPLOT variable.
Note:
The maximum size matrix that DATAPLOT can handle is set when DATAPLOT is built at a particular site. Enter the command HELP MATRIX DIMENSION for details on the maximum size matrix that can be accommodated.
Default:
None
Synonyms:
None
Related Commands:
VARIANCE INFLATION FACTORS = Compute variance inflation factors.
CREATE MATRIX = Create a matrix from a list of variables.
FIT = Perform a least squares fit.
CATCHER MATRIX = Compute the catcher matrix.
PARTIAL REGRESSION PLOT = Generate a partial regression plot.
Reference:
"Efficient Computing of Regression Diagnostics", Velleman and Welsch, American Statistician, November, 1981, Vol. 35, No. 4, pp. 234-242.
Applications:
Regression Diagnostics
Implementation Date:
2002/6
Program:
```
DIMENSION 100 COLUMNS
SKIP 25
READ HALD647.DAT Y X1 X2 X3 X4
SKIP 0
LET N = SIZE X1
LET X0 = SEQUENCE 1 1 N
LET Z = CREATE MATRIX X0 X1 X2 X3 X4
LET C = CONDITION INDICES Z
SET WRITE DECIMALS 2
PRINT C
```
The following output is generated.
```
VARIABLES--C

1.00
7.11
10.19
55.34
149.90
```

Date created: 8/6/2002
Last updated: 4/4/2003