Compute condition indices of a regression design matrix.
Condition indices are a measure of the multi-colinearity in a
regression design matrix X (i.e., the independent variables).
Multi-colinearity results when the columns of X have
significant interdependence (i.e., one or more columns of X
are close to a linear combination of the other columns).
Multi-colinearity can result in numerically unstable estimates
of the regression coefficients (small changes in X can
result in large changes to the estimated regression coefficients).
Pairwise colinearity can be detected by examining a correlation
matrix of the independent variables. However, a correlation
matrix will not reveal higher order colinearity (i.e., a near
linear dependence that involves three or more variables).
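The limitation of the correlation matrix can be illustrated with a small sketch in Python (not Dataplot code). The columns x1 to x4 are made-up values chosen for illustration, and the helper name correlation is hypothetical. By construction x4 is exactly x1 + x2 + x3, yet no pairwise correlation exceeds about 0.58.

```python
import math
from itertools import combinations


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up illustrative data: three mutually uncorrelated columns ...
x1 = [1, 1, 1, 1, -1, -1, -1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, -1, 1, -1, 1, -1, 1, -1]
# ... and a fourth column that is an exact linear combination of them.
x4 = [a + b + c for a, b, c in zip(x1, x2, x3)]

columns = {"x1": x1, "x2": x2, "x3": x3, "x4": x4}

# Correlation for every pair of columns.
pairwise = {
    (p, q): correlation(columns[p], columns[q])
    for p, q in combinations(columns, 2)
}
max_abs_corr = max(abs(r) for r in pairwise.values())

for pair, r in pairwise.items():
    print(pair, round(r, 3))
print("largest absolute pairwise correlation:", round(max_abs_corr, 3))
```

The largest pairwise correlation here is 1/sqrt(3), which would not normally raise concern, although the four columns are exactly linearly dependent.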
There are a number of approaches to dealing with
multi-colinearity. Some of these include:
- Delete one or more of the independent variables from
the fit.
- Perform a principal components regression.
- Compute the regression using a singular value
decomposition approach. Note that Dataplot uses
a modified Gram-Schmidt method for the fit (Dataplot
can perform a singular value decomposition, but this
has not been incorporated into the fit).
Condition indices are one measure that can be used to
detect multi-colinearity (variance inflation factors are
another). The condition indices are calculated as follows:
- Scale the columns of the X matrix to have unit
sums of squares.
- Calculate the singular values of the scaled X
matrix and square them.
Condition indices between 30 and 100 indicate moderate to
strong colinearity.
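The two calculation steps listed above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not Dataplot's implementation. All function names are hypothetical and the demonstration data are made up. The squared singular values are obtained as eigenvalues of the cross-product matrix of the scaled X using Jacobi rotations, which is simple but less accurate than a true singular value decomposition. The final ratio, largest singular value divided by each singular value, is the conventional Belsley definition of a condition index and is an assumption here, since the steps above stop at the squared singular values.

```python
import math


def scale_columns(x):
    """Scale each column of x (a list of rows) to unit sum of squares.

    Assumes no column is identically zero.
    """
    ncol = len(x[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in x)) for j in range(ncol)]
    return [[row[j] / norms[j] for j in range(ncol)] for row in x]


def cross_product(x):
    """Return X'X for x given as a list of rows."""
    ncol = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(ncol)]
            for i in range(ncol)]


def jacobi_eigenvalues(a, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:  # off-diagonal part is negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (
                    abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def condition_indices(x):
    """Return (squared singular values, condition indices) of scaled x."""
    # Step 1: unit sums of squares. Step 2: squared singular values,
    # computed as the eigenvalues of the scaled cross-product matrix.
    squared_sv = jacobi_eigenvalues(cross_product(scale_columns(x)))
    largest = squared_sv[0]
    # Assumed final step (Belsley): largest singular value over each one.
    indices = [math.inf if v <= 0.0 else math.sqrt(largest / v)
               for v in squared_sv]
    return squared_sv, indices


# Made-up data: intercept, x1, and an x2 that nearly duplicates x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
design = [[1.0, u, v] for u, v in zip(x1, x2)]

squared_sv, indices = condition_indices(design)
print("squared singular values:", squared_sv)
print("condition indices:", indices)
```

Under this definition the first index is always 1 and the others grow as the columns approach linear dependence. For real work a library singular value decomposition routine would be the more accurate choice.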
LET <y1> = CONDITION INDICES <mat1>
           <SUBSET/EXCEPT/FOR qualification>
where <mat1> is the design matrix for which the condition
indices are to be computed;
<y1> is a vector where the resulting condition
indices are saved;
and where the <SUBSET/EXCEPT/FOR qualification> is
optional (and rarely used in this context).
LET Y = CONDITION INDICES X
Matrices are created with either the READ MATRIX, CREATE MATRIX,
or MATRIX DEFINITION command. Enter HELP MATRIX DEFINITION,
HELP CREATE MATRIX, and HELP READ MATRIX for details.
The columns of a matrix are accessible as variables by appending
an index to the matrix name. For example, the 4x4 matrix C has
columns C1, C2, C3, and C4. These columns can be operated on
like any other DATAPLOT variable.
The maximum size matrix that DATAPLOT can handle is set when
DATAPLOT is built on a particular site. Enter the command
HELP MATRIX DIMENSION for details on the maximum size matrix
that can be accommodated.
VARIANCE INFLATION FACTORS = Compute variance inflation factors.
CREATE MATRIX              = Create a matrix from a list of variables.
FIT                        = Perform a least squares fit.
CATCHER MATRIX             = Compute the catcher matrix.
PARTIAL REGRESSION PLOT    = Generate a partial regression plot.
"Efficient Computing of Regression Diagnostics", Velleman and
Welsch, American Statistician, November, 1981, Vol. 35, No. 4,
DIMENSION 100 COLUMNS
READ HALD647.DAT Y X1 X2 X3 X4
LET N = SIZE X1
LET X0 = SEQUENCE 1 1 N
LET Z = CREATE MATRIX X0 X1 X2 X3 X4
LET C = CONDITION INDICES Z
SET WRITE DECIMALS 2
PRINT C
The following output is generated.
Date created: 8/6/2002
Last updated: 4/4/2003