
VARIANCE INFLATION FACTORS

Name:
    VARIANCE INFLATION FACTORS (LET)
Type:
    Let Subcommand
Purpose:
    Compute variance inflation factors for a regression design matrix.
Description:
    Variance inflation factors are a measure of the multi-colinearity in a regression design matrix (i.e., the independent variables).

    Multi-colinearity results when the columns of the design matrix X have significant interdependence (i.e., one or more columns of X are close to a linear combination of the other columns). Multi-colinearity can result in numerically unstable estimates of the regression coefficients (small changes in X can result in large changes to the estimated regression coefficients).
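
    The instability described above can be illustrated with a small sketch. The following Python code is not part of Dataplot; the data values and the helper name fit_two_predictors are made up for illustration. It fits y on two nearly colinear columns (no intercept) by solving the 2x2 normal equations directly, then repeats the fit after changing a single entry of X by 0.01.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares fit of y = b1*x1 + b2*x2 (no intercept).

    Hypothetical helper: solves the 2x2 normal equations by Cramer's rule.
    """
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # near zero when x1 and x2 are nearly colinear
    return ((s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)


# Made-up data: x2 is almost identical to x1, and y = x1 + x2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 2.0, 3.0, 4.01]
y = [a + b for a, b in zip(x1, x2)]

b_before = fit_two_predictors(x1, x2, y)

# Change one entry of x2 by 0.01 and refit against the same y.
x2_perturbed = [1.0, 2.0, 3.01, 4.01]
b_after = fit_two_predictors(x1, x2_perturbed, y)

print("before:", b_before)
print("after: ", b_after)
```

    With these values the estimated coefficients move from approximately (1, 1) to approximately (1.8, 0.2), although only one entry of X changed by 0.01.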

    Pairwise colinearity can be determined from viewing a correlation matrix of the independent variables. However, correlation matrices will not reveal higher order colinearity.

    There are a number of approaches to dealing with multi-colinearity. Some of these include:

    1. Delete one or more of the independent variables from the fit.
    2. Perform a principal components regression.
    3. Compute the regression using a singular value decomposition approach. Note that Dataplot uses a modified Gram-Schmidt method for the fit. Dataplot can perform a singular value decomposition, but this has not been incorporated into the fit.

    Variance inflation factors are one measure that can be used to detect multi-colinearity (condition indices are another).
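
    As a hedged illustration of why pairwise correlations alone can miss higher order colinearity that variance inflation factors detect, consider the following Python sketch. It is not Dataplot code; the data and the helper names correlation_matrix, invert, and vifs are invented for this example. It relies on the standard identity that the variance inflation factor of variable j equals the j-th diagonal element of the inverse of the correlation matrix of the independent variables. The fourth column is constructed to be nearly x1 + x2 + x3.

```python
from math import sqrt


def correlation_matrix(cols):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(cols[0])
    centered = []
    for c in cols:
        mean = sum(c) / n
        centered.append([v - mean for v in c])
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    p = len(cols)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(p)] for i in range(p)]


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    p = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(a)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(aug[r][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        d = aug[k][k]
        aug[k] = [v / d for v in aug[k]]
        for r in range(p):
            if r != k:
                f = aug[r][k]
                aug[r] = [vr - f * vk for vr, vk in zip(aug[r], aug[k])]
    return [row[p:] for row in aug]


def vifs(cols):
    """VIF(j) taken as the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(cols))
    return [inv[j][j] for j in range(len(cols))]


# Made-up data: x1, x2, x3 are mutually uncorrelated; x4 is their sum plus
# a small term e that is uncorrelated with all three.
x1 = [1, 1, 1, 1, -1, -1, -1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, -1, 1, -1, 1, -1, 1, -1]
e = [0.1 * a * b * c for a, b, c in zip(x1, x2, x3)]
x4 = [a + b + c + d for a, b, c, d in zip(x1, x2, x3, e)]

r = correlation_matrix([x1, x2, x3, x4])
largest = max(abs(r[i][j]) for i in range(4) for j in range(4) if i != j)
print("largest pairwise correlation:", round(largest, 3))
print("VIFs:", [round(v, 1) for v in vifs([x1, x2, x3, x4])])
```

    In this constructed case no pairwise correlation exceeds 0.6, yet the variance inflation factors are roughly 101, 101, 101, and 301, because x4 is almost an exact linear combination of the other three columns.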

    Variance inflation factors are a scaled version of the multiple correlation coefficient between variable j and the rest of the independent variables. Specifically,

      VIF(j) = 1/(1 - R(j)**2)

    where R(j) is the multiple correlation coefficient obtained by regressing variable j on the remaining independent variables.

    The reciprocal of the above formula, 1 - R(j)**2, is often reported instead. In this case, the values are referred to as tolerances.

    If R(j) equals zero (i.e., no correlation between X(j) and the remaining independent variables), then VIF(j) equals 1. This is the minimum value. Neter, Wasserman, and Kutner (see Reference below) recommend looking at the largest VIF value. A value greater than 10 is an indication of potential multi-colinearity problems.
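
    The relationship between the multiple correlation coefficient, the variance inflation factor, and the tolerance can be tabulated with a few lines of Python. This is an illustrative sketch only; the function names vif_from_r and tolerance_from_r are hypothetical and are not Dataplot commands.

```python
from math import sqrt


def vif_from_r(r):
    """VIF = 1/(1 - r**2), where r is the multiple correlation coefficient R(j)."""
    if not 0.0 <= r < 1.0:
        raise ValueError("r must satisfy 0 <= r < 1")
    return 1.0 / (1.0 - r * r)


def tolerance_from_r(r):
    """Tolerance is the reciprocal of the VIF, i.e. 1 - r**2."""
    return 1.0 - r * r


for r in (0.0, 0.5, 0.9, sqrt(0.9), 0.99):
    print(f"R = {r:.4f}  VIF = {vif_from_r(r):8.3f}  "
          f"tolerance = {tolerance_from_r(r):.4f}")
```

    A VIF of 10 corresponds to R(j)**2 = 0.9 (R(j) of about 0.949), so the rule of thumb above flags a variable when at least 90% of its variance is explained by the other independent variables.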

Syntax:
    LET <y1> = VARIANCE INFLATION FACTORS <mat1>
                                <SUBSET/EXCEPT/FOR qualification>
    where <mat1> is the design matrix for which the variance inflation factors are to be computed;
                  <y1> is a vector where the resulting variance inflation factors are saved;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional (and rarely used in this context).
Examples:
    LET Y = VARIANCE INFLATION FACTORS X
Note:
    Matrices are created with either the READ MATRIX, CREATE MATRIX, or MATRIX DEFINITION command. Enter HELP MATRIX DEFINITION, HELP CREATE MATRIX, and HELP READ MATRIX for details.
Note:
    The columns of a matrix are accessible as variables by appending an index to the matrix name. For example, the 4x4 matrix C has columns C1, C2, C3, and C4. These columns can be operated on like any other DATAPLOT variable.
Note:
    The maximum size matrix that DATAPLOT can handle is set when DATAPLOT is built on a particular site. Enter the command HELP MATRIX DIMENSION for details on the maximum size matrix that can be accommodated.
Default:
    None
Synonyms:
    None
Related Commands:
    CONDITION INDICES = Compute condition indices of a regression design matrix.
    CREATE MATRIX = Create a matrix from a list of variables.
    FIT = Perform a least squares fit.
    CATCHER MATRIX = Compute the catcher matrix.
    PARTIAL REGRESSION PLOT = Generate a partial regression plot.
Reference:
    "Applied Linear Statistical Models", 3rd ed., Neter, Wasserman, and Kutner, 1990, Irwin.

    "Efficient Computing of Regression Diagnostics", Velleman and Welsch, American Statistician, November, 1981, Vol. 35, No. 4, pp. 234-242.

Applications:
    Regression Diagnostics
Implementation Date:
    2002/6
Program:
    DIMENSION 100 COLUMNS 
    SKIP 25 
    READ HALD647.DAT Y X1 X2 X3 X4 
    SKIP 0 
    LET N = SIZE X1 
    LET X0 = SEQUENCE 1 1 N 
    LET Z = CREATE MATRIX X0 X1 X2 X3 X4 
    LET V = VARIANCE INFLATION FACTORS Z 
    SET WRITE DECIMALS 2 
    PRINT V 
        
    The following output is generated.
     VARIABLES--V
    
               2.41
               3.45
               1.16
               3.59
               1.39
        

Date created: 8/6/2002
Last updated: 4/4/2003
Please email comments on this WWW page to alan.heckert@nist.gov.