
# PARTIAL LEVERAGE PLOT

Name:
PARTIAL LEVERAGE PLOT
Type:
Graphics Command
Purpose:
Generate a partial leverage plot.
Description:
In multi-linear regression, high leverage points are those that are outliers with respect to the independent variables. Influential points are those that cause large changes in the parameter estimates when they are deleted. Although an influential point will typically have high leverage, a high leverage point is not necessarily an influential point. The leverage is typically defined as the diagonal of the hat matrix, H = X(X'X)^(-1)X'. Dataplot currently writes a number of measures of influence and leverage to the file DPST3F.DAT (e.g., the diagonal of the hat matrix, Cook's distance, DFFITS).
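
As an illustration of leverage, the following Python sketch computes the hat matrix diagonal for the simplest case: one independent variable plus an intercept. In that case the diagonal reduces to h_i = 1/n + (x_i - xbar)^2 / Sxx. The function name and the data are made up for this example; this is not Dataplot code, and it does not read DPST3F.DAT.

```python
def hat_diagonal_simple(x):
    """Hat matrix diagonal for the straight-line fit y = a0 + a1*x.

    With one predictor and an intercept, the i-th diagonal element is
    h_i = 1/n + (x_i - mean)^2 / Sxx.
    """
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    return [1.0 / n + (v - mean) ** 2 / sxx for v in x]


# Hypothetical data: the last x value lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
h = hat_diagonal_simple(x)

for xi, hi in zip(x, h):
    print(f"x = {xi:5.1f}   leverage = {hi:.2f}")
```

The point at x = 10 has by far the largest leverage, and the leverages sum to 2, the number of fitted parameters. Whether that point is also influential depends on its response value, which the leverage does not use.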

Partial leverage is used to measure the contribution of the individual independent variables to the leverage of each observation. That is, if h_i is the ith diagonal element of the hat matrix, how does h_i change as we add a variable to the regression model?

The partial leverage is computed as:

$$(PL_{j})_{i} = \frac{(X_{j.\left[ j\right] })_{i}^{2}} {\sum_{k=1}^{n}{(X_{j.\left[ j\right] })_{k}^{2}}}$$

where

j = jth independent variable
i = the ith observation
Xj.[j] = residuals from regressing Xj against the remaining independent variables
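
The formula can be sketched in Python for a model with two independent variables, so that regressing Xj against "the remaining independent variables" is a straight-line fit. The sketch assumes an intercept is included in that auxiliary regression. The function names and data are hypothetical and this is not how Dataplot implements the command.

```python
def residuals_on(xj, xother):
    """Residuals from regressing xj on xother (intercept assumed)."""
    n = len(xj)
    mean_j = sum(xj) / n
    mean_o = sum(xother) / n
    sxy = sum((a - mean_o) * (b - mean_j) for a, b in zip(xother, xj))
    sxx = sum((a - mean_o) ** 2 for a in xother)
    slope = sxy / sxx
    return [(b - mean_j) - slope * (a - mean_o) for a, b in zip(xother, xj)]


def partial_leverage(xj, xother):
    """Squared residuals of xj on xother, scaled so that they sum to 1."""
    r = residuals_on(xj, xother)
    total = sum(v * v for v in r)
    return [v * v / total for v in r]


# Hypothetical data for two independent variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 10.0]

pl = partial_leverage(x2, x1)   # partial leverage for x2
for i, v in enumerate(pl, start=1):
    print(f"observation {i}: partial leverage = {v:.4f}")
```

By construction the partial leverages for a variable are non-negative and sum to 1, so a value close to 1 means that one observation carries most of the information that variable adds beyond the others.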

Note that the partial leverage is the leverage of the ith point in the partial regression plot for the jth variable (enter HELP PARTIAL REGRESSION PLOT for details on the partial regression plot).
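
A short sketch of why this holds, and of how it answers the question of how the leverage changes when a variable is added. The notation h_{i[j]}, for the leverage of the ith observation in the model without Xj, is introduced here for illustration. The partial regression plot for Xj amounts to a one-variable fit through the origin with Xj.[j] on the horizontal axis, so the ith diagonal element of its hat matrix is the partial leverage, and adding Xj to the model increases each leverage by exactly that amount:

$$h_{i} = h_{i\left[ j\right] } + \frac{(X_{j.\left[ j\right] })_{i}^{2}} {\sum_{k=1}^{n}{(X_{j.\left[ j\right] })_{k}^{2}}} = h_{i\left[ j\right] } + (PL_{j})_{i}$$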

The interpretation of the partial leverage plot is that data points with large partial leverage for an independent variable can exert undue influence on the selection of that variable in automatic regression model building procedures (e.g., the BEST CP command in Dataplot).

Dataplot provides two forms for the partial leverage plot. You can generate either a single partial leverage plot or you can generate a matrix of partial leverage plots (one plot for each independent variable in the model).

For the matrix form of the command, a number of SET FACTOR PLOT options can be used to control the appearance of the plot (not all of the SET FACTOR PLOT options apply). These are discussed in the Notes section below.

Syntax 1:
PARTIAL LEVERAGE PLOT <y> <x1> ... <xk> <xi>               <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> ... <xk> are the independent variables;
<xi> is the independent variable for which the partial leverage plot is being generated
(note that <xi> must be one of the variables listed in <x1> ... <xk>);
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This is the syntax for generating a single partial leverage plot.

Syntax 2:
MATRIX PARTIAL LEVERAGE PLOT <y> <x1> ... <xk>               <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> ... <xk> are the independent variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used to generate a matrix of partial leverage plots.

Examples:
PARTIAL LEVERAGE PLOT Y X1 X2 X3 X4 X2

MATRIX PARTIAL LEVERAGE PLOT Y X1 X2 X3 X4

PARTIAL LEVERAGE PLOT Y X1 X2 X3 X4 X2 SUBSET TAG > 2
MATRIX PARTIAL LEVERAGE PLOT Y X1 X2 X3 X4 SUBSET TAG > 2

Note:
The following option controls which axis tic marks, tic mark labels, and axis labels are plotted.

SET FACTOR PLOT LABELS <ON/OFF/XON/YON/BOX>

OFF means that all axis labels are suppressed (this can be useful if a large number of variables are being plotted). ON means that both X and Y axis labels are printed. XON only plots the x axis labels and YON only plots the y axis labels.

BOX is a special option that creates an extra column on the left and an extra row on the bottom. The axis label is printed in this box. BOX is typically reserved for the plot types that plot the variable names in the axes labels.

The default is ON (both x and y axis labels are printed).

Note:
The following option controls where the x axis tic marks, tic mark labels, and axis label are printed.

SET FACTOR PLOT X AXIS <BOTTOM/TOP/ALTERNATE>

BOTTOM specifies that the x axis labels are printed on the bottom axis (on the last row only). TOP specifies that the x axis labels are printed on the top axis (first row only). ALTERNATE specifies that the x axis labels alternate between the top (first row) and bottom axis (last row). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks.

The default is ALTERNATE.

Note:
The following option controls where the y axis tic marks, tic mark labels, and axis label are printed.

SET FACTOR PLOT Y AXIS <LEFT/RIGHT/ALTERNATE>

LEFT specifies that the y axis labels are printed on the left axis (on the first column only). RIGHT specifies that the y axis labels are printed on the right axis (last column only). ALTERNATE specifies that the y axis labels alternate between the left (first column) and right axis (last column). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks.

The default is ALTERNATE.

Note:
Users have different preferences in terms of whether the plot frames for neighboring plots are connected or not. This is controlled with the following option.

SET FACTOR PLOT FRAME <DEFAULT/CONNECTED/USER>

DEFAULT connects neighboring frames (i.e., the FRAME CORNER COORDINATES are set to 0 0 100 100). USER uses whatever frame coordinates are currently set (15 20 85 90 by default) and makes no special provisions for axis labels and tic marks (i.e., you set them as you normally would, and each plot uses whatever you have set). CONNECTED uses whatever frame coordinates have been set by the user, but it draws the axis labels and tic marks as if DEFAULT were being used (that is, as determined by the SET FACTOR PLOT commands described above). Typically, CONNECTED is used to put a small bit of space between plots. For example, you might use FRAME CORNER COORDINATES 3 3 97 97 before the MATRIX PARTIAL LEVERAGE PLOT command.

Since the plots can often have different limits for the axes, the default is USER.

Note:
When the tic marks and tic mark labels are all plotted on the same side (i.e., SET FACTOR PLOT Y AXIS is set to LEFT or RIGHT, or SET FACTOR PLOT X AXIS is set to BOTTOM or TOP), overlap between plots is possible. The TIC OFFSET command can be used to avoid this. In addition, you can stagger the tic labels with the following command:

SET FACTOR PLOT LABEL DISPLACEMENT <NORMAL/STAGGERED/VALUE>

NORMAL means that all tic labels are plotted at a distance determined by the TIC LABEL DISPLACEMENT command. STAGGERED means that alternating plots will be staggered. That is, one will use the standard displacement while the next uses a staggered value. Entering this command with a numeric value specifies the amount of the displacement for the staggered tic labels. For example,

TIC MARK LABEL DISPLACEMENT 10
SET FACTOR PLOT LABEL DISPLACEMENT STAGGERED
SET FACTOR PLOT LABEL DISPLACEMENT 25

These commands specify that the default tic label displacement is 10 and the staggered tic mark label displacement is 25.

Note:
It is often helpful on scatter plot matrices to overlay a fitted line on the plots. The following command is used to specify the type of fit.

SET FACTOR PLOT FIT <NONE/LOWESS/LINE/QUAD/SMOOTH>

NONE means that no fitted line is plotted. LOWESS means that a locally weighted least squares line will be overlaid. LINE means that a linear fit (Y = A0 + A1*X) will be overlaid. QUAD means that a quadratic fit (Y = A0 + A1*X + A2*X**2) will be overlaid. SMOOTH means that a least squares smoothing will be overlaid.

For LOWESS, it is recommended that the lowess fraction be set fairly high (e.g., LOWESS FRACTION 0.6).

The fitted line is currently only generated if the factor plot type is PLOT.

The default is for no fitted line to be overlaid on the plot. If an overlaid fit is desired, the most common choice is LOWESS.

Note:
Dataplot allows you to set axis limits with the LIMITS command. For the factor plot, it is often desirable to set the axis limits for each plot. This can be done with the command

SET FACTOR PLOT YLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...
SET FACTOR PLOT XLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...

The default is to allow the axis limits to float with the data.

Note:
You can use standard plot control commands to control the appearance of the factor plot.

For example,

MULTIPLOT CORNER COORDINATES 5 5 95 95
MULTIPLOT SCALE FACTOR 3
TIC OFFSET UNITS SCREEN
TIC OFFSET 5 5
Default:
None
Synonyms:
None
Related Commands:
FIT = Perform a multi-linear fit.
PARTIAL RESIDUAL PLOT = Generates a partial residual plot.
PARTIAL REGRESSION PLOT = Generates a partial regression plot.
CCPR PLOT = Generates a CCPR plot.
VIF = Compute variance inflation factors for a multi-linear fit.
CONDITION INDICES = Compute condition indices for a design matrix.
SCATTER PLOT MATRIX = Generate a factor plot.
FACTOR PLOT = Generate a plot for a response against a number of different independent variables.
CONDITIONAL PLOT = Generate a conditional (subset) plot.
References:
Tom Ryan (1997), "Modern Regression Methods", John Wiley.

Neter, Wasserman, and Kutner (1990), "Applied Linear Statistical Models", 3rd ed., Irwin.

Draper and Smith (1998), "Applied Regression Analysis", 3rd ed., John Wiley.

Cook and Weisberg (1982), "Residuals and Influence in Regression", Chapman and Hall.

Belsley, Kuh, and Welsch (1980), "Regression Diagnostics", John Wiley.

Velleman and Welsch (1981), "Efficient Computing of Regression Diagnostics", The American Statistician, Vol. 35, No. 4, pp. 234-242.

Applications:
Multi-linear Regression
Implementation Date:
2002/6
Program:

SKIP 25
READ HALD647.DAT Y X1 X2 X3 X4
.
MULTIPLOT CORNER COORDINATES 5 5 95 95
MULTIPLOT SCALE FACTOR 2
LINE BLANK
CHARACTER X
X1LABEL DISPLACEMENT 12
Y1LABEL DISPLACEMENT 12
TIC OFFSET UNITS SCREEN
TIC OFFSET 5 5
.
MATRIX PARTIAL LEVERAGE PLOT Y X1 X2 X3 X4

Date created: 8/19/2002
Last updated: 10/13/2015

Please email comments on this WWW page to alan.heckert@nist.gov.