
# FACTOR PLOT

Name:
FACTOR PLOT
Type:
Graphics Command
Purpose:
Generates a factor plot. A factor plot is the same type of plot generated for different combinations of response and factor variables and arranged on a single page.

The underlying plot generated can be any univariate or bivariate plot. The scatter plot is the most common application.

Although factor plots can be generated using the MULTIPLOT command (and typically LOOPING), the FACTOR PLOT command allows some fairly involved multiplots to be generated with a minimum number of commands (and without looping).

Description:
A factor plot of Y X1 ... Xk is the set of plots Y X1, Y X2, ..., Y Xk arranged on a single page. In this basic form, the particular plot can be any plot type that takes two variables.

There are two variations on this. If a univariate plot (e.g., a histogram) is being generated, then FACTOR PLOT Y1 Y2 ... Yk generates HISTOGRAM Y1, HISTOGRAM Y2, ..., HISTOGRAM Yk. The most general case has multiple response variables and multiple factor variables. For example,

FACTOR PLOT Y1 Y2 Y3 X1 X2 X3 X4

with SET FACTOR PLOT RESPONSE VARIABLES 3 in effect would generate

```
        col 1       col 2       col 3       col 4
row 1:  PLOT Y1 X1, PLOT Y1 X2, PLOT Y1 X3, PLOT Y1 X4
row 2:  PLOT Y2 X1, PLOT Y2 X2, PLOT Y2 X3, PLOT Y2 X4
row 3:  PLOT Y3 X1, PLOT Y3 X2, PLOT Y3 X3, PLOT Y3 X4
```
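The layout rule above (one row per response variable, one column per factor variable) can be sketched in a few lines of Python. This is only an illustration of the row/column mapping described on this page, not Dataplot's implementation; the function name `factor_plot_grid` is hypothetical.

```python
def factor_plot_grid(responses, factors, plot="PLOT"):
    """Return one row of plot commands per response variable.

    Rows correspond to response variables and columns to factor
    variables, mirroring the FACTOR PLOT layout (illustrative sketch).
    """
    return [[f"{plot} {y} {x}" for x in factors] for y in responses]


rows = factor_plot_grid(["Y1", "Y2", "Y3"], ["X1", "X2", "X3", "X4"])
for i, row in enumerate(rows, start=1):
    print(f"row {i}: " + ", ".join(row))
```

With three response variables and four factor variables this yields a 3 x 4 grid, matching the layout shown above.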
There are a number of alternatives for the appearance of this plot. Dataplot tries to balance simplicity with flexibility: it uses reasonable default settings but provides numerous SET commands to control the appearance of the plot. These are described in detail in the Note sections below.
Syntax 1:
FACTOR PLOT <y1> <y2> ... <yk>             <SUBSET/EXCEPT/FOR qualification>
where <y1> through <yk> are the response variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

Up to 25 response variables can be specified. This syntax is used when generating a univariate plot.

Syntax 2:
FACTOR PLOT <y1> <x1> ... <xk>             <SUBSET/EXCEPT/FOR qualification>
where <y1> is the response variable;
<x1> through <xk> are the factor variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax generates PLOT Y1 X1, PLOT Y1 X2, and so on. Up to 25 factor variables can be specified. This syntax is used when generating a bivariate plot. In this case, the response variable stays fixed while the factor variable changes from plot to plot.

Syntax 3:
FACTOR PLOT <y1> ... <yl> <x1> ... <xk>             <SUBSET/EXCEPT/FOR qualification>
where <y1> ... <yl> are the response variables;
<x1> through <xk> are the factor variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax generates a matrix of plots where the number of response variables determines the number of rows and the number of factor variables determines the number of columns. This syntax is used when generating a bivariate plot and there is more than one response variable and more than one factor variable.

Examples:
SET FACTOR PLOT TYPE HISTOGRAM
FACTOR PLOT Y1 Y2 Y3 Y4 Y5

SET FACTOR PLOT TYPE PLOT
FACTOR PLOT Y X1 X2 X3 X4 X5 SUBSET TAG > 2

SET FACTOR PLOT TYPE PLOT
SET FACTOR PLOT RESPONSE VARIABLES 3
FACTOR PLOT Y1 Y2 Y3 X1 X2 X3 X4

Note:
The concept of the factor plot generalizes quite nicely to any plot type for either one or two variables. Dataplot supports the factor plot for a number of different plot types. The type of plot generated is controlled by the following command:

SET FACTOR PLOT TYPE <value>

where <value> is one of the following.

The following plot types use two variables (e.g., BIHISTOGRAM Y1 Y2). For these, use either syntax 2 or syntax 3 above, depending on whether you have one or multiple response variables.

• PLOT - generate scatter plots (this is the default). The x and y axis labels are automatically set to the appropriate variable name.

• QUANTILE-QUANTILE - generate quantile-quantile plots. The x and y axis labels are automatically set to the appropriate variable name.

• BIHISTOGRAM - generate relative bihistograms. We recommend that you enter SET RELATIVE HISTOGRAM PERCENT to generate more consistent y-axis scales. The X1LABEL is set to the first variable name and the X2LABEL is set to the second variable name. If no YLABEL is already defined, the YLABEL is set to "Frequency".

• BOX-COX LINEARITY - generate Box-Cox linearity plots. If not previously defined, the X1LABEL is set to "Alpha" and the Y1LABEL is set to "Correlation". X2LABEL is set to the appropriate variable names.

• STATISTIC PLOT - generate a statistic plot (e.g., MEAN PLOT, STANDARD DEVIATION PLOT). To define which statistic is plotted, enter the command
SET FACTOR PLOT STATISTIC <name>
where <name> can be either one or two words. The list of supported statistics is the same as for the STATISTIC PLOT command. The x and y axis labels are automatically set to the appropriate variable name.

The following plot types use one variable (e.g., HISTOGRAM Y1). For these, use syntax 1 above.

• HISTOGRAM - generate relative histograms. We recommend that you enter SET RELATIVE HISTOGRAM PERCENT to generate more consistent y-axis scales. The X1LABEL is set to the variable name. If no Y1LABEL is already defined, the Y1LABEL is set to "Frequency".

• KERNEL DENSITY - generate kernel density plots. The X1LABEL is set to the variable name. If no Y1LABEL is already defined, the Y1LABEL is set to "Density".

• PERCENT POINT PLOT - generate a percent point plot. The X1LABEL is set to "Percentile" and the X2LABEL is set to the variable name. No Y1LABEL is automatically set.

• AUTOCORRELATION - generate an autocorrelation plot. If not already defined, X1LABEL is set to "Lag", Y1LABEL is set to "Correlation" and the X2LABEL is set to the variable name.

• SPECTRAL - generate a spectral plot. If not already defined, X1LABEL is set to "Frequency", Y1LABEL is set to "Power" and the X2LABEL is set to the variable name.

• LAG - generate a lag plot. If not already defined, X1LABEL is set to "Frequency", Y1LABEL is set to "Power" and the X2LABEL is set to the variable name.

• RUN SEQUENCE PLOT - generate a run sequence plot. If not already defined, X1LABEL is set to "Sequence", Y1LABEL is not set, and the X2LABEL is set to the variable name.

• PROBABILITY PLOT - generate a probability plot for the specified distribution. The distribution name can be up to 5 words and corresponds to the names supported by the PROBABILITY PLOT command (70+ distributions supported). If not already defined, X1LABEL is set to "Theoretical", Y1LABEL is set to "Data" and the X2LABEL is set to the variable name.

• PPCC PLOT - generate a PPCC plot for the specified distribution. The distribution name can be up to 5 words and corresponds to the names supported by the PPCC PLOT command (30+ distributions supported). If not already defined, X1LABEL is set to "Parameter", Y1LABEL is set to "Correlation" and the X2LABEL is set to the variable name.

Dataplot automatically defines X1LABEL, X2LABEL, and YLABEL commands for these plots. You can control the attributes of these labels with the standard label setting commands. If you have defined variable labels (with the VARIABLE LABEL command), these will automatically be substituted for variable names in the labels.

If you have defined variable labels with the VARIABLE LABEL command and you want to suppress the automatic expansion of the variable name to the variable label, enter

SET VARIABLE LABEL EXPAND OFF

To restore the default that variable names will be expanded to the corresponding variable label, enter

SET VARIABLE LABEL EXPAND ON

Note:
The following option controls which axis tic marks, tic mark labels, and axis labels are plotted.

SET FACTOR PLOT LABELS <ON/OFF/XON/YON/BOX>

OFF means that all axis labels are suppressed (this can be useful if a large number of variables are being plotted). ON means that both X and Y axis labels are printed. XON only plots the x axis labels and YON only plots the y axis labels.

BOX is a special option that creates an extra column on the left and an extra row on the bottom. The axis label is printed in this box. BOX is typically reserved for the plot types that plot the variable names in the axes labels.

The default is ON (both x and y axis labels are printed).

Note:
The following option controls where the x axis tic marks, tic mark labels, and axis label are printed.

SET FACTOR PLOT X AXIS <BOTTOM/TOP/ALTERNATE>

BOTTOM specifies that the x axis labels are printed on the bottom axis (on the last row only). TOP specifies that the x axis labels are printed on the top axis (first row only). ALTERNATE specifies that the x axis labels alternate between the top (first row) and bottom axis (last row). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks.

The default is ALTERNATE.

Note:
The following option controls where the y axis tic marks, tic mark labels, and axis label are printed.

SET FACTOR PLOT Y AXIS <LEFT/RIGHT/ALTERNATE>

LEFT specifies that the y axis labels are printed on the left axis (on the first column only). RIGHT specifies that the y axis labels are printed on the right axis (last column only). ALTERNATE specifies that the y axis labels alternate between the left (first column) and right axis (last column). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks.

The default is ALTERNATE.
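The interaction of SET FACTOR PLOT LABELS, SET FACTOR PLOT X AXIS, and SET FACTOR PLOT Y AXIS can be summarized as a small Python function that decides which frame sides of a given panel carry axis labels. This is a sketch under stated assumptions, not Dataplot's code: the function name is hypothetical, the BOX option and the SET FACTOR PLOT FRAME settings are not modeled, and for ALTERNATE it assumes that odd-numbered columns use the bottom axis and odd-numbered rows use the left axis.

```python
def axis_label_sides(row, col, nrows, ncols,
                     x_axis="ALTERNATE", y_axis="ALTERNATE", labels="ON"):
    """Return the set of frame sides on which panel (row, col) gets axis labels.

    Rows and columns are numbered from 1, with row 1 at the top.
    Assumption: under ALTERNATE, odd columns use the bottom axis and even
    columns the top axis; odd rows use the left axis and even rows the right.
    """
    sides = set()
    if labels in ("ON", "XON"):
        if x_axis == "BOTTOM":
            want = "bottom"
        elif x_axis == "TOP":
            want = "top"
        else:  # ALTERNATE
            want = "bottom" if col % 2 == 1 else "top"
        # x labels appear only on the last row (bottom) or the first row (top)
        if (want == "bottom" and row == nrows) or (want == "top" and row == 1):
            sides.add(want)
    if labels in ("ON", "YON"):
        if y_axis == "LEFT":
            want = "left"
        elif y_axis == "RIGHT":
            want = "right"
        else:  # ALTERNATE
            want = "left" if row % 2 == 1 else "right"
        # y labels appear only on the first column (left) or last column (right)
        if (want == "left" and col == 1) or (want == "right" and col == ncols):
            sides.add(want)
    return sides
```

For a 3 x 4 grid with the defaults, the lower-left panel gets labels on the bottom and left, while interior panels get none.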

Note:
Users have different preferences in terms of whether the plot frames for neighboring plots are connected or not. This is controlled with the following option.

SET FACTOR PLOT FRAME <DEFAULT/CONNECTED/USER>

DEFAULT connects neighboring frames (i.e., the FRAME CORNER COORDINATES are set to 0 0 100 100). USER uses whatever frame coordinates are currently set (15 20 85 90 by default) and makes no special provisions for axis labels and tic marks (i.e., you set them as you normally would, each plot uses whatever you have set). CONNECTED uses whatever frame coordinates have been set by the user, but it draws the axis labels and tic marks as if DEFAULT were being used (that is, as determined by the SET FACTOR PLOT commands described above). Typically, CONNECTED is used to put a small bit of space between plots. For example, you might use FRAME CORNER COORDINATES 3 3 97 97 before the FACTOR PLOT command.

Since the plots can often have different limits for the axes, the default is USER.

Note:
When the tic marks and tic mark labels are all plotted on the same side (i.e., SET FACTOR PLOT Y AXIS is set to LEFT or RIGHT or SET FACTOR PLOT X AXIS is set to BOTTOM or TOP), then overlap between plots is possible. The TIC OFFSET command can be used to avoid this. In addition, you can stagger the tic labels with the following command:

SET FACTOR PLOT LABEL DISPLACEMENT <NORMAL/STAGGERED/VALUE>

NORMAL means that all tic labels are plotted at a distance determined by the TIC LABEL DISPLACEMENT command. STAGGERED means that alternating plots will be staggered. That is, one will use the standard displacement while the next uses a staggered value. Entering this command with a numeric value specifies the amount of the displacement for the staggered tic labels. For example,

TIC MARK LABEL DISPLACEMENT 10
SET FACTOR PLOT LABEL DISPLACEMENT STAGGERED
SET FACTOR PLOT LABEL DISPLACEMENT 25

These commands specify that the default tic label displacement is 10 and the staggered tic mark label displacement is 25.
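The effect of STAGGERED on successive plots can be sketched in Python as a list of displacements, one per plot. This is a hypothetical helper for illustration only; it assumes that the first plot uses the normal displacement and the staggered value applies to every second plot after that.

```python
def tic_label_displacements(n_plots, normal, staggered_value=None):
    """Tic label displacement for each of n_plots consecutive plots.

    staggered_value=None models SET FACTOR PLOT LABEL DISPLACEMENT NORMAL;
    a number models STAGGERED with that displacement value.
    Assumption: the first plot uses the normal displacement.
    """
    if staggered_value is None:
        return [normal] * n_plots
    return [normal if i % 2 == 0 else staggered_value for i in range(n_plots)]


print(tic_label_displacements(4, 10, 25))  # [10, 25, 10, 25]
```

With the settings from the example (10 and 25), neighboring plots alternate between the two displacements, so their tic labels do not collide.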

Note:
It is often helpful on factor plots to overlay a fitted line on the plots. The following command is used to specify the type of fit.

NONE means that no fitted line is plotted. LOWESS means that a locally weighted least squares line will be overlaid. LINE means that a linear fit (Y = A0 + A1*X) will be overlaid. QUAD means that a quadratic fit (Y = A0 + A1*X + A2*X**2) will be overlaid. SMOOTH means that a least squares smoothing will be overlaid.

For LOWESS, it is recommended that the lowess fraction be set fairly high (e.g., LOWESS FRACTION 0.6).

The fitted line is currently only generated if the factor plot type is PLOT.

The default is for no fitted line to be overlaid on the plot. If an overlaid fit is desired, the most common choice is LOWESS.
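The LINE and QUAD fits are ordinary least squares polynomial fits, Y = A0 + A1*X and Y = A0 + A1*X + A2*X**2. The Python sketch below computes those coefficients by solving the normal equations with Gaussian elimination. It only illustrates the fitted model; it is not Dataplot's fitting code, LOWESS and SMOOTH are not covered, and the function names are hypothetical. The normal-equation approach is fine for a sketch but can be ill-conditioned for large X values.

```python
def polyfit_ls(x, y, degree):
    """Least squares coefficients [A0, A1, ..., A_degree] of a polynomial fit.

    degree=1 corresponds to LINE, degree=2 to QUAD. Solves the normal
    equations (X^T X) a = X^T y by Gaussian elimination with partial pivoting.
    """
    m = degree + 1
    # Augmented normal-equation matrix: sums of powers of x, then sum of y*x^r.
    a = [[sum(xi ** (r + c) for xi in x) for c in range(m)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))] for r in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(a[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = (a[r][m] - s) / a[r][r]
    return coef


def poly_eval(coef, t):
    """Evaluate A0 + A1*t + A2*t**2 + ... at t (for drawing the overlay)."""
    return sum(c * t ** i for i, c in enumerate(coef))


xs = [0, 1, 2, 3, 4]
line = polyfit_ls(xs, [1 + 2 * t for t in xs], 1)
quad = polyfit_ls(xs, [1 - t + 0.5 * t ** 2 for t in xs], 2)
```

Evaluating the fitted polynomial over the range of X gives the curve that would be overlaid on each scatter plot.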

Note:
In distinguishing syntax 2 and syntax 3 above, Dataplot needs to know how many response variables there are. This is specified with the command

SET FACTOR PLOT RESPONSE VARIABLES <value>

where <value> identifies the number of response variables. On the FACTOR PLOT command, Dataplot assumes that the response variables (y axis) come first, followed by the factor variables (x axis).

For the two-variable plot types, the default is one. For the univariate plot types, all variables are assumed to be response variables.
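The rule above for splitting the FACTOR PLOT variable list can be sketched in Python as follows. This is an illustration of the documented behavior, not Dataplot's parser; the function name is hypothetical, and the set of univariate type names is taken from the plot-type list earlier on this page (a distribution name following PROBABILITY PLOT or PPCC PLOT is not modeled).

```python
UNIVARIATE_TYPES = {
    "HISTOGRAM", "KERNEL DENSITY", "PERCENT POINT PLOT", "AUTOCORRELATION",
    "SPECTRAL", "LAG", "RUN SEQUENCE PLOT", "PROBABILITY PLOT", "PPCC PLOT",
}


def split_variables(variables, plot_type="PLOT", n_response=None):
    """Split a FACTOR PLOT variable list into (responses, factors).

    n_response plays the role of SET FACTOR PLOT RESPONSE VARIABLES.
    """
    if plot_type in UNIVARIATE_TYPES:
        # Univariate types: every variable is a response variable.
        return list(variables), []
    k = 1 if n_response is None else n_response  # bivariate default is one
    if not 1 <= k < len(variables):
        raise ValueError("need at least one response and one factor variable")
    # Response variables come first, then the factor variables.
    return list(variables[:k]), list(variables[k:])
```

For example, `split_variables(["Y1", "Y2", "Y3", "X1", "X2", "X3", "X4"], n_response=3)` returns three responses and four factors, which gives the 3 x 4 layout described at the top of this page.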

Note:
Dataplot supports a special form of the PLOT command:

PLOT Y X TAG

In this form of the plot command, TAG is a group identifier variable. Points belonging to the same group are plotted with the same attributes (controlled by the CHARACTER and LINE commands and their various attribute setting commands).

Using a tag variable has two common purposes:

1. To distinguish natural groups in your data (e.g., batch 1 and batch 2).
2. To identify certain points. The most common application is to flag outliers.

You can specify that the factor plot use this form of the PLOT command with the command

SET FACTOR PLOT TAG <ON/OFF>

OFF specifies that the standard plot command (PLOT Y X) will be used. ON specifies that the last variable on the FACTOR PLOT command is a tag variable. That is, it is not plotted directly, but is instead the third variable on all the plot commands generated by the factor plot.

Currently, this command only applies if the factor plot type is set to PLOT.
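How SET FACTOR PLOT TAG ON changes the generated commands can be sketched in Python. This is an illustration of the behavior described above (the last variable becomes the third argument of every PLOT command and is not plotted itself), not Dataplot's implementation; the function name is hypothetical.

```python
def tagged_commands(variables, n_response=1, tag=True):
    """PLOT commands a bivariate factor plot would issue with TAG ON or OFF."""
    suffix = ""
    if tag:
        # The last variable is the tag: it is never plotted directly, it
        # becomes the third argument of every generated PLOT command.
        variables, suffix = variables[:-1], " " + variables[-1]
    responses, factors = variables[:n_response], variables[n_response:]
    return [f"PLOT {y} {x}{suffix}" for y in responses for x in factors]


print(tagged_commands(["Y", "X1", "X2", "X3", "TAG"]))
# ['PLOT Y X1 TAG', 'PLOT Y X2 TAG', 'PLOT Y X3 TAG']
```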

In some cases, you may want to use a tag variable for both purposes. That is, you may have natural groups in your data, but you also want to flag certain outlying points. You can do this by using a SUBSET clause. For example,

LIMITS 0 120
SET FACTOR PLOT TAG ON
CHARACTER CIRCLE SQUARE TRIANGLE
CHARACTER FILL OFF OFF OFF
FACTOR PLOT Y X1 X2 X3 TAG SUBSET Y2 <= 100
PRE-ERASE OFF
CHARACTER FILL ON ON ON
FACTOR PLOT Y X1 X2 X3 TAG SUBSET Y2 > 100

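In this example, the LIMITS command keeps the axis limits identical for both FACTOR PLOT commands so that the second set of points overlays the first correctly. The SET FACTOR PLOT XLIMITS and SET FACTOR PLOT YLIMITS commands, discussed below, can be used to control the axis limits for the individual plots.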

The default is OFF.

Note:
Dataplot allows you to set axis limits with the LIMITS command. For the factor plot, it is often desirable to set the axis limits for each plot. This can be done with the commands

SET FACTOR PLOT YLIMITS ...
SET FACTOR PLOT XLIMITS ...

Note that the pairs of limits correspond to the variable list in the FACTOR PLOT command. For univariate plot types, the plot order corresponds to the variable list. For bivariate plot types, the YLIMITS refer to the response variables and XLIMITS refer to the factor variables. That is, Dataplot determines which variable is being plotted on each axis, and gets the corresponding limits.
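For the bivariate case, the lookup described above can be sketched in Python: the flat limit lists are grouped into (low, upper) pairs, the YLIMITS pair is selected by the response variable and the XLIMITS pair by the factor variable. The function names are hypothetical, and treating a missing pair as "let the limits float" is an assumption made for this sketch, not documented Dataplot behavior.

```python
def pairs(flat):
    """Group a flat list [low1, upp1, low2, upp2, ...] into (low, upp) pairs."""
    if len(flat) % 2:
        raise ValueError("limits must come in low/upper pairs")
    return list(zip(flat[0::2], flat[1::2]))


def panel_limits(row, col, ylimits, xlimits):
    """(ylim, xlim) for the panel plotting response `row` against factor `col`.

    Indices are 0-based. Assumption: a missing pair is returned as None,
    meaning the limits float with the data.
    """
    ys, xs = pairs(ylimits), pairs(xlimits)
    ylim = ys[row] if row < len(ys) else None
    xlim = xs[col] if col < len(xs) else None
    return ylim, xlim
```

For example, with YLIMITS 0 10 0 50 and XLIMITS 0 1 0 2 0 3, the panel for the second response and third factor uses y limits (0, 50) and x limits (0, 3).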

The default is to allow the axis limits to float with the data.

Note:
Dataplot supports a subregion capability. This is used to draw "engineering limits" on a plot. For a factor plot, if you specify engineering limits, you typically want these limits to vary with each plot. They can be specified with the commands

SET FACTOR PLOT SUBREGION XLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...
SET FACTOR PLOT SUBREGION YLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...

This command is similar to the SET FACTOR PLOT XLIMITS and SET FACTOR PLOT YLIMITS commands in that the list corresponds to the variables entered on the FACTOR PLOT command.

Only one set of subregion limits can be set for each variable.

The default is that no subregion limits are set.

Note:
You can specify a special X2LABEL for the plots with the following command

SET FACTOR PLOT X2LABEL <OFF/CORRELATION/PERCENT CORRELATION/EFFECT/PERCENT ACCEPT/NUMBER ACCEPT/ACCEPT TOTAL/ACCEPT TOTAL PERCENT>

where

• OFF - no special X2LABEL is drawn.
• CORRELATION - the correlation of the points on the plot is printed with the X2LABEL. This option is typically used with the plot type PLOT.
• PERCENT CORRELATION - this is the same as CORRELATION, except that the correlation is printed as a percent.
• EFFECT - the difference between the low and high value is printed. This option is typically used with the plot type DEX INTERACTION (and doesn't really make any sense with the other plot types). This plot type is supported for the SCATTER PLOT MATRIX, but not for the FACTOR PLOT.
• PERCENT ACCEPT - this option prints the percentage of points inside the first subregion. If no subregions are defined, this option is not meaningful. It is typically used to show the percentage of points within engineering limits.
• NUMBER ACCEPT - this option is similar to PERCENT ACCEPT. However, the number of points rather than the percentage is printed.
• ACCEPT TOTAL - this option is similar to NUMBER ACCEPT. However, it prints the number accepted first, then the total number of points.
• ACCEPT TOTAL PERCENT - this option is similar to ACCEPT TOTAL. However, after printing the number accepted and the total number, it prints the percentage accepted.
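The numeric X2LABEL options can be sketched in Python for a single panel. This is an illustration under assumptions, not Dataplot's code: the function names are hypothetical, CORRELATION is taken to be the Pearson correlation of the plotted points, and "inside the first subregion" is assumed to mean that both the x and y values lie within the subregion limits (inclusive).

```python
import math


def pearson(x, y):
    """Pearson correlation of the points (x[i], y[i])."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def x2label_value(option, x, y, region=None):
    """Value an X2LABEL option would report for one panel.

    region = (xlow, xupp, ylow, yupp) stands in for the first subregion.
    """
    if option == "CORRELATION":
        return pearson(x, y)
    if option == "PERCENT CORRELATION":
        return 100.0 * pearson(x, y)
    if region is None:
        raise ValueError(f"{option} requires a subregion")
    xlo, xhi, ylo, yhi = region
    # Assumption: a point is accepted if both coordinates lie in the region.
    accepted = sum(xlo <= a <= xhi and ylo <= b <= yhi for a, b in zip(x, y))
    total = len(x)
    pct = 100.0 * accepted / total
    return {
        "PERCENT ACCEPT": pct,
        "NUMBER ACCEPT": accepted,
        "ACCEPT TOTAL": (accepted, total),
        "ACCEPT TOTAL PERCENT": (accepted, total, pct),
    }[option]
```

For the points (1, 2), (2, 4), (3, 6), (4, 8) and a subregion of 1 to 3 in x and 0 to 10 in y, three of the four points are accepted, so PERCENT ACCEPT reports 75.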

The following commands can be used to add a prefix and a suffix to the X2LABEL. For example, with PERCENT CORRELATION you might want the label to start with "CORR = " and to end with "%".

SET X2LABEL PREFIX
SET X2LABEL SUFFIX

The appearance and location of the X2LABEL are controlled with the standard X2LABEL attribute setting commands.

There are occasions where you may want to use the values computed in the X2LABEL for additional numeric computations. These values are automatically written to the file "dpst5f.dat", in the order in which the plots are generated.

You can control the number of digits printed with the SET WRITE DECIMALS command.

Note:
You can use standard plot control commands to control the appearance of the factor plot. For example,

MULTIPLOT CORNER COORDINATES 5 5 95 95
MULTIPLOT SCALE FACTOR 3
TIC OFFSET UNITS SCREEN
TIC OFFSET 5 5

is a fairly typical set of commands commonly used with factor plots.

Default:
None
Synonyms:
SCATTER PLOT is a synonym for FACTOR PLOT.
SET SCATTER PLOT is a synonym for SET FACTOR PLOT.
Related Commands:
PLOT = Generates a data or function plot.
SCATTER PLOT MATRIX = Generates a scatter plot matrix.
CONDITIONAL PLOT = Generates a conditional (subset) plot.
Reference:
Cleveland, William S. (1993), "Visualizing Data", Hobart Press.

du Toit, Steyn, and Stumpf (1986), "Graphical Exploratory Data Analysis", Springer-Verlag.

Applications:
Exploratory Data Analysis, Multivariate Data Analysis
Implementation Date:
2000/1
Program:
dimension 25 variables
skip 25
read simon1.dat y1 y2 x1 to x5 block runseq
.
multiplot scale factor 2
multiplot corner coordinates 10 5 90 90
tic offset units screen
xtic offset 5 10
major xtic mark number 3
ytic offset 5 5
y1label displacement 40
y2label displacement 25
x1label displacement 3
x2label displacement 7
char x
line blank
.
set factor plot frame type connected
frame corner coordinates 0 0 100 100
set factor plot response variables 2
factor plot y1 y2 x1 x2 x3 x4 x5


Date created: 06/05/2001
Last updated: 06/07/2016