# CONDITION PLOT

Name:
CONDITION PLOT
Type:
Graphics Command
Purpose:
Generates a condition plot. A condition plot is a plot of

PLOT Y X

conditional on the value of a third variable.

Although condition plots can be generated using the MULTIPLOT and SUBSET commands (and typically LOOPING), the CONDITION PLOT command allows some fairly involved multiplots to be generated with a minimum number of commands (and without looping).

Description:
A condition plot of Y X TAG is a plot of Y versus X for the distinct values of TAG arranged on a single page. That is, if TAG can have the value 1, 2, or 3, then

CONDITION PLOT Y X TAG

is equivalent to

MULTIPLOT 2 2
PLOT Y X TAG SUBSET TAG = 1
PLOT Y X TAG SUBSET TAG = 2
PLOT Y X TAG SUBSET TAG = 3
END OF MULTIPLOT

In the above, TAG is referred to as the conditioning variable. Dataplot expects the conditioning variable to be a discrete variable (i.e., it takes on a small number of distinct values). If you want to condition on a continuous variable, as is often the case, Dataplot provides several methods for doing this.

You can use the "CODE" commands:

LET TAG = CODE2 Y
LET TAG = CODE4 Y
LET TAG = CODE8 Y
LET TAG = CODE<N> Y (<N> = 3, 5, 6, 7, 9, or 10)

CODE2 codes TAG as 1 or 2 depending on whether the corresponding points in Y fall below or above the median. CODE4 divides the data into quartiles (and codes TAG as 1, 2, 3, or 4 according to which quartile the data falls into). Similarly, CODE8 divides Y into octiles. CODE<N> divides Y into <N> percentile groups and defines TAG according to which group the data falls into.
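
As an illustration of the quantile coding idea behind CODE2 and CODE4, here is a minimal Python sketch (Python is used only for illustration; it is not Dataplot code). The helper name quantile_code, the use of Python's statistics.quantiles for the cut points, and the rule that a value equal to a cut point goes in the lower group are assumptions of this sketch. Dataplot's own handling of ties and boundaries may differ.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_code(y, groups):
    """Return a code from 1 to groups for each value of y (hypothetical helper).

    Assumption: cut points come from statistics.quantiles, and a value
    equal to a cut point is placed in the lower group. Dataplot's CODE
    commands may treat ties and boundaries differently.
    """
    cuts = quantiles(y, n=groups)  # groups - 1 cut points
    return [bisect_left(cuts, value) + 1 for value in y]


y = [12, 47, 3, 88, 51, 29, 64, 75]
tag2 = quantile_code(y, 2)  # analogous to LET TAG = CODE2 Y
tag4 = quantile_code(y, 4)  # analogous to LET TAG = CODE4 Y
```

With these eight values, each CODE2-style group receives four points and each CODE4-style group receives two.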

For maximum control, you can do something like the following:

LET N = SIZE Y
LET TAG = 1 FOR I = 1 1 N
LET TAG = 2 SUBSET Y = 25 TO 50
LET TAG = 3 SUBSET Y = 51 TO 100
LET TAG = 4 SUBSET Y = 101 TO 150
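
The manual coding above can be mirrored in a short Python sketch, given only as an illustration. The helper tag_by_ranges and the sample values are hypothetical, and the ranges are assumed to include both end points.

```python
def tag_by_ranges(y, ranges, default=1):
    """Assign a tag to each y value from (low, high, tag) ranges.

    Every point starts with the default tag. Each range then overwrites
    the tag of the points it contains, in the order given. Inclusive
    bounds are an assumption of this sketch.
    """
    tags = [default] * len(y)
    for low, high, tag in ranges:
        for i, value in enumerate(y):
            if low <= value <= high:
                tags[i] = tag
    return tags


y = [10, 25, 50, 50.5, 75, 120, 200]
ranges = [(25, 50, 2), (51, 100, 3), (101, 150, 4)]
tags = tag_by_ranges(y, ranges)
```

In this sketch a value such as 50.5, which falls between two of the listed ranges, keeps the initial tag of 1, as do values below 25 and above 150.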

This concept generalizes to plot types other than scatter plots. Dataplot supports the CONDITION PLOT for a number of univariate and bivariate plots.

There are a number of alternatives for the appearance of this plot. Dataplot tries to balance simplicity with flexibility by using default settings, but providing numerous SET commands to control the appearance of the plot. These are described in detail in the NOTES section below.

The CONDITION PLOT is similar to the factor plot. A simple explanation of the difference is that the factor plot does something like

MULTIPLOT 2 2
PLOT Y X1
PLOT Y X2
PLOT Y X3
PLOT Y X4
END OF MULTIPLOT

while the condition plot does something like

MULTIPLOT 2 2
PLOT Y X SUBSET TAG = 1
PLOT Y X SUBSET TAG = 2
PLOT Y X SUBSET TAG = 3
PLOT Y X SUBSET TAG = 4
END OF MULTIPLOT
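
The distinction can also be sketched in Python (for illustration only; the function names and data are hypothetical). The factor plot pairs the full response with each factor in turn, while the condition plot splits a single X and Y pairing by the distinct values of TAG.

```python
from collections import defaultdict


def condition_panels(y, x, tag):
    """Group (x, y) points by the distinct values of tag: one panel per value."""
    panels = defaultdict(list)
    for yi, xi, ti in zip(y, x, tag):
        panels[ti].append((xi, yi))
    return dict(sorted(panels.items()))


def factor_panels(y, factors):
    """Pair the full response with each factor: one panel per factor."""
    return {name: list(zip(xs, y)) for name, xs in factors.items()}


y = [5.0, 6.1, 4.8, 7.2, 6.6, 5.9]
x = [1, 2, 3, 4, 5, 6]
tag = [1, 2, 1, 3, 2, 3]

cond = condition_panels(y, x, tag)                           # panels keyed by TAG value
fact = factor_panels(y, {"X1": x, "X2": [6, 5, 4, 3, 2, 1]})  # panels keyed by factor
```

Each condition panel holds only the points for one TAG value, whereas every factor panel holds all of the points.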
Syntax 1:
CONDITION PLOT <Y> <X> <TAG> <SUBSET/EXCEPT/FOR qualification>
where <Y> is the response (y axis) variable;
<X> is the factor (x axis) variable;
<TAG> is the conditioning variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when generating a bivariate plot.

Syntax 2:
CONDITION PLOT <Y1> ... <Yk> <X> <TAG> <SUBSET/EXCEPT/FOR qualification>
where <Y1> ... <Yk> are the response (y axis) variables;
<X> is the factor (x axis) variable;
<TAG> is the conditioning variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when generating a bivariate plot with multiple response variables. That is, we are effectively doing

CONDITION PLOT Y1 X TAG
CONDITION PLOT Y2 X TAG
etc.

where each response variable is plotted as a single row of plots.

Syntax 3:
CONDITION PLOT <Y> <X> <TAG1> <TAG2> <SUBSET/EXCEPT/FOR qualification>
where <Y> is the response (y axis) variable;
<X> is the factor (x axis) variable;
<TAG1> is the first conditioning variable;
<TAG2> is the second conditioning variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when generating a bivariate plot with two conditioning variables. The rows of the plot matrix correspond to the distinct values of the first tag variable while the columns correspond to the distinct values of the second tag variable.
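
The row and column layout can be illustrated with a short Python sketch. It is not Dataplot code: the helper condition_grid is hypothetical, and both the sorted ordering of the distinct values and the treatment of empty combinations as empty panels are assumptions.

```python
def condition_grid(y, x, tag1, tag2):
    """Arrange (x, y) points in a grid keyed by (tag1 value, tag2 value).

    Rows follow the distinct values of tag1 and columns follow the
    distinct values of tag2.
    """
    rows = sorted(set(tag1))
    cols = sorted(set(tag2))
    grid = {(r, c): [] for r in rows for c in cols}
    for yi, xi, r, c in zip(y, x, tag1, tag2):
        grid[(r, c)].append((xi, yi))
    return rows, cols, grid


y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [10, 20, 30, 40, 50]
tag1 = [1, 1, 2, 2, 2]
tag2 = [1, 2, 1, 2, 2]

rows, cols, grid = condition_grid(y, x, tag1, tag2)
```

Here the grid has two rows and two columns, and the panel in row 2, column 2 holds the last two points.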

The most general case, multiple response variables with two conditioning variables, is still being tested.

Syntax 4:
CONDITION PLOT <Y> <TAG> <SUBSET/EXCEPT/FOR qualification>
where <Y> is the response (y axis) variable;
<TAG> is the conditioning variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when generating a univariate plot.

Syntax 5:
CONDITION PLOT <Y1> ... <Yk> <TAG> <SUBSET/EXCEPT/FOR qualification>
where <Y1> ... <Yk> are the response (y axis) variables;
<TAG> is the conditioning variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when generating a univariate plot with multiple response variables. That is, we are effectively doing

CONDITION PLOT Y1 TAG
CONDITION PLOT Y2 TAG
etc.

where each response variable is plotted as a single row of plots.

Syntax 6:
CONDITION PLOT <Y> <TAG1> <TAG2> <SUBSET/EXCEPT/FOR qualification>
where <Y> is the response (y axis) variable;
<TAG1> is the first conditioning variable;
<TAG2> is the second conditioning variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when generating a univariate plot with two conditioning variables. The rows of the plot matrix correspond to the distinct values of the first tag variable while the columns correspond to the distinct values of the second tag variable.

The most general case, multiple response variables with two conditioning variables, is still being tested.

Examples:
SET CONDITION PLOT TYPE HISTOGRAM
CONDITION PLOT Y TAG

SET CONDITION PLOT TYPE PLOT
CONDITION PLOT Y X TAG

SET CONDITION PLOT TYPE PLOT
CONDITION PLOT Y X TAG SUBSET TAG > 2

Note:
The concept of the condition plot generalizes quite nicely to any plot type for either one or two variables. Dataplot supports the condition plot for a number of different plot types. The type of plot generated is controlled by the following command:

SET CONDITION PLOT TYPE <type>

where <type> is one of the following.

The following plot types use two variables (e.g., BIHISTOGRAM Y1 Y2). Use syntax 1, 2, or 3 above, depending on the number of response and conditioning variables, for the CONDITION PLOT command.

• PLOT - generate scatter plots (this is the default). The x and y axis labels are automatically set to the appropriate variable name.

• QUANTILE-QUANTILE - generate quantile-quantile plots. The x and y axis labels are automatically set to the appropriate variable name.

• BIHISTOGRAM - generate relative bihistograms. We recommend that you enter SET RELATIVE HISTOGRAM PERCENT to generate more consistent y-axis scales. The X1LABEL is set to the first variable name and the X2LABEL is set to the second variable name. If no YLABEL is already defined, the YLABEL is set to "Frequency".

• BOX-COX LINEARITY - generate Box-Cox linearity plots. If not previously defined, the X1LABEL is set to "Alpha" and the Y1LABEL is set to "Correlation". X2LABEL is set to the appropriate variable names.

• STATISTIC PLOT - generate a statistic plot (e.g., MEAN PLOT, STANDARD DEVIATION PLOT). To define which statistic is plotted, enter the command
SET CONDITION PLOT STATISTIC <name>
where <name> can be either one or two words. The list of supported statistics is the same as for the STATISTIC PLOT command. The x and y axis labels are automatically set to the appropriate variable name.

The following plot types use one variable (e.g., HISTOGRAM Y1). Use syntax 4, 5, or 6 above, depending on the number of response and conditioning variables.

• HISTOGRAM - generate relative histograms. We recommend that you enter SET RELATIVE HISTOGRAM PERCENT to generate more consistent y-axis scales. The X1LABEL is set to the variable name. If no Y1LABEL is already defined, the Y1LABEL is set to "Frequency".

• PERCENT POINT PLOT - generate a percent point plot. The X1LABEL is set to "Percentile" and the X2LABEL is set to the variable name. No Y1LABEL is automatically set.

• AUTOCORRELATION - generate an autocorrelation plot. If not already defined, X1LABEL is set to "Lag", Y1LABEL is set to "Correlation" and the X2LABEL is set to the variable name.

• SPECTRAL - generate a spectral plot. If not already defined, X1LABEL is set to "Frequency", Y1LABEL is set to "Power" and the X2LABEL is set to the variable name.

• LAG - generate a lag plot. If not already defined, X1LABEL is set to "Frequency", Y1LABEL is set to "Power" and the X2LABEL is set to the variable name.

• RUN SEQUENCE PLOT - generate a run sequence plot. If not already defined, X1LABEL is set to "Sequence", Y1LABEL is not set, and the X2LABEL is set to the variable name.

• <dist> PROBABILITY PLOT - generate a probability plot for the <dist> distribution. <dist> can be up to 5 words and corresponds to the same names as supported by the PROBABILITY PLOT command (70+ distributions supported). If not already defined, X1LABEL is set to "Theoretical", Y1LABEL is set to "Data" and the X2LABEL is set to the variable name.

• <dist> PPCC PLOT - generate a ppcc plot for the <dist> distribution. <dist> can be up to 5 words and corresponds to the same names as supported by the PPCC PLOT command (30+ distributions supported). If not already defined, X1LABEL is set to "Parameter", Y1LABEL is set to "Correlation" and the X2LABEL is set to the variable name.

The plot TITLE identifies the value of the conditioning variable for all of the above plot types.

Where it makes sense, Dataplot will generate a dummy plot of the full data set first in order to generate common x and y axis scales. For some plots, this does not make sense. For example, a full sample PPCC plot does not necessarily encompass the range of the PPCC plots generated from subsets of the data.

Dataplot automatically defines X1LABEL, X2LABEL, and YLABEL commands for these plots. You can control the attributes of these labels with the standard label setting commands. If you have defined variable labels (with the VARIABLE LABEL command), these will automatically be substituted for variable names in the labels.

If you have defined variable labels with the VARIABLE LABEL command and you want to suppress the automatic expansion of the variable name to the variable label, enter

SET VARIABLE LABEL EXPAND OFF

To restore the default that variable names will be expanded to the corresponding variable label, enter

SET VARIABLE LABEL EXPAND ON
Note:
In distinguishing the different syntaxes above, Dataplot needs to know how many response variables and how many conditioning variables there are. The presence of an x axis variable is automatically determined from the plot type. The number of response and conditioning variables is specified with the commands

SET CONDITION PLOT RESPONSE VARIABLES <value1>
SET CONDITION PLOT CONDITION VARIABLES <value2>

where <value1> identifies the number of response variables and <value2> identifies the number of conditioning variables. On the CONDITION PLOT command, Dataplot assumes that the response variables (y axis) come first, then the factor variable (x axis) if needed, and then the conditioning variables.

The default is one response variable and one conditioning variable.
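
The ordering rule can be summarized in a small Python sketch. This is only an illustration of the stated rule, not Dataplot's parser: split_variables and its arguments are hypothetical names, and needs_x stands in for whatever Dataplot determines from the plot type.

```python
def split_variables(names, n_response=1, n_condition=1, needs_x=True):
    """Split a CONDITION PLOT variable list into its three roles.

    Hypothetical helper: responses come first, then the x axis variable
    (only when the plot type needs one), then the conditioning variables.
    """
    expected = n_response + (1 if needs_x else 0) + n_condition
    if len(names) != expected:
        raise ValueError(f"expected {expected} variables, got {len(names)}")
    responses = names[:n_response]
    x_variable = names[n_response] if needs_x else None
    conditions = names[len(names) - n_condition:]
    return responses, x_variable, conditions


# Two responses, one x variable, one conditioning variable (Syntax 2).
bivariate = split_variables(["Y1", "Y2", "X", "TAG"], n_response=2)

# One response, no x variable, two conditioning variables (Syntax 6).
univariate = split_variables(["Y", "TAG1", "TAG2"], n_condition=2, needs_x=False)
```

The two calls correspond to the response and conditioning counts you would give with SET CONDITION PLOT RESPONSE VARIABLES and SET CONDITION PLOT CONDITION VARIABLES.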

Note:
The following option controls which axis tic marks, tic mark labels, and axis labels are plotted.

SET CONDITION PLOT LABELS <ON/OFF/XON/YON/BOX>

OFF means that all axis labels are suppressed (this can be useful if a large number of variables are being plotted). ON means that both X and Y axis labels are printed. XON only plots the x axis labels and YON only plots the y axis labels.

BOX is a special option that creates an extra column on the left and an extra row on the bottom. The axis label is printed in this box. BOX is typically reserved for the case where there is a natural division of rows and columns (i.e., either multiple response variables or two conditioning variables).

The default is ON (both x and y axis labels are printed).

Note:
The following option controls where the x axis tic marks, tic mark labels, and axis label are printed.

SET CONDITION PLOT X AXIS <BOTTOM/TOP/ALTERNATE>

BOTTOM specifies that the x axis labels are printed on the bottom axis (on the last row only). TOP specifies that the x axis labels are printed on the top axis (first row only). ALTERNATE specifies that the x axis labels alternate between the top (first row) and bottom axis (last row). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks.

The default is ALTERNATE.

Note:
The following option controls where the y axis tic marks, tic mark labels, and axis label are printed.

SET CONDITION PLOT Y AXIS <LEFT/RIGHT/ALTERNATE>

LEFT specifies that the y axis labels are printed on the left axis (on the first column only). RIGHT specifies that the y axis labels are printed on the right axis (last column only). ALTERNATE specifies that the y axis labels alternate between the left (first column) and right axis (last column). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks.

The default is ALTERNATE.

Note:
Users have different preferences in terms of whether the plot frames for neighboring plots are connected or not. This is controlled with the following option.

SET CONDITION PLOT FRAME <DEFAULT/USER/CONNECTED>

DEFAULT connects neighboring frames (i.e., the FRAME CORNER COORDINATES are set to 0 0 100 100). USER uses whatever frame coordinates are currently set (15 20 85 90 by default) and makes no special provisions for axis labels and tic marks (i.e., you set them as you normally would, each plot uses whatever you have set). CONNECTED uses whatever frame coordinates have been set by the user, but it draws the axis labels and tic marks as if DEFAULT were being used (that is, as determined by the SET CONDITION PLOT commands described above). Typically, CONNECTED is used to put a small bit of space between plots. For example, you might use FRAME CORNER COORDINATES 3 3 97 97 before the CONDITION PLOT command.

The default is DEFAULT.

Note:
When the tic marks and tic mark labels are all plotted on the same side (i.e., SET CONDITION PLOT Y AXIS is set to LEFT or RIGHT or SET CONDITION PLOT X AXIS is set to BOTTOM or TOP), then overlap between plots is possible. The TIC OFFSET command can be used to avoid this. In addition, you can stagger the tic labels with the following command:

SET CONDITION PLOT LABEL DISPLACEMENT <NORMAL/STAGGERED/value>

NORMAL means that all tic labels are plotted at a distance determined by the TIC LABEL DISPLACEMENT command. STAGGERED means that alternating plots will be staggered. That is, one will use the standard displacement while the next uses a staggered value. Entering this command with a numeric value specifies the amount of the displacement for the staggered tic labels. For example,

TIC MARK LABEL DISPLACEMENT 10
SET CONDITION PLOT LABEL DISPLACEMENT STAGGERED
SET CONDITION PLOT LABEL DISPLACEMENT 25

These commands specify that the default tic label displacement is 10 and the staggered tic mark label displacement is 25.

Note:
It is often helpful to overlay a fitted line on the plots of a condition plot. The following command is used to specify the type of fit.

SET CONDITION PLOT FIT <NONE/LOWESS/LINE/QUAD/SMOOTH>

NONE means that no fitted line is plotted. LOWESS means that a locally weighted least squares line will be overlaid. LINE means that a linear fit (Y = A0 + A1*X) will be overlaid. QUAD means that a quadratic fit (Y = A0 + A1*X + A2*X**2) will be overlaid. SMOOTH means that a least squares smoothing will be overlaid.

For LOWESS, it is recommended that the lowess fraction be set fairly high (e.g., LOWESS FRACTION 0.6).

The fitted line is currently only generated if the condition plot type is PLOT.

The default is for no fitted line to be overlaid on the plot. If an overlaid fit is desired, the most common choice is LOWESS.
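
For reference, the LINE option corresponds to a least squares fit of Y = A0 + A1*X. The Python sketch below shows that computation applied separately to the points of each panel. That the fit is computed panel by panel is an assumption of this sketch, and linear_fit and the sample panels are hypothetical.

```python
def linear_fit(points):
    """Least squares fit of y = a0 + a1*x for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a1 = sxy / sxx              # slope
    a0 = mean_y - a1 * mean_x   # intercept
    return a0, a1


# One list of (x, y) points per value of the conditioning variable.
panels = {
    1: [(0, 1), (1, 3), (2, 5)],
    2: [(0, 4), (1, 3), (2, 2)],
}
fits = {tag: linear_fit(points) for tag, points in panels.items()}
```

Each panel gets its own intercept and slope, so the overlaid lines can differ from one value of the conditioning variable to the next.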

Note:
Dataplot supports a special plot type

PLOT Y X TAG

In this form of the plot command, TAG is a group identifier variable. Points belonging to the same group are plotted with the same attributes (controlled by the CHARACTER and LINE commands and their various attribute setting commands).

Using a tag variable has two common purposes:

1. If your data has natural groups (e.g., batch 1 and batch 2).
2. To identify certain points. The most common application would be to flag outliers.

You can specify that the condition plot use this form of the PLOT command by using the command

SET CONDITION PLOT TAG <ON/OFF>

OFF specifies that the standard plot command (PLOT Y1 Y2) will be used. ON specifies that the last variable on the CONDITION PLOT command is a tag variable. That is, it is not plotted directly, but is instead the third variable on all the plot commands generated by the condition plot. Note that this tag variable is in addition to the conditioning variable.

Currently, this command only applies if the condition plot type is set to PLOT.

In effect, you can use SET CONDITION PLOT TAG ON to identify groups in the data (whether it be a natural group variable or a created group to identify a few specific points such as outliers) while conditioning on another, possibly continuous, variable.

The default is OFF.

Note:
Dataplot allows you to set axis limits with the LIMITS command. For the condition plot, it is often desirable to set the axis limits for each plot. This can be done with the commands

SET CONDITION PLOT YLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...
SET CONDITION PLOT XLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...

Note that the pairs of limits correspond to the variable list in the CONDITION PLOT command. For univariate plot types, the plot order corresponds to the variable list. For bivariate plot types, the YLIMITS refer to the response variables and XLIMITS refer to the factor variables. That is, Dataplot determines which variable is being plotted on each axis, and gets the corresponding limits.

The default is to allow the axis limits to float with the data.

Note:
Dataplot supports a subregion capability. This is used to draw "engineering limits" on a plot. For a condition plot, if you specify engineering limits, you typically want these limits to vary with each plot. They can be specified with the command

SET CONDITION PLOT SUBREGION XLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...
SET CONDITION PLOT SUBREGION YLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...

This command is similar to the SET CONDITION PLOT XLIMITS and SET CONDITION PLOT YLIMITS commands in that the list corresponds to the variables entered on the CONDITION PLOT command.

Only one set of subregion limits can be set for each variable.

The default is that no subregion limits are set.

Note:
You can specify a special X2LABEL for the plots with the following command

SET CONDITION PLOT X2LABEL <OFF/CORRELATION/PERCENT CORRELATION/EFFECT/PERCENT ACCEPT/NUMBER ACCEPT/ACCEPT TOTAL/ACCEPT TOTAL PERCENT>

where

• OFF - no special X2LABEL is drawn.
• CORRELATION - the correlation of the points on the plot is printed with the X2LABEL. This option is typically used with the plot type PLOT.
• PERCENT CORRELATION - this is the same as CORRELATION, except that the correlation is printed as a percent.
• EFFECT - the difference between the low and high value is printed. This option is typically used with the plot type DEX INTERACTION (and doesn't really make any sense with the other plot types). This plot type is supported for the SCATTER PLOT MATRIX, but not for the CONDITION PLOT.
• PERCENT ACCEPT - this option prints the percentage of points inside the first subregion. If no subregions are defined, this option makes no sense. It is typically used to specify the percentage of points within engineering limits.
• NUMBER ACCEPT - this option is similar to PERCENT ACCEPT. However, the number of points rather than the percentage is printed.
• ACCEPT TOTAL - this option is similar to NUMBER ACCEPT. However, it prints the number accepted first, then the total number of points.
• ACCEPT TOTAL PERCENT - this option is similar to ACCEPT TOTAL. However, after printing the number accepted and the total number, it prints the percentage accepted.
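
The acceptance labels can be illustrated with a short Python sketch. It assumes a rectangular subregion given by one pair of x limits and one pair of y limits, and it counts points on the boundary as accepted. Both the helper name acceptance_labels and the boundary rule are assumptions of this sketch, not documented Dataplot behavior.

```python
def acceptance_labels(points, xlimits, ylimits):
    """Count the (x, y) points inside a rectangular subregion.

    Returns the quantities behind the NUMBER ACCEPT, ACCEPT TOTAL and
    PERCENT ACCEPT labels. Boundary points count as accepted (assumption).
    """
    xlow, xupp = xlimits
    ylow, yupp = ylimits
    total = len(points)
    accepted = sum(
        1 for x, y in points if xlow <= x <= xupp and ylow <= y <= yupp
    )
    percent = 100.0 * accepted / total if total else 0.0
    return {
        "NUMBER ACCEPT": accepted,
        "ACCEPT TOTAL": (accepted, total),
        "PERCENT ACCEPT": percent,
    }


points = [(1, 1), (2, 5), (3, 2), (4, 9)]
labels = acceptance_labels(points, xlimits=(0, 5), ylimits=(0, 4))
```

In this example two of the four points fall inside the subregion, so the percentage accepted is 50.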

The following commands can be used to add a prefix and suffix to the X2LABEL. For example, you might want the PERCENT CORRELATION to append a "%" after the percent correlation and to start with "CORR = ".

SET X2LABEL PREFIX <prefix>
SET X2LABEL SUFFIX <suffix>

The appearance and location of the X2LABEL are controlled with the standard X2LABEL attribute setting commands.

There are occasions where you may want to use the values computed in the X2LABEL for additional numeric computations. These values are automatically written to the file "dpst5f.dat". The values are printed in the order the plots are generated.

The number of decimals printed can be controlled using the SET WRITE DECIMALS command.

Note:
You can use standard plot control commands to control the appearance of the condition plot.

For example,

MULTIPLOT CORNER COORDINATES 5 5 95 95
MULTIPLOT SCALE FACTOR 3
TIC OFFSET UNITS SCREEN
TIC OFFSET 5 5

is a fairly typical set of commands commonly used with condition plots.

Default:
None
Synonyms:
SUBSET PLOT is a synonym for CONDITION PLOT. SET SUBSET PLOT is a synonym for SET CONDITION PLOT.
Related Commands:
PLOT = Generates a data or function plot.
SCATTER PLOT MATRIX = Generates a scatter plot matrix.
FACTOR PLOT = Generates a factor plot.
Reference:
"Visualizing Data", Cleveland, William S., Hobart Press, 1993.

"Graphical Exploratory Data Analysis", du Toit, Steyn, and Stumpf, Springer-Verlag, 1986.

Applications:
Exploratory Data Analysis, Multivariate Data Analysis
Implementation Date:
2000/1
Program:
dimension 25 variables
skip 25
read iris.dat y1 y2 y3 y4 group
.
multiplot corner coordinates 10 5 90 90
tic offset units screen
xtic offset 5 10
ytic offset 5 5
char x
line blank
.
condition plot y3 y4 group

Date created: 06/05/2001
Last updated: 06/07/2016

Please email comments on this WWW page to alan.heckert@nist.gov.