Name:
    SCATTER PLOT MATRIX
The pairwise plots need not be limited to scatter plots. Dataplot can generate the pairwise plots for approximately 10 different plot types (additional plot types will be added in future implementations). The appearance of this plot can be customized in a number of ways. Dataplot balances simplicity with flexibility by using default settings while providing numerous SET commands to control the appearance of the plot. These are described in detail in the NOTES section below.
SCATTER PLOT MATRIX <y1> ... <yk> <SUBSET/EXCEPT/FOR qualification>
where <y1> through <yk> are the response variables and the <SUBSET/EXCEPT/FOR qualification> is optional. Up to 25 response variables can be specified.
YOUDEN MATRIX PLOT <y1> ... <yk> <tag> <SUBSET/EXCEPT/FOR qualification>
where <y1> through <yk> are the response variables, <tag> is a group id variable (always given last), and the <SUBSET/EXCEPT/FOR qualification> is optional. This is a special form of the command that plots
<SUBSET/EXCEPT/FOR qualification> where <y1> through <yk> are the response variables; <stat> defines a statistic, such as MEAN or MEDIAN, for the plot; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This is a special form of the command that plots
SCATTER PLOT MATRIX Y1 Y2 Y3 Y4 Y5 SUBSET TAG > 2
where <value> is one of the following. The following types plot two variables (e.g., BIHISTOGRAM Y1 Y2).
The following types plot Y X1 X2 (e.g., DEX CONTOUR PLOT Y X1 X2). That is, the response variable is the first variable in the list, and it remains the same for all the pairwise plots.
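To make the two argument patterns concrete, the Python sketch below only builds the command strings for the off-diagonal cells. The function name `pairwise_commands`, the order of the pairs, and the use of ordered pairs (both Y1 Y2 and Y2 Y1) are assumptions for illustration, not Dataplot's internal logic.

```python
from itertools import permutations


def pairwise_commands(plot_type, variables, response_first=False):
    """Build one command string per off-diagonal cell (hypothetical sketch).

    Two-variable types (e.g. BIHISTOGRAM) use every ordered pair of
    distinct variables.  With response_first=True (e.g. DEX CONTOUR PLOT),
    the first variable is the response and is prepended unchanged to
    every ordered pair of the remaining variables.
    """
    if response_first:
        response, others = variables[0], variables[1:]
    else:
        response, others = None, variables
    commands = []
    for a, b in permutations(others, 2):
        args = [a, b] if response is None else [response, a, b]
        commands.append(" ".join([plot_type, *args]))
    return commands


print(pairwise_commands("BIHISTOGRAM", ["Y1", "Y2", "Y3"]))
print(pairwise_commands("DEX CONTOUR PLOT", ["Y", "X1", "X2", "X3"], response_first=True))
```

With the response-first form, every generated command begins with the same Y, which is what keeps the response constant across the matrix.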
Dataplot automatically defines X1LABEL, X2LABEL, and YLABEL commands for these plots. You can control the attributes of these labels with the standard label setting commands. If you have defined variable labels (with the VARIABLE LABEL command), these will automatically be substituted for variable names in the labels. Additional plot types will be added in future releases.
OFF means that all axis labels are suppressed (this can be useful if a large number of variables are being plotted). ON means that both X and Y axis labels are printed. XON only plots the x axis labels and YON only plots the y axis labels. BOX is a special option that creates an extra column on the left and an extra row on the bottom. The axis label is printed in this box. The default is ON (both x and y axis labels are printed).
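The options can be summarized as two on/off flags plus a change in grid size for BOX. The Python sketch below is a hypothetical model: the name `label_layout` is invented, and the assumption that BOX draws the labels only in the extra row and column (not on the individual plots) is ours.

```python
# Hypothetical model of the axis label options.  The BOX behaviour
# (labels drawn only in the extra row/column) is an assumption.
AXIS_LABEL_FLAGS = {
    "OFF": (False, False),
    "ON": (True, True),
    "XON": (True, False),
    "YON": (False, True),
    "BOX": (False, False),  # labels go into the extra row/column instead
}


def label_layout(option, k):
    """Describe which labels are drawn and the grid size for k variables."""
    option = option.upper()
    if option not in AXIS_LABEL_FLAGS:
        raise ValueError(f"unknown axis label option: {option}")
    x_labels, y_labels = AXIS_LABEL_FLAGS[option]
    # BOX adds one column on the left and one row on the bottom.
    size = k + 1 if option == "BOX" else k
    return {
        "x_axis_labels": x_labels,
        "y_axis_labels": y_labels,
        "label_box": option == "BOX",
        "grid": (size, size),
    }


print(label_layout("XON", 4))
print(label_layout("BOX", 4))
```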
BOTTOM specifies that the x axis labels are printed on the bottom axis (on the last row only). TOP specifies that the x axis labels are printed on the top axis (first row only). ALTERNATE specifies that the x axis labels alternate between the top (first row) and bottom axis (last row). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks. The default is ALTERNATE.
LEFT specifies that the y axis labels are printed on the left axis (on the first column only). RIGHT specifies that the y axis labels are printed on the right axis (last column only). ALTERNATE specifies that the y axis labels alternate between the left (first column) and right axis (last column). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks. The default is ALTERNATE.
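The placement rules for the x and y labels mirror each other. In the Python sketch below, the function names are hypothetical, and so is the choice of which side ALTERNATE starts on (bottom for the first column, left for the first row).

```python
def x_label_side(col, option="ALTERNATE"):
    """Axis that carries the x label for 0-based column `col`."""
    option = option.upper()
    if option in ("BOTTOM", "TOP"):
        return option
    if option == "ALTERNATE":
        # Assumption: the first column starts on the bottom (last row).
        return "BOTTOM" if col % 2 == 0 else "TOP"
    raise ValueError(f"unknown x label option: {option}")


def y_label_side(row, option="ALTERNATE"):
    """Axis that carries the y label for 0-based row `row`."""
    option = option.upper()
    if option in ("LEFT", "RIGHT"):
        return option
    if option == "ALTERNATE":
        # Assumption: the first row starts on the left (first column).
        return "LEFT" if row % 2 == 0 else "RIGHT"
    raise ValueError(f"unknown y label option: {option}")


print([x_label_side(c) for c in range(4)])  # ['BOTTOM', 'TOP', 'BOTTOM', 'TOP']
print([y_label_side(r, "RIGHT") for r in range(4)])
```

Alternating sides halves the number of labels competing for space along any one edge, which is why it is the default.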
DEFAULT connects neighboring frames (i.e., the FRAME CORNER COORDINATES are set to 0 0 100 100). USER uses whatever frame coordinates are currently set (15 20 85 90 by default) and makes no special provisions for axis labels and tic marks (i.e., you set them as you normally would, and each plot uses whatever you have set). CONNECTED uses whatever frame coordinates have been set by the user, but draws the axis labels and tic marks as if DEFAULT were being used (that is, as determined by the SET SCATTER PLOT MATRIX ... commands). The default is DEFAULT.
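The Python sketch below shows why frame coordinates of 0 0 100 100 make neighboring plots touch. It assumes the multiplot area is split into equal cells, with row 0 at the top, and that frame corner coordinates are percentages of each cell. The function names are hypothetical, and the corner values 10 10 90 90 are taken from the example program below.

```python
def cell_rect(row, col, k, corners=(0.0, 0.0, 100.0, 100.0)):
    """Screen rectangle (x1, y1, x2, y2) of cell (row, col); row 0 is the top."""
    x1, y1, x2, y2 = corners
    width = (x2 - x1) / k
    height = (y2 - y1) / k
    left = x1 + col * width
    top = y2 - row * height
    return (left, top - height, left + width, top)


def frame_rect(row, col, k, corners=(0.0, 0.0, 100.0, 100.0),
               frame=(0.0, 0.0, 100.0, 100.0)):
    """Plot frame inside a cell; `frame` is given in percent of the cell."""
    cx1, cy1, cx2, cy2 = cell_rect(row, col, k, corners)
    w, h = cx2 - cx1, cy2 - cy1
    fx1, fy1, fx2, fy2 = frame
    return (cx1 + w * fx1 / 100, cy1 + h * fy1 / 100,
            cx1 + w * fx2 / 100, cy1 + h * fy2 / 100)


corners = (10.0, 10.0, 90.0, 90.0)
# DEFAULT-style frames: the right edge of one plot is the left edge of the next.
print(frame_rect(0, 0, 4, corners), frame_rect(0, 1, 4, corners))
# USER-style frames with 15 20 85 90 leave gaps between neighboring plots.
print(frame_rect(0, 0, 4, corners, frame=(15.0, 20.0, 85.0, 90.0)))
```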
NORMAL means that all tic labels are plotted at a distance determined by the TIC LABEL DISPLACEMENT command. STAGGERED means that alternating plots will be staggered. That is, one will use the standard displacement while the next uses a staggered value. Entering this command with a numeric value specifies the amount of the displacement for the staggered tic labels. For example,
    SET SCATTER PLOT MATRIX LABEL DISPLACEMENT STAGGERED
    SET SCATTER PLOT MATRIX LABEL DISPLACEMENT 25

These commands specify that the default tic label displacement is 10 and the staggered tic mark label displacement is 25.
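The staggering rule can be written as a small function of the plot index. This Python sketch uses the values 10 and 25 from the example above. The function name is hypothetical, and the assumption that even-numbered plots keep the normal displacement is ours.

```python
def tic_label_displacement(plot_index, option="NORMAL", normal=10.0, staggered=25.0):
    """Tic label displacement for the 0-based plot_index (hypothetical sketch)."""
    option = option.upper()
    if option == "NORMAL":
        return normal
    if option == "STAGGERED":
        # Assumption: even-numbered plots keep the normal displacement.
        return normal if plot_index % 2 == 0 else staggered
    raise ValueError(f"unknown displacement option: {option}")


print([tic_label_displacement(i, "STAGGERED") for i in range(4)])
```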
NONE means that no fitted line is plotted. LOWESS means that a locally weighted least squares line will be overlaid. LINE means that a linear fit (Y = A0 + A1*X) will be overlaid. QUAD means that a quadratic fit (Y = A0 + A1*X + A2*X**2) will be overlaid. SMOOTH means that a least squares smoothing will be overlaid. For LOWESS, it is recommended that the lowess fraction be set fairly high (e.g., LOWESS FRACTION 0.6). The fitted line is currently generated only if the scatter plot matrix plot type is PLOT. The default is for no fitted line to be overlaid on the plot. If an overlaid fit is desired, the most common choice is LOWESS.
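The LINE and QUAD options are ordinary least squares polynomial fits of degree 1 and 2. This Python sketch solves the normal equations directly to show what the overlaid curve represents. It is not Dataplot's fitting code. The function names `polyfit_ls` and `evaluate` are hypothetical, and LOWESS and SMOOTH are not covered.

```python
def polyfit_ls(x, y, degree):
    """Least squares coefficients [A0, A1, ...] of Y = A0 + A1*X + ..."""
    n = degree + 1
    if len(x) != len(y) or len(x) < n:
        raise ValueError("x and y must match and have at least degree + 1 points")
    # Normal equations (X^T X) a = X^T y, built from power sums of x.
    s = [sum(xi ** p for xi in x) for p in range(2 * n - 1)]
    t = [sum(yi * xi ** p for xi, yi in zip(x, y)) for p in range(n)]
    m = [[float(s[i + j]) for j in range(n)] + [float(t[i])] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system (too few distinct x values)")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - rest) / m[i][i]
    return coef


def evaluate(coef, x):
    """Value of A0 + A1*x + A2*x**2 + ... at x."""
    return sum(a * x ** p for p, a in enumerate(coef))


line = polyfit_ls([0, 1, 2, 3], [1, 3, 5, 7], degree=1)        # LINE
quad = polyfit_ls([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], degree=2)  # QUAD
print(line, quad)
```

Normal equations are fine for the small, well-scaled data in a sketch like this. For badly scaled x values, a QR-based solver would be the more robust choice.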
In this form of the plot command, TAG is a group identifier variable. Points belonging to the same group are plotted with the same attributes (controlled by the CHARACTER and LINE commands and their various attribute setting commands). Using a tag variable has two common purposes: to distinguish natural groups in the data and to flag certain points (e.g., outliers).
OFF specifies that the standard plot command (PLOT Y1 Y2) will be used. ON specifies that the last variable on the SCATTER PLOT MATRIX command is a tag variable. That is, it is not plotted directly but is instead used as the third variable on all the plot commands generated by the scatter plot matrix. Currently, this command applies only if the scatter plot matrix plot type is set to PLOT. This form is common enough that the YOUDEN MATRIX PLOT command (see Syntax 2) implements it automatically. That is, YOUDEN MATRIX PLOT Y1 Y2 ... YK TAG is equivalent to

    SET SCATTER PLOT MATRIX TAG ON
    SCATTER PLOT MATRIX Y1 Y2 ... YK TAG

In some cases, you may want to use a tag variable for both purposes. That is, you may have natural groups in your data but also want to flag certain outlying points. You can do this by using a SUBSET clause. For example,
    SET SCATTER PLOT MATRIX TAG ON
    CHARACTER CIRCLE SQUARE TRIANGLE
    CHARACTER FILL OFF OFF OFF
    SCATTER PLOT MATRIX Y1 Y2 Y3 Y4 TAG SUBSET Y2 <= 100
    PRE-ERASE OFF
    CHARACTER FILL ON ON ON
    SCATTER PLOT MATRIX Y1 Y2 Y3 Y4 TAG SUBSET Y2 > 100

The SET SCATTER PLOT MATRIX LIMITS command, discussed below, can be used to control the axis limits for the individual plots, which keeps overlaid matrices such as these on a common scale. The default for SET SCATTER PLOT MATRIX TAG is OFF.
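The Python sketch below models the bookkeeping behind the example above: each tag value gets one plot character, and the SUBSET clause selects which observations a given pass draws. The function name is hypothetical. Two choices are assumptions rather than documented Dataplot behavior: characters are assigned to tag values in ascending order, and they are assigned before subsetting, so each group keeps its symbol in both passes.

```python
def group_by_tag(xs, ys, tags, characters, subset=None):
    """Map each kept tag value to (character, [(x, y), ...])."""
    if subset is None:
        subset = [True] * len(tags)
    # Characters are assigned from all tag values, so a group keeps the same
    # symbol even when a SUBSET removes some of its points.
    symbol = {t: characters[i % len(characters)]
              for i, t in enumerate(sorted(set(tags)))}
    groups = {}
    for x, y, t, keep in zip(xs, ys, tags, subset):
        if keep:
            groups.setdefault(t, (symbol[t], []))[1].append((x, y))
    return groups


y1 = [1, 2, 3, 4]
y2 = [50, 150, 80, 120]
tag = [1, 1, 2, 3]
chars = ["CIRCLE", "SQUARE", "TRIANGLE"]
print(group_by_tag(y1, y2, tag, chars, [v <= 100 for v in y2]))  # unfilled pass
print(group_by_tag(y1, y2, tag, chars, [v > 100 for v in y2]))   # filled pass
```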
Note that the pairs of limits correspond to the variable list in the SCATTER PLOT MATRIX command. That is, if Y3 is the third variable in the command, Dataplot will set the YLIMITS when Y3 is plotted on the y axis and the XLIMITS when Y3 is plotted on the x axis. This command is particularly useful if you want to overlay scatter plot matrices (the example given for the SET SCATTER PLOT MATRIX TAG command shows a case where you might want to do this). The default is to allow the axis limits to float with the data.
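The mapping from the per-variable limit pairs to each cell follows directly from the rule just stated. The Python sketch below applies it. The function name is hypothetical, `None` stands for limits that float with the data, and the assumption that row i puts variable i on the y axis is only for illustration.

```python
def limits_for_matrix(limits):
    """limits[i] is (low, high) for the i-th variable, or None to float."""
    k = len(limits)
    table = {}
    for i in range(k):        # variable plotted on the y axis
        for j in range(k):    # variable plotted on the x axis
            if i != j:
                table[(i, j)] = {"XLIMITS": limits[j], "YLIMITS": limits[i]}
    return table


cells = limits_for_matrix([(0, 10), None, (5, 6)])
print(cells[(2, 0)])  # the third variable on y, the first variable on x
```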
This command is similar to the SET SCATTER PLOT MATRIX LIMITS command in that the list corresponds to the variables entered on the SCATTER PLOT MATRIX command. Only one set of subregion limits can be set for each variable. The default is that no subregion limits are set.
If BLANK, an empty plot is generated and the variable label is plotted in the center of the empty plot. If LINE, a PLOT Y1 Y1 is generated (this will simply be a 45 degree line, but it does give some indication of the univariate distribution of the variable). If HISTOGRAM, a relative histogram of the variable is generated. For HISTOGRAM, the axis labels do not apply to the histogram. A relative histogram is drawn to make comparisons more meaningful. If BOXPLOT, a box plot of the variable is generated. The BOXPLOT option applies only if the SET SCATTER PLOT MATRIX TAG ON command is entered. That is, the box plot is used only if there are groups in the data. For the box plot, the y axis limits are valid, but the x axis limits are not. This command applies only if the scatter plot matrix plot type is PLOT, CROSS TABULATE, or DEX CONTOUR. The default is BLANK.
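A relative histogram divides each class count by the sample size, so the bar heights sum to 1 regardless of how many observations a variable has. The Python sketch below shows that normalization. It uses equal-width classes spanning the data for simplicity, which is not Dataplot's default class-width rule, and the function name is hypothetical.

```python
def relative_histogram(values, bins):
    """Relative frequencies over `bins` equal-width classes spanning the data."""
    if bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    lo, hi = min(values), max(values)
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = 0 if width == 0 else int((v - lo) / width)
        counts[min(idx, bins - 1)] += 1  # the maximum falls in the last class
    n = len(values)
    return [c / n for c in counts]


print(relative_histogram([1, 2, 2, 3, 4], 3))  # [0.2, 0.4, 0.4]
```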
If OFF, the plots below the diagonal are omitted. If ON, the plots below the diagonal are drawn. The default is ON.
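The Python sketch below lists the cells that are drawn with and without the lower-diagonal plots. The function name is hypothetical, and so is the orientation convention: row 0 is the top row, and "below the diagonal" means row > column. The upper bound of 25 comes from the syntax section. The lower bound of 2 is our assumption.

```python
def matrix_cells(k, lower_diagonal=True):
    """Return (row, col, kind) for each drawn cell of a k x k matrix."""
    # Up to 25 variables per the syntax; the minimum of 2 is an assumption.
    if not 2 <= k <= 25:
        raise ValueError("a scatter plot matrix takes 2 to 25 variables")
    cells = []
    for row in range(k):
        for col in range(k):
            if row == col:
                kind = "diagonal"
            elif row > col:
                if not lower_diagonal:
                    continue  # plots below the diagonal are omitted
                kind = "lower"
            else:
                kind = "upper"
            cells.append((row, col, kind))
    return cells


print(len(matrix_cells(4)), len(matrix_cells(4, lower_diagonal=False)))  # 16 10
```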
The appearance and location of the X2LABEL are controlled with the standard X2LABEL attribute setting commands. On occasion, you may want to use the values computed for the X2LABEL in additional numeric computations. These values are automatically written to the file "dpst5f.dat", in the order in which the plots are generated.
For example,

    MULTIPLOT SCALE FACTOR 3
    TIC OFFSET UNITS SCREEN
    TIC OFFSET 5 5

is a typical set of commands used with scatter plot matrices.
SET MATRIX PLOT is a synonym for SET SCATTER PLOT MATRIX.
    . A basic example of a scatter plot matrix
    skip 25
    read iris.dat y1 y2 y3 y4 tag
    multiplot corner coordinates 10 10 90 90
    multiplot scale factor 2
    tic offset units screen
    tic offset 5 5
    line blank blank blank
    character 1 2 3
    set matrix plot tag on
    matrix plot y1 y2 y3 y4 tag
    move 50 95
    justification center
    text Fisher Iris Data
Date created: 6/5/2001