Dataplot Vol 1 Auxiliary Chapter

SCATTER PLOT MATRIX

Name:
    SCATTER PLOT MATRIX
Type:
    Graphics Command
Purpose:
    Generates a scatter plot matrix.
Description:
    A scatter plot matrix of Y1, Y2, ..., Yk is a matrix of all the pairwise scatter plots between Y1, Y2, ..., Yk. This is a simple but powerful technique for viewing all the pairwise relationships between the variables.

    The pairwise plots need not be limited to scatter plots. Dataplot allows you to generate the pairwise plots for approximately 10 different plot types (and additional plot types will be added in future implementations).

    There are a number of alternatives for the appearance of this plot. Dataplot tries to balance simplicity with flexibility by using default settings, but providing numerous SET commands to control the appearance of the plot. These are described in detail in the NOTES section below.

Syntax 1:
    SCATTER PLOT MATRIX <y1> <y2> ... <yk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> through <yk> are the response variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    Up to 25 response variables can be specified.

Syntax 2:
    YOUDEN MATRIX PLOT <y1> <y2> ... <yk> <tag>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> through <yk> are the response variables;
                <tag> is a group id variable (and is always given last);
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This is a special form of the command that plots

      PLOT Yi Yj TAG
    for the individual plots.
Syntax 3:
    DEX <stat> INTERACTION PLOT <y1> <y2> ... <yk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> through <yk> are the response variables;
                <stat> defines a statistic, such as MEAN or MEDIAN, for the plot;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This is a special form of the command that plots

      DEX <stat> INTERACTION PLOT Yi Yj TAG
    for the individual plots. <stat> is optional. If no statistic is specified, an INTERACTION PLOT of the raw data is generated.
Examples:
    SCATTER PLOT MATRIX Y1 Y2 Y3 Y4 Y5
    SCATTER PLOT MATRIX Y1 Y2 Y3 Y4 Y5 SUBSET TAG > 2
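
    The alternate forms (Syntax 2 and Syntax 3) follow the same pattern. As an illustrative sketch only, where Y, X1, X2, X3, and TAG are placeholder variable names and MEAN is one possible choice for <stat>:

    YOUDEN MATRIX PLOT Y1 Y2 Y3 Y4 TAG
    DEX MEAN INTERACTION PLOT Y X1 X2 X3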
Note:
    The concept of the scatter plot matrix generalizes quite nicely to any plot type for two variables. Dataplot supports the scatter plot matrix for a number of different plot types. The type of plot generated is controlled by the following command:

      SET SCATTER PLOT MATRIX TYPE <value>

    where <value> is one of the following.

    The following plot types use two variables (e.g., BIHISTOGRAM Y1 Y2).

      PLOT = generate scatter plots (this is the default). The x and y axis labels are automatically set to the appropriate variable name.
      QUANTILE-QUANTILE = generate quantile-quantile plots. This degenerates to percent point plots on the diagonal. The x and y axis labels are automatically set to the appropriate variable name.
      BIHISTOGRAM = generate relative bihistograms. This degenerates to relative histograms on the diagonal. We recommend that you enter SET RELATIVE HISTOGRAM PERCENT to generate more consistent y-axis scales. The X1LABEL is set to the first variable name and the X2LABEL is set to the second variable name. If no YLABEL is already defined, the YLABEL is set to "Frequency".
      CORRELATION = generate a cross-correlation plot. This degenerates to an autocorrelation plot on the diagonal. If X1LABEL and Y1LABEL are not previously defined, they are automatically set to "Lag" and "Correlation" respectively. The X2LABEL is set to "X1*X2" where X1 and X2 represent the variable names.
      LAG = generate a cross-lag plot. This degenerates to a lag plot on the diagonal. If X1LABEL and Y1LABEL are not previously defined, they are automatically set to "I+1" and "I" respectively. The X2LABEL is set to "X1*X2" where X1 and X2 represent the variable names.
      SPECTRAL = generate a cross-spectral plot. This degenerates to a spectral plot on the diagonal. If X1LABEL and Y1LABEL are not previously defined, they are automatically set to "Power" and "Frequency" respectively. The X2LABEL is set to "X1*X2" where X1 and X2 represent the variable names.
      YOUDEN = generate a Youden plot. That is, PLOT Y1 Y2 TAG where TAG is a group id variable. The TAG variable is the last variable listed on the SCATTER PLOT MATRIX command and is the same for all the plots.

    The following plot types use the form Y X1 X2 (e.g., DEX CONTOUR PLOT Y X1 X2). That is, the response variable is the first variable in the list, and it remains constant for all the pairwise plots.

      DEX CONTOUR = generate a dex contour plot. The diagonal plot is empty. The X2LABEL is set to "X1*X2" where X1 and X2 represent the variable names. No automatic labels are generated for X1LABEL and Y1LABEL.
      DEX INTERACTION = generate a DEX INTERACTION PLOT. The X1LABEL is set to "X1*X2" where X1 and X2 represent the variable names. No automatic labels are generated for X2LABEL and Y1LABEL.
      DEX <statistic> INTERACTION = generate a DEX <statistic> INTERACTION PLOT. The X1LABEL is set to "X1*X2" where X1 and X2 represent the variable names. No automatic labels are generated for X2LABEL and Y1LABEL.
      CROSS TABULATE <statistic> = generate a CROSS TABULATE <statistic> plot. This degenerates to a <statistic> STATISTIC plot on the diagonal. The X1LABEL is set to "X1*X2" where X1 and X2 represent the variable names. No automatic labels are generated for X2LABEL and Y1LABEL.

      If no <statistic> is given, then a special form of the CROSS TABULATE PLOT is generated. For this case, there is no response variable (i.e., CROSS TABULATE PLOT X1 X2 as opposed to CROSS TABULATE MEAN PLOT Y X1 X2). The X1LABEL and Y1LABEL are set to the appropriate variable name.

    A few of the above plots support a <statistic> option. This can be one of 30+ supported statistics (the supported statistics are identical to those for the STATISTIC PLOT and the BOOTSTRAP PLOT). It is typically a location statistic (e.g., MEAN, MEDIAN) or a scale statistic (e.g., STANDARD DEVIATION, VARIANCE, MAD).

    Dataplot automatically defines X1LABEL, X2LABEL, and YLABEL commands for these plots. You can control the attributes of these labels with the standard label setting commands. If you have defined variable labels (with the VARIABLE LABEL command), these will automatically be substituted for variable names in the labels.

    Additional plot types will be added in future releases.
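
    As a brief sketch of selecting a plot type, the following commands request a matrix of relative bihistograms rather than scatter plots. The variable names Y1 through Y4 are placeholders.

      . Use percent scaling so the y-axis scales are more consistent
      SET RELATIVE HISTOGRAM PERCENT
      SET SCATTER PLOT MATRIX TYPE BIHISTOGRAM
      SCATTER PLOT MATRIX Y1 Y2 Y3 Y4

    Entering SET SCATTER PLOT MATRIX TYPE PLOT afterwards returns to the default scatter plots.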

Note:
    The following option controls which axis tic marks, tic mark labels, and axis labels are plotted.

      SET SCATTER PLOT MATRIX LABELS <ON/OFF/XON/YON/BOX>

    OFF means that all axis labels are suppressed (this can be useful if a large number of variables are being plotted). ON means that both X and Y axis labels are printed. XON only plots the x axis labels and YON only plots the y axis labels.

    BOX is a special option that creates an extra column on the left and an extra row on the bottom. The axis label is printed in this box.

    The default is ON (both x and y axis labels are printed).

Note:
    The following option controls where the x axis tic marks, tic mark labels, and axis label are printed.

      SET SCATTER PLOT MATRIX X AXIS <BOTTOM/TOP/ALTERNATE>

    BOTTOM specifies that the x axis labels are printed on the bottom axis (on the last row only). TOP specifies that the x axis labels are printed on the top axis (first row only). ALTERNATE specifies that the x axis labels alternate between the top (first row) and bottom axis (last row). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks.

    The default is ALTERNATE.

Note:
    The following option controls where the y axis tic marks, tic mark labels, and axis label are printed.

      SET SCATTER PLOT MATRIX Y AXIS <LEFT/RIGHT/ALTERNATE>

    LEFT specifies that the y axis labels are printed on the left axis (on the first column only). RIGHT specifies that the y axis labels are printed on the right axis (last column only). ALTERNATE specifies that the y axis labels alternate between the left (first column) and right axis (last column). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks.

    The default is ALTERNATE.
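
    As a sketch of how the label and axis options combine, the following commands print all x axis labels on the bottom row and all y axis labels on the left column, with a tic offset to reduce overlap. The variable names and the offset values are illustrative placeholders.

      SET SCATTER PLOT MATRIX LABELS ON
      SET SCATTER PLOT MATRIX X AXIS BOTTOM
      SET SCATTER PLOT MATRIX Y AXIS LEFT
      TIC OFFSET UNITS SCREEN
      TIC OFFSET 5 5
      SCATTER PLOT MATRIX Y1 Y2 Y3 Y4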

Note:
    Users have different preferences in terms of whether the plot frames for neighboring plots are connected or not. This is controlled with the following option.

      SET SCATTER PLOT MATRIX FRAME <DEFAULT/CONNECTED/USER>

    DEFAULT connects neighboring frames (i.e., the FRAME CORNER COORDINATES are set to 0 0 100 100). USER uses whatever frame coordinates are currently set (15 20 85 90 by default) and makes no special provisions for axis labels and tic marks (i.e., you set them as you normally would, each plot uses whatever you have set). CONNECTED uses whatever frame coordinates have been set by the user, but it draws the axis labels and tic marks as if DEFAULT were being used (that is, as determined by the SET SCATTER PLOT MATRIX commands described above). Typically, CONNECTED is used to put a small bit of space between plots. For example, you might use FRAME CORNER COORDINATES 3 3 97 97 before the SCATTER PLOT MATRIX command.

    The default is DEFAULT.
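
    For instance, a small gap between neighboring plots might be obtained as follows. This is a sketch based on the frame coordinates suggested above, and the variable names are placeholders.

      FRAME CORNER COORDINATES 3 3 97 97
      SET SCATTER PLOT MATRIX FRAME CONNECTED
      SCATTER PLOT MATRIX Y1 Y2 Y3 Y4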

Note:
    When the tic marks and tic mark labels are all plotted on the same side (i.e., SET SCATTER PLOT MATRIX Y AXIS is set to LEFT or RIGHT or SET SCATTER PLOT MATRIX X AXIS is set to BOTTOM or TOP), then overlap between plots is possible. The TIC OFFSET command can be used to avoid this. In addition, you can stagger the tic labels with the following command:

      SET SCATTER PLOT MATRIX LABEL DISPLACEMENT <NORMAL/STAGGERED/VALUE>

    NORMAL means that all tic labels are plotted at a distance determined by the TIC LABEL DISPLACEMENT command. STAGGERED means that alternating plots will be staggered. That is, one will use the standard displacement while the next uses a staggered value. Entering this command with a numeric value specifies the amount of the displacement for the staggered tic labels. For example,

      TIC MARK LABEL DISPLACEMENT 10
      SET SCATTER PLOT MATRIX LABEL DISPLACEMENT STAGGERED
      SET SCATTER PLOT MATRIX LABEL DISPLACEMENT 25

    These commands specify that the default tic label displacement is 10 and the staggered tic mark label displacement is 25.

Note:
    It is often helpful on scatter plot matrices to overlay a fitted line on the plots. The following command is used to specify the type of fit.

      SET SCATTER PLOT MATRIX FIT <NONE/LOWESS/LINE/QUAD/SMOOTH>

    NONE means that no fitted line is plotted. LOWESS means that a locally weighted least squares line will be overlaid. LINE means that a linear fit (Y = A0 + A1*X) will be overlaid. QUAD means that a quadratic fit (Y = A0 + A1*X + A2*X**2) will be overlaid. SMOOTH means that a least squares smoothing will be overlaid.

    For LOWESS, it is recommended that the lowess fraction be set fairly high (e.g., LOWESS FRACTION 0.6).

    The fitted line is currently only generated if the scatter plot matrix plot type is PLOT.

    The default is for no fitted line to be overlaid on the plot. If an overlaid fit is desired, the most common choice is LOWESS.
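
    A minimal sketch of a lowess overlay, using the lowess fraction recommended above (the variable names are placeholders):

      LOWESS FRACTION 0.6
      SET SCATTER PLOT MATRIX TYPE PLOT
      SET SCATTER PLOT MATRIX FIT LOWESS
      SCATTER PLOT MATRIX Y1 Y2 Y3 Y4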

Note:
    Dataplot supports a special plot type

      PLOT Y X TAG

    In this form of the plot command, TAG is a group identifier variable. Points belonging to the same group are plotted with the same attributes (controlled by the CHARACTER and LINE commands and their various attribute setting commands).

    A tag variable serves two common purposes:

    1. To distinguish natural groups in your data (e.g., batch 1 and batch 2).
    2. To identify certain points. The most common application is to flag outliers.

    You can specify that the scatter plot matrix use this form of the PLOT command by entering the command

      SET SCATTER PLOT MATRIX TAG <ON/OFF>

    OFF specifies that the standard plot command (PLOT Y1 Y2) will be used. ON specifies that the last variable on the SCATTER PLOT MATRIX command is a tag variable. That is, it is not plotted directly, but is instead the third variable on all the plot commands generated by the scatter plot matrix.

    Currently, this command only applies if the scatter plot matrix plot type is set to PLOT.

    This form is common enough that the command (see Syntax 2)

      YOUDEN MATRIX PLOT Y1 Y2 ... YK TAG

    implements this automatically. That is, YOUDEN MATRIX PLOT is equivalent to

      SET SCATTER PLOT MATRIX TAG ON
      SCATTER PLOT MATRIX Y1 Y2 ... YK TAG

    In some cases, you may want to use a tag variable for both purposes. That is, you may have natural groups in your data, but you also want to flag certain outlying points. You can do this by using a SUBSET clause. For example,

      LIMITS 0 120
      SET SCATTER PLOT MATRIX TAG ON
      CHARACTER CIRCLE SQUARE TRIANGLE
      CHARACTER FILL OFF OFF OFF
      SCATTER PLOT MATRIX Y1 Y2 Y3 Y4 TAG SUBSET Y2 <= 100
      PRE-ERASE OFF
      CHARACTER FILL ON ON ON
      SCATTER PLOT MATRIX Y1 Y2 Y3 Y4 TAG SUBSET Y2 > 100

    The SET SCATTER PLOT MATRIX LIMITS command, discussed below, can be used to control the axis limits for the individual plots.

    The default is OFF.

Note:
    Dataplot allows you to set axis limits with the LIMITS command. For the scatter plot matrix, it is often desirable to set the axis limits for each plot. This can be done with the command

      SET SCATTER PLOT MATRIX LIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...

    Note that the pairs of limits correspond to the variable list in the SCATTER PLOT MATRIX command. That is, if Y3 is the third variable in the command, Dataplot will set the YLIMITS when Y3 is plotted on the y axis and the XLIMITS when Y3 is plotted on the x axis.

    This command is particularly useful if you want to overlay scatter plot matrices (the example given for the SET SCATTER PLOT MATRIX TAG command shows a case where you might want to do this).

    The default is to allow the axis limits to float with the data.
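
    As an illustration of how the limit pairs map to the variable list, the following sketch uses made-up limits for four placeholder variables. The first pair (0 10) applies to Y1, the second (0 5) to Y2, the third (0 120) to Y3, and the fourth (0 50) to Y4, whether that variable appears on the x axis or the y axis.

      SET SCATTER PLOT MATRIX LIMITS 0 10 0 5 0 120 0 50
      SCATTER PLOT MATRIX Y1 Y2 Y3 Y4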

Note:
    Dataplot supports a subregion capability. This is used to draw "engineering limits" on a plot. For a scatter plot matrix, if you specify engineering limits, you typically want these limits to vary with each plot. They can be specified with the command

      SET SCATTER PLOT MATRIX SUBREGION LIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...

    This command is similar to the SET SCATTER PLOT MATRIX LIMITS command in that the list corresponds to the variables entered on the SCATTER PLOT MATRIX command.

    Only one set of subregion limits can be set for each variable.

    The default is that no subregion limits are set.
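
    As a sketch, engineering limits for four placeholder variables might be specified as follows. The limit values are invented for illustration, and this example does not address any other subregion settings your plots may need. The first pair (4 7) applies to Y1, the second (2 4) to Y2, and so on.

      SET SCATTER PLOT MATRIX SUBREGION LIMITS 4 7 2 4 1 6 0 2
      SCATTER PLOT MATRIX Y1 Y2 Y3 Y4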

Note:
    For a scatter plot matrix, there are numerous choices for what to plot on the diagonal. This is controlled with the command

      SET SCATTER PLOT MATRIX DIAGONAL <BLANK/LINE/HISTOGRAM/BOXPLOT>

    The options are:

      BLANK = an empty plot is generated and the variable label is plotted in the center of the empty plot.
      LINE = a PLOT Y1 Y1 is generated. This is simply a 45 degree line, but it does give some indication of the univariate distribution of the variable.
      HISTOGRAM = a relative histogram of the variable is generated. A relative histogram is drawn to make comparisons more meaningful. The axis labels do not apply to the histogram plot.
      BOXPLOT = a box plot of the variable is generated. BOXPLOT only applies if the SET SCATTER PLOT MATRIX TAG ON command has been entered. That is, the box plot is only used if there are groups in the data. For the box plot, the y axis limits are valid, but the x axis limits are not.

    This command only applies if the scatter plot matrix plot type is PLOT, CROSS TABULATE, or DEX CONTOUR.

    The default is BLANK.

Note:
    The scatter plot matrix is redundant in the sense that PLOT Y1 Y2 is equivalent to PLOT Y2 Y1 (with the axes reversed). For this reason, some analysts prefer to omit plots below the diagonal. This can be controlled with the command

      SET SCATTER PLOT MATRIX LOWER DIAGONAL <ON/OFF>

    If OFF, the plots below the diagonal are omitted. If ON, the plots below the diagonal are drawn.

    The default is ON.
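
    As a sketch combining this option with the SET SCATTER PLOT MATRIX DIAGONAL command, the following draws relative histograms on the diagonal and omits the redundant plots below it (the variable names are placeholders):

      SET SCATTER PLOT MATRIX DIAGONAL HISTOGRAM
      SET SCATTER PLOT MATRIX LOWER DIAGONAL OFF
      SCATTER PLOT MATRIX Y1 Y2 Y3 Y4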

Note:
    You can specify a special X2LABEL for the plots with the following command

      SET SCATTER PLOT MATRIX X2LABEL <OFF/CORRELATION/PERCENT CORRELATION/EFFECT/PERCENT ACCEPT/NUMBER ACCEPT/ACCEPT TOTAL/ACCEPT TOTAL PERCENT>

    where

      OFF = no special X2LABEL is drawn.
      CORRELATION = the correlation of the points on the plot is printed with the X2LABEL. This option is typically used with the plot type PLOT.
      PERCENT CORRELATION = this is the same as CORRELATION, except that the correlation is printed as a percent.
      EFFECT = the difference between the low and high value is printed. This option is typically used with the plot type DEX INTERACTION (and doesn't really make any sense with the other plot types).
      PERCENT ACCEPT = this option prints the percentage of points inside the first subregion. If no subregions are defined, this option makes no sense. It is typically used to specify the percentage of points within engineering limits.
      NUMBER ACCEPT = this option is similar to PERCENT ACCEPT. However, the number of points rather than the percentage is printed.
      ACCEPT TOTAL = this option is similar to NUMBER ACCEPT. However, it prints the number accepted first, then the total number of points.
      ACCEPT TOTAL PERCENT = this option is similar to ACCEPT TOTAL. However, after printing the number accepted and the total number, it prints the percentage accepted.

    The following commands can be used to add a prefix and suffix to the X2LABEL. For example, you might want the PERCENT CORRELATION to append a "%" after the percent correlation and to start with "CORR = ".

      SET X2LABEL PREFIX
      SET X2LABEL SUFFIX

    The appearance and location of the X2LABEL are controlled with the standard X2LABEL attribute setting commands.

    There are occasions where you may want to use the values computed in the X2LABEL for additional numeric computations. These values are automatically written to the file "dpst5f.dat". The values are printed in the order the plots are generated.
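
    As a sketch, the following prints the correlation under each pairwise plot and then reads the saved values back into a variable. The variable name CORR is a placeholder, and the READ step assumes that dpst5f.dat holds one value per line with no header lines, which you should confirm by inspecting the file.

      SET SCATTER PLOT MATRIX X2LABEL CORRELATION
      SCATTER PLOT MATRIX Y1 Y2 Y3 Y4
      . Assumption: one value per line, no header lines to skip
      SKIP 0
      READ dpst5f.dat CORR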

Note:
    You can use standard plot control commands to control the appearance of the scatter plot matrix.

    For example,

      MULTIPLOT CORNER COORDINATES 5 5 95 95
      MULTIPLOT SCALE FACTOR 3
      TIC OFFSET UNITS SCREEN
      TIC OFFSET 5 5

    is a fairly typical set of commands used with scatter plot matrices.

Default:
    None
Synonyms:
    MATRIX PLOT is a synonym for SCATTER PLOT MATRIX.
    SET MATRIX PLOT is a synonym for SET SCATTER PLOT MATRIX.
Related Commands:
    PLOT = Generates a data or function plot.
    FACTOR PLOT = Generate a factor plot.
    CONDITIONAL PLOT = Generate a conditional (subset) plot.
References:
    "Visualizing Data", Cleveland, William S., Hobart Press, 1993.

    "Graphical Exploratory Data Analysis", du Toit, Steyn, and Stumpf, Springer-Verlag, 1986.

Applications:
    Exploratory Data Analysis, Multivariate Data Analysis
Implementation Date:
    2000/1
Program:
    . A basic example of a scatter plot matrix
    skip 25
    read iris.dat y1 y2 y3 y4 tag
    multiplot corner coordinates 10 10 90 90
    multiplot scale factor 2
    tic offset units screen
    tic offset 5 5
    line blank blank blank
    character 1 2 3
    set matrix plot tag on
    matrix plot y1 y2 y3 y4 tag
    move 50 95
    justification center
    text Fisher Iris Data
        
    plot generated by sample program

Date created: 6/5/2001
Last updated: 4/4/2003
Please email comments on this WWW page to alan.heckert@nist.gov.