
# CLASSIFICATION SCATTER PLOT

Name:
CLASSIFICATION SCATTER PLOT
Type:
Graphics Command
Purpose:
Generates a classification scatter plot.
Description:
This plot is a variant of the dex scatter plot. For the dex scatter plot, the first variable is a response variable and the remaining variables are factor variables. The factor variables are typically qualitative (the plot is most often used for 2-level designed experiments, but it can also be used when some factors have more than two levels). A separate subplot is drawn for each factor, with the subplot for factor k centered horizontally at x = k. Each subplot has a given horizontal width (defined by the DEX WIDTH command, defaults to 0.5). For example, the subplot for factor 2 ranges from 1.8 to 2.2 on the horizontal axis. The levels of the factor are assigned x coordinates within this range (from lowest to highest). Then within each subplot:

Vertical axis = value of the response variable; Horizontal axis = level of the given factor.
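The coordinate rule for the dex scatter plot can be sketched in Python. This is a minimal sketch, not Dataplot's implementation. It assumes the levels of each factor are spaced evenly from k - 0.2 to k + 0.2, which is the range quoted for factor 2 above. The names `level_positions`, `dex_scatter_coords`, and `half_width` are hypothetical.

```python
# Hypothetical helpers illustrating the dex scatter plot layout.
# Assumption: levels are spaced evenly over [k - half_width, k + half_width].

def level_positions(levels, center, half_width=0.2):
    """Map each distinct level (sorted, lowest leftmost) to an x position."""
    ordered = sorted(set(levels))
    if len(ordered) == 1:
        return {ordered[0]: center}
    step = 2 * half_width / (len(ordered) - 1)
    return {lev: center - half_width + i * step for i, lev in enumerate(ordered)}


def dex_scatter_coords(response, factors, half_width=0.2):
    """Return (x, y) points: factor k (1-based) is centered at x = k,
    the x offset identifies the factor level, and y is the response."""
    points = []
    for k, column in enumerate(factors, start=1):
        xmap = level_positions(column, k, half_width)
        for level, y in zip(column, response):
            points.append((xmap[level], y))
    return points
```

For a 2-level factor coded -1/+1, the low level of factor 2 lands at x = 1.8 and the high level at x = 2.2.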

The classification scatter plot reverses the roles of the response variable and the factor variables. For the classification scatter plot, the Y axis variable is assumed to be qualitative (i.e., it takes a limited number of distinct levels) and the factor variables are assumed to be continuous (the plot still works if some of the factor variables are also qualitative). This matches the common classification problem, where the values of the factor variables are used to determine which group an observation belongs to.

For this plot, there is one subplot per factor variable, and the positions within each subplot are based on the distinct levels of the response variable. For example, suppose the Y axis variable (Y) has two possible values. Then for the first factor variable (X1), we plot the values of X1 corresponding to Y = 1 at x-coordinate 0.8, and then we plot the values of X1 corresponding to Y = 2 at x-coordinate 1.2. A similar subplot is created for each factor variable.
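The same mapping with the roles reversed can be sketched in Python. It reproduces the two-class example (x = 0.8 for Y = 1 and x = 1.2 for Y = 2 in the first subplot). Spacing the classes evenly over k ± 0.2 is an assumption that generalizes that example. With three classes it gives 0.8, 1.0, and 1.2, which appear to be the offsets the Program example uses to place its annotations. `class_scatter_coords` is a hypothetical name.

```python
# Hypothetical helper illustrating the classification scatter plot layout.
# Assumption: class positions are spaced evenly over [k - half_width, k + half_width].

def class_scatter_coords(y, factors, half_width=0.2):
    """Return (x, value) points.

    y: class labels (qualitative), one per observation.
    factors: list of continuous factor columns.
    Subplot k (1-based) shows factor k; the x offset identifies the class
    and the vertical coordinate is the factor value.
    """
    classes = sorted(set(y))
    m = len(classes)
    step = 2 * half_width / (m - 1) if m > 1 else 0.0
    points = []
    for k, column in enumerate(factors, start=1):
        left = k - half_width if m > 1 else k
        offset = {c: left + i * step for i, c in enumerate(classes)}
        for label, value in zip(y, column):
            points.append((offset[label], value))
    return points
```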

This plot can be useful in identifying which factors are most important for determining the classification. Factors whose class columns show little overlap are the most useful discriminators.

Syntax:
CLASSIFICATION SCATTER PLOT <y> <x1> ... <xk>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable (qualitative), <x1> ... <xk> are the factor variables, and the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
CLASSIFICATION SCATTER PLOT Y X1 X2
CLASSIFICATION SCATTER PLOT Y X1 X2 X3
CLASSIFICATION SCATTER PLOT Y X1 X2 X3 X4
CLASSIFICATION SCATTER PLOT Y X1 TO X4
Note:
The TO syntax is allowed for the list of factor variables (see the last example above).
Note:
The CHARACTER and LINE settings can be used to control the appearance of the plot. If there are m levels for the response variable and k factor variables, the first m*k traces define the values for each response/factor variable combination (i.e., each column of the plot is assigned a different trace, numbered from left to right on the plot). This can be useful if you would like to color code the levels of the response variable or identify them in some other way. Trace m*k + 1 draws a reference line; here the reference line is the mean y value of the points on the plot. This is demonstrated in the Program example below.
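The trace numbering can be written as a short Python sketch. It assumes the response levels are indexed 1..m in sorted order and the factors 1..k in the order given on the command line, which is consistent with the left-to-right rule above. With m = 3 and k = 4, as in the Iris program, this gives 12 data traces and trace 13 for the reference line. That is why the program's CHARACTERS list has 12 symbols followed by BLANK. The same formula appears in the program's annotation loop as `k2 = (l-1)*3 + k`. The function names are hypothetical.

```python
# Hypothetical helpers reproducing the left-to-right trace numbering.

def trace_number(factor_index, level_index, m):
    """1-based trace for (factor, response level): all m levels of
    factor 1 come first, then all m levels of factor 2, and so on."""
    return (factor_index - 1) * m + level_index


def reference_line_trace(m, k):
    """Trace of the reference line, drawn after the m*k data traces."""
    return m * k + 1
```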

For each trace, the mean, standard deviation, minimum, and maximum values are written to the file dpst4f.dat. This can be useful for annotating the plot.
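The per-trace statistics can be mirrored in Python, for example to check annotation values. This sketch reproduces only the statistics, in the order mean, SD, min, max (the order the Program example reads them back). It does not reproduce the file's layout. The use of the sample (n - 1) standard deviation is an assumption, and `trace_summary` is a hypothetical name.

```python
import statistics

# Hypothetical helper. Assumption: SD is the sample (n - 1) standard deviation.

def trace_summary(y, factors):
    """Return (mean, sd, min, max) for each trace in trace order:
    for each factor, left to right, one trace per class in sorted order."""
    classes = sorted(set(y))
    rows = []
    for column in factors:
        for c in classes:
            vals = [v for label, v in zip(y, column) if label == c]
            sd = statistics.stdev(vals) if len(vals) > 1 else 0.0
            rows.append((statistics.mean(vals), sd, min(vals), max(vals)))
    return rows
```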

Default:
None
Synonyms:
None
Related Commands:
CLASSIFICATION STATISTIC PLOT = Generates a classification plot for a statistic.
DEX SCATTER PLOT = Generates a dex scatter plot.
DEX ... PLOT = Generates a dex plot for a statistic.
DEX WIDTH = Specifies the width of levels in a dex plot.
LINES = Sets the type for plot lines.
CHARACTER = Sets the type for plot characters.
Applications:
Classification
Implementation Date:
2019/03
Program:
```
. Step 1:   Read the data
.
SET WRITE DECIMALS 3
DIMENSION 40 COLUMNS
SKIP 25
READ IRIS.DAT X1 TO X4 Y
SKIP 0
.
LET NFACT = 4
LET STRING T1 = Sepal Length
LET STRING T2 = Sepal Width
LET STRING T3 = Petal Length
LET STRING T4 = Petal Width
.
LOOP FOR K = 1 1 NFACT
LET MEAN^K = MEAN X^K; LET MEAN^K = ROUND(MEAN^K,3)
LET SD^K = SD X^K; LET SD^K = ROUND(SD^K,3)
END OF LOOP
.
. Step 2:   Set plot control features
.
CASE ASIS
TITLE CASE ASIS
LABEL CASE ASIS
TIC MARK LABEL CASE ASIS
TITLE OFFSET 2
.
CHARACTERS 1 2 3 1 2 3 1 2 3 1 2 3 BLANK
CHARACTER COLOR BLUE RED GREEN BLUE RED GREEN BLUE RED GREEN BLUE RED GREEN
LINES COLOR BLUE RED GREEN BLUE RED GREEN BLUE RED GREEN BLUE RED GREEN
LET PLOT LINE 13 = BLANK
XLIMITS 1 NFACT
MAJOR XTIC MARK NUMBER NFACT
MINOR XTIC MARK NUMBER 0
TIC MARK OFFSET UNITS DATA
XTIC OFFSET 1 1
XTIC LABEL FORMAT ALPHA
XTIC LABEL CONTENT F1:sp()Sepalcr()Length F2:sp()Sepalcr()Width ...
F3:sp()Petalcr()Length F4:sp()Petalcr()Width
Y1LABEL Standardized Feature
X1LABEL Features
X1LABEL DISPLACEMENT 12
YLIMITS -4 4
.
LET X1 = STANDARDIZE X1
LET X2 = STANDARDIZE X2
LET X3 = STANDARDIZE X3
LET X4 = STANDARDIZE X4
.
. Step 3:   Generate plots
.
TITLE Classification Scatter Plot: Standardized Units
CLASSIFICATION SCATTER PLOT Y X1 X2 X3 X4
.
```
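Step 2 of the program replaces each feature by its standardized value, so all four features share the -4 to 4 vertical scale set by YLIMITS. MEAN^K and SD^K are computed in Step 1, before standardizing, so the annotations report the original units. The transformation can be sketched in Python. This assumes that STANDARDIZE uses the mean and the sample standard deviation as its location and scale; `standardize` here is a hypothetical helper, not Dataplot's code.

```python
import statistics

# Hypothetical helper. Assumption: location = mean, scale = sample SD.

def standardize(values):
    """Return (x - mean) / sd for each value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]
```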
```
. Step 4:   Redraw the plot and add annotations
.
TITLE IRIS Classification Analysis Based on Standardized Data
.
CLASSIFICATION SCATTER PLOT Y X1 X2 X3 X4
.
LET XCOOR1 = 86
LET XCOOR2 = 88
LET YCOOR  = 89
LET YINC   = 2.5
JUSTIFICATION LEFT
COLOR BLACK
HEIGHT 2
.
LOOP FOR K = 1 1 NFACT
MOVE XCOOR1 YCOOR
TEXT F^K: ^T^K
LET YCOOR = YCOOR - YINC
MOVE XCOOR2 YCOOR
TEXT Mean = ^MEAN^K
LET YCOOR = YCOOR - YINC
MOVE XCOOR2 YCOOR
TEXT SD   = ^SD^K
LET YCOOR = YCOOR - YINC
END OF LOOP
.
COLOR BLUE
MOVE XCOOR1 45
TEXT Cat1: Setosa
.
COLOR RED
MOVE XCOOR1 42.5
TEXT Cat2: Versicolor
.
COLOR GREEN
MOVE XCOOR1 40
TEXT Cat3: Virginica
.
. Read back the per-trace statistics (mean, SD, min, max) written by the plot
skip 1
read dpst4f.dat ymean ysd ymin ymax
skip 0
let ymean = round(ymean,1)
let ysd   = round(ysd,1)
let ymin  = round(ymin,1)
let ymax  = round(ymax,1)
.
character blank all
character size 1.5 all
character just right all
character color blue blue blue blue red red red red green green green green
let nlen = 1
let sblank = blank string nlen
.
. Label each column with its Max, Mean, Min, and SD (one line each)
set substitute format f4.1
loop for l = 1 1 nfact
loop for k = 1 1 3
let k2 = (l-1)*3 + k
let aval = ymax(k2)
let bval = ymean(k2)
let cval = ymin(k2)
let dval = ysd(k2)
let string s2 = ^aval ^bval ^cval ^dval
if k = 1
let string s = ^s2
else
let s = string concatenate s sblank s2
end of if
end of loop
character ^s
let xpos = sequence 0.8 4 0.2 1.21
let xpos = (l - 1) + xpos
let ypos = sequence 26 -1.5 21 for i = 1 1 12
let tag = sequence 1 1 12
drawds symbol xpos ypos tag
delete s s2
end of loop
.
color black
height 1.5
justification left
move xcoor1 25.5
text Max
move xcoor1 24
text Mean
move xcoor1 22.5
text Min
move xcoor1 21
text SD
.
. Classification rule suggested by the plot (standardized units)
height 2
just left
move 2 7.5
text if F3 <= -0.7, then cat = 1
move 2 5
text if F4 >=  0.4, then cat = 3
move 2 2.5
text else                cat = 2
.
. Dotted reference lines at the two cutoffs, 0.4 and -0.7
line color black
line dotted
drawsdsd 15 0.4 85 0.4
drawsdsd 15 -0.7 85 -0.7
```
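The cutoffs annotated on the plot, which are also drawn as dotted lines at 0.4 and -0.7, define a simple rule on the standardized petal features. A Python sketch of that rule follows. The F3 test is checked first, as listed in the annotation, and categories 1, 2, 3 correspond to Setosa, Versicolor, and Virginica as labeled on the plot. `classify` and `error_rate` are hypothetical helpers; the error-rate function is only an illustrative way to check the rule against labeled data.

```python
# Hypothetical helpers applying the rule annotated on the plot.
# f3 = standardized petal length, f4 = standardized petal width.

def classify(f3, f4):
    """Return 1 (Setosa), 3 (Virginica), or 2 (Versicolor)."""
    if f3 <= -0.7:   # checked first, matching the annotation order
        return 1
    if f4 >= 0.4:
        return 3
    return 2


def error_rate(y, f3_values, f4_values):
    """Fraction of observations whose predicted class differs from y."""
    wrong = sum(1 for label, a, b in zip(y, f3_values, f4_values)
                if classify(a, b) != label)
    return wrong / len(y)
```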

NIST is an agency of the U.S. Commerce Department.

Date created: 03/14/2019
Last updated: 03/14/2019