
# PERCENT POINT PLOT

Name:
PERCENT POINT PLOT
Type:
Graphics Command
Purpose:
Generates a percent point plot.
Description:
A percent point plot is a graphical data analysis technique for summarizing the distributional information of a variable. It consists of:

 Vertical axis = percent point
 Horizontal axis = percent (0 to 100)

Thus, for example, if the value of 50 is chosen on the horizontal axis, then the corresponding value on the vertical axis is the estimated 50% point (that is, the median) from the data.

The percent point plot can be generated for either raw data or for binned data.

For raw data, the percent point plot is constructed by plotting the sorted data on the vertical axis. The corresponding horizontal axis value for the i-th point is 100*i/N, where i is the rank of the observation in the sorted data (so that Yi is the i-th sorted observation) and N is the sample size. The multiplication by 100 converts the horizontal axis to a percentage value.
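
The raw data construction can be sketched outside of Dataplot. The following Python fragment is only an illustration of the arithmetic, not Dataplot's implementation; it assumes the horizontal coordinate of the i-th sorted value is 100*i/N (its rank expressed as a percentage of the sample size), and the function name percent_points_raw is hypothetical.

```python
def percent_points_raw(data):
    """Return (percent, value) pairs for an unbinned percent point plot.

    Assumption: the i-th sorted value (i = 1..N) is plotted at 100*i/N.
    """
    ordered = sorted(data)
    n = len(ordered)
    # Rank i runs from 1 to n; 100*i/n turns the rank into a percentage.
    return [(100.0 * i / n, value) for i, value in enumerate(ordered, start=1)]


points = percent_points_raw([7.0, 3.0, 9.0, 1.0])
for percent, value in points:
    print(percent, value)
```

Each pair gives the horizontal coordinate (percent) followed by the vertical coordinate (the sorted data value).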

For binned data, the vertical axis value is the mid-point of the bin. The corresponding horizontal axis values are the cumulative sums of the frequencies of the bins divided by the sum of the frequencies for all bins. This value is multiplied by 100 to convert the horizontal axis to a percentage value.
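
The binned construction can be sketched in the same way. This Python fragment is an illustration under the description above (cumulative frequency divided by total frequency, times 100, plotted against the bin mid-point); the function name percent_points_binned and the sample bins are hypothetical.

```python
def percent_points_binned(midpoints, frequencies):
    """Return (percent, midpoint) pairs for a binned percent point plot."""
    total = sum(frequencies)
    running = 0
    points = []
    for midpoint, frequency in zip(midpoints, frequencies):
        # Cumulative frequency up to and including this bin.
        running += frequency
        points.append((100.0 * running / total, midpoint))
    return points


binned = percent_points_binned([1.0, 2.0, 3.0, 4.0], [2, 3, 4, 1])
for percent, midpoint in binned:
    print(percent, midpoint)
```

With these made-up frequencies, the last bin always lands at 100 percent because the cumulative sum equals the total.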

By default, raw data is first binned into frequency data. To suppress this binning (i.e., generate the raw data version of the plot), enter the command

SET PERCENT POINT PLOT UNBINNED

To restore the default of binning raw data, enter

SET PERCENT POINT PLOT BINNED

Typically no binning is preferred for small to moderate size data sets. Binning can be helpful for large data sets in that it reduces the number of points that are plotted.

Syntax 1:
PERCENT POINT PLOT <x> <SUBSET/EXCEPT/FOR qualification>
where <x> is the variable of raw data;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used for the case where you have raw data.

Syntax 2:
PERCENT POINT PLOT <y> <x> <SUBSET/EXCEPT/FOR qualification>
where <y> is the variable of pre-computed frequencies;
<x> is the variable of distinct values;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when you have pre-computed frequencies at each data level and the bins have equal width.

Syntax 3:
PERCENT POINT PLOT <y> <xlow> <xhigh>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the variable of pre-computed frequencies;
<xlow> is the variable containing the lower limits of the bins;
<xhigh> is the variable containing the upper limits of the bins;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when you have pre-computed frequencies at each data level and the bins have unequal widths.
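
For bins given by lower and upper limits, the plotted points can be sketched as follows. This Python fragment is an illustration only; it assumes the bin mid-point is taken as the average of the lower and upper limits, and the function name and sample limits are hypothetical.

```python
def percent_points_unequal_bins(xlow, xhigh, frequencies):
    """Return (percent, midpoint) pairs for bins given by their limits.

    Assumption: the mid-point of a bin is (lower limit + upper limit) / 2.
    """
    total = sum(frequencies)
    running = 0
    points = []
    for low, high, frequency in zip(xlow, xhigh, frequencies):
        running += frequency
        midpoint = (low + high) / 2.0
        points.append((100.0 * running / total, midpoint))
    return points


unequal = percent_points_unequal_bins(
    [0.0, 1.0, 3.0],   # lower limits of the bins
    [1.0, 3.0, 7.0],   # upper limits of the bins
    [5, 10, 5],        # frequency in each bin
)
print(unequal)
```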

Syntax 4:
MULTIPLE PERCENT POINT PLOT <y1> ... <yk>
<SUBSET/EXCEPT/FOR qualification>
where <y1> ... <yk> is a list of 1 to 30 response variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax will generate percent point plots of each of the listed response variables on the same plot. You can specify different plot attributes for each response variable.

This syntax is only supported for raw data (i.e., no binned data).

Syntax 5:
REPLICATED PERCENT POINT PLOT <y> <x1> ... <xk>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> ... <xk> is a list of 1 to 6 group-id variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

From one to six group-id variables can be specified (most commonly there is a single group-id variable).

Note that with this syntax, the plot points corresponding to each group are drawn with different attributes (i.e., the first group uses the first setting for the CHARACTER and LINE and related attribute setting commands, the second group uses the second setting, and so on). For example, this syntax can be used to label the plot points with the group-id.

If there is more than one group-id variable, the attribute settings work from right to left. That is, if X1 has 2 levels and X2 has 2 levels, then

 trace 1 = Level 1 of X1 and Level 1 of X2
 trace 2 = Level 1 of X1 and Level 2 of X2
 trace 3 = Level 2 of X1 and Level 1 of X2
 trace 4 = Level 2 of X1 and Level 2 of X2

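
The right-to-left ordering means that the rightmost group-id variable cycles fastest. A minimal Python sketch of this enumeration follows; it illustrates the ordering only and is not Dataplot code, and the function name trace_order is hypothetical.

```python
from itertools import product


def trace_order(*levels_per_variable):
    """List level combinations with the rightmost variable varying fastest."""
    # itertools.product advances its last argument first, which matches
    # the right-to-left ordering of the attribute settings.
    return list(product(*levels_per_variable))


traces = trace_order([1, 2], [1, 2])
for number, (x1_level, x2_level) in enumerate(traces, start=1):
    print(f"trace {number} = Level {x1_level} of X1 and Level {x2_level} of X2")
```

With two levels for each of X1 and X2, the fourth trace is Level 2 of X1 and Level 2 of X2.
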
Syntax 6:
HIGHLIGHTED PERCENT POINT PLOT <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x> is a group-id variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

Although this syntax is similar to the REPLICATION case, it is generally used in a different way. The REPLICATION case is used when we have distinct groups of data and we want to generate separate percent point plots for each group. Highlighting is used when we have a single group of data, but we want to draw some of the points with different attributes. For example, we may want to emphasize the extreme points in the plot.
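
As an illustration of how a group-id variable for highlighting might be prepared, the Python sketch below tags the most extreme observations. It is not Dataplot code; the function name extreme_tags and the coding (1 for ordinary points, 2 for highlighted points) are hypothetical choices. The resulting tag values would then be supplied to Dataplot as the group-id variable.

```python
def extreme_tags(data, count=1):
    """Tag the `count` smallest and `count` largest values with 2, others 1."""
    tags = [1] * len(data)
    if count <= 0:
        return tags
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(data)), key=lambda i: data[i])
    for index in order[:count] + order[-count:]:
        tags[index] = 2
    return tags


tags = extreme_tags([5.1, 0.2, 4.8, 9.9, 5.0, 5.3], count=1)
print(tags)
```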

Examples:
PERCENT POINT PLOT Y
PERCENT POINT Y X
PERCENT POINT Y XLOW XHIGH
HIGHLIGHTED PERCENT POINT Y TAG
MULTIPLE PERCENT POINT Y1 Y2 Y3
PERCENT POINT Y X SUBSET X > 2
Note:
When raw data is binned, Dataplot divides the raw data into classes in the same manner as it does for a histogram or frequency polygon. The percent points are calculated at the mid-points of these histogram classes. The defaults are the same as for histograms (the class width is 0.3*standard deviation, 6 classes above and 6 classes below the mean). You can specify your own binning with the CLASS LOWER, CLASS UPPER, and CLASS WIDTH commands. This is demonstrated in the sample program below.

The SET HISTOGRAM CLASS WIDTH command can be used to select one of several other algorithms for binning the data (enter HELP HISTOGRAM CLASS WIDTH for details). The SET HISTOGRAM OUTLIERS command also applies to the PERCENT POINT PLOT if raw data is being binned.

Note:
Percent point plots are also referred to as quantile plots in the statistical literature.
Note:
The attributes of the plot can be set by the first setting of the LINE, CHARACTER, SPIKE, and BAR commands (and their corresponding attribute setting commands). This is demonstrated in the sample program below.
Default:
None
Synonyms:
None
Related Commands:
 QUAN-QUAN PLOT = Generates a quantile-quantile plot.
 HISTOGRAM = Generates a histogram.
 PIE CHART = Generates a pie chart.
 FREQUENCY PLOT = Generates a frequency plot.
 PROBABILITY PLOT = Generates a probability plot.
 PPCC PLOT = Generates a probability plot correlation coefficient plot.
 PLOT = Generates a data or function plot.
 CLASS LOWER = Sets the lower class minimum for histograms, frequency plots, and pie charts.
 CLASS UPPER = Sets the upper class maximum for histograms, frequency plots, and pie charts.
 CLASS WIDTH = Sets the class width for histograms, frequency plots, and pie charts.
 HISTOGRAM CLASS WIDTH = Specifies alternative default class width algorithms for histograms.
Applications:
Distributional Analysis
Reference:
Chambers, Cleveland, Kleiner, and Tukey (1983), "Graphical Methods for Data Analysis", Wadsworth.
Implementation Date:
Pre-1987
1998/09: Support for SET PERCENT POINT PLOT command.
2011/02: Support for REPLICATION and MULTIPLE options.
2011/02: Support for HIGHLIGHT option.
Program 1:

SKIP 25
.
LET ALOW = MINIMUM Y
LET AHIGH = MAXIMUM Y
CLASS LOWER ALOW
CLASS UPPER AHIGH
CLASS WIDTH 1.0
CHARACTER CIRCLE
CHARACTER FILL ON
CHARACTER SIZE 1.2
X1LABEL PERCENT POINT
Y1LABEL DATA VALUE
TITLE AUTOMATIC
.
PERCENT POINT PLOT Y

Program 2:

let y1 = norm rand numb for i = 1 1 100
.
title case asis
title offset 2
title automatic
label case asis
tic mark offset units screen
tic mark offset 3 3
.
char circle
char fill on
char hw 0.5 0.375
line blank
.
multiplot corner coordinates 5 5 95 95
multiplot scale factor 2
multiplot 2 2
.
set percent point plot unbinned
set histogram outliers on
set histogram empty bins off
title Unbinned Data
percent point plot y1
.
set percent point plot binned
title Data Binned by Command
percent point plot y1
.
title User Created Bins: Equi-Spaced Bins
let z2 x2 = binned y1
percent point plot z2 x2
.
let minsize = 5
let z3 xlow xhigh = combine frequency table z2 x2
title User Created Bins: Unequal-Spaced Bins
percent point plot z3 xlow xhigh
.
end of multiplot
justification center
move 50 97
text Percent Point Plots for 100 Normal Random Numbers
move 50 5
text Percentile
direction vertical
move 3 50
text Response Value

Program 3:

dimension 500 rows
skip 25
read iris.dat y1 y2 y3 y4
let m = create matrix y1 y2 y3 y4
.
title case asis
title offset 2
label case asis
.
char circle all
char color black
char fill on all
char hw 0.5 0.375 all
line blank all
.
y1label Response Value
x1label Percentile
title IRIS Data (all species combined)
.
set percent point plot unbinned
set histogram outliers on
set histogram empty bins off
percent point plot m
.
char color red blue cyan green
title IRIS Data (variables plotted separately)
multiple percent point plot y1 to y4


Program 4:

skip 25
read gear.dat y x
.
title case asis
title offset 2
label case asis
tic mark offset units screen
tic mark offset 5 5
.
char circle all
char color black red blue green cyan grey brown magenta dgreen orange
char fill on all
char hw 0.5 0.375 all
line blank all
.
title Percent Point Plots for GEAR.DAT
y1label Response Value
x1label Percentile
.
set percent point plot unbinned
set histogram outliers on
set histogram empty bins off
replicated percent point plot y x

Date created: 06/04/2016
Last updated: 12/04/2023