
PERCENT POINT PLOT

Name:
    PERCENT POINT PLOT
Type:
    Graphics Command
Purpose:
    Generates a percent point plot.
Description:
    A percent point plot is a graphical data analysis technique for summarizing the distributional information of a variable. It consists of:

      Vertical axis = percent point;
      Horizontal axis = percent (0 to 100).

    Thus, for example, if the value of 50 is chosen on the horizontal axis, then the corresponding value on the vertical axis is the estimated 50% point (that is, the median) from the data.

    The percent point plot can be generated for either raw data or for binned data.

    For raw data, the percent point plot is constructed by plotting the sorted data on the vertical axis. The corresponding horizontal axis value for the i-th point is 100*i/N, with i and N denoting the rank of the observation in the sorted data and the sample size, respectively. The multiplication by 100 converts the horizontal axis to a percentage value.
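
    The raw data construction can be sketched outside Dataplot. The Python fragment below is only an illustration, not Dataplot's implementation: the function name raw_percent_points is hypothetical, and it assumes the horizontal coordinate of the i-th sorted observation is 100*i/N.

```python
def raw_percent_points(data):
    """Return (percent, value) pairs for an unbinned percent point plot."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumption: the i-th sorted value (i = 1..n) is paired with 100*i/n percent.
    return [(100.0 * i / n, y) for i, y in enumerate(ordered, start=1)]


points = raw_percent_points([7.0, 3.0, 9.0, 1.0])
for percent, value in points:
    # Horizontal coordinate first, vertical coordinate second.
    print(f"{percent:6.1f}  {value}")
```

    With four observations, the sorted values land at 25, 50, 75, and 100 percent.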

    For binned data, the vertical axis value is the mid-point of the bin. The corresponding horizontal axis values are the cumulative sums of the frequencies of the bins divided by the sum of the frequencies for all bins. This value is multiplied by 100 to convert the horizontal axis to a percentage value.
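
    The binned construction can be illustrated in the same way. This Python sketch is not Dataplot code; the function name binned_percent_points and the sample frequencies are made up for illustration. It takes the bin mid-points and frequencies as given and returns the cumulative percentage for each bin.

```python
from itertools import accumulate


def binned_percent_points(midpoints, frequencies):
    """Return (percent, midpoint) pairs for a binned percent point plot."""
    total = sum(frequencies)
    # Running sum of the bin frequencies, one value per bin.
    cumulative = accumulate(frequencies)
    return [(100.0 * c / total, m) for c, m in zip(cumulative, midpoints)]


points = binned_percent_points([1.0, 2.0, 3.0, 4.0], [2, 3, 4, 1])
for percent, midpoint in points:
    print(f"{percent:6.1f}  {midpoint}")
```

    The last bin always falls at 100 percent because its cumulative frequency equals the total.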

    By default, raw data is first binned into frequency data. To suppress this binning (i.e., generate the raw data version of the plot), enter the command

      SET PERCENT POINT PLOT UNBINNED

    To restore the default of binning raw data, enter

      SET PERCENT POINT PLOT BINNED

    Unbinned plots are typically preferred for small to moderate size data sets. Binning can be helpful for large data sets because it reduces the number of points that are plotted.

Syntax 1:
    PERCENT POINT PLOT <x>             <SUBSET/EXCEPT/FOR qualification>
    where <x> is the variable of raw data;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where you have raw data.

Syntax 2:
    PERCENT POINT PLOT <y> <x> <SUBSET/EXCEPT/FOR qualification>
    where <y> is the variable of pre-computed frequencies;
                <x> is the variable of distinct values;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where you have pre-computed frequencies at each data level and the bins have equal widths.

Syntax 3:
    PERCENT POINT PLOT <y> <xlow> <xhigh>
                                              <SUBSET/EXCEPT/FOR qualification>
    where <y> is the variable of pre-computed frequencies;
                <xlow> is the variable containing the lower limits of the bins;
                <xhigh> is the variable containing the upper limits of the bins;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where you have pre-computed frequencies for each bin and the bins have unequal widths.

Syntax 4:
    MULTIPLE PERCENT POINT PLOT <y1> ... <yk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> ... <yk> is a list of 1 to 30 response variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax will generate percent point plots of each of the listed response variables on the same plot. You can specify different plot attributes for each response variable.

    This syntax is only supported for raw data (i.e., no binned data).

Syntax 5:
    REPLICATED PERCENT POINT PLOT <y> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <x1> ... <xk> is a list of 1 to 6 group-id variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    From one to six group-id variables can be specified (most commonly there is a single group-id variable).

    Note that with this syntax, the plot points corresponding to each group are drawn with different attributes (i.e., the first group uses the first setting for the CHARACTER and LINE and related attribute setting commands, the second group uses the second setting, and so on). For example, this syntax can be used to label the plot points with the group-id.

    If there is more than one group-id variable, the attribute settings work from right to left (the rightmost variable varies fastest). That is, if X1 has 2 levels and X2 has 2 levels, then

      trace 1 = Level 1 of X1 and Level 1 of X2
      trace 2 = Level 1 of X1 and Level 2 of X2
      trace 3 = Level 2 of X1 and Level 1 of X2
      trace 4 = Level 2 of X1 and Level 2 of X2
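
    The trace numbering for several group-id variables can be enumerated with a short Python sketch. It is an illustration only, assuming the rightmost group-id variable varies fastest; the function name trace_order is hypothetical and is not a Dataplot command.

```python
from itertools import product


def trace_order(*levels_per_variable):
    """List level combinations in trace order, rightmost variable fastest."""
    ranges = [range(1, k + 1) for k in levels_per_variable]
    # itertools.product advances its last argument fastest.
    return list(product(*ranges))


for trace, (x1, x2) in enumerate(trace_order(2, 2), start=1):
    print(f"trace {trace} = Level {x1} of X1 and Level {x2} of X2")
```

    The same function covers more variables, for example trace_order(2, 3, 2) for three group-id variables with 2, 3, and 2 levels.
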
Syntax 6:
    HIGHLIGHTED PERCENT POINT PLOT <y> <x>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <x> is a group-id variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    Although this syntax is similar to the REPLICATION case, it is generally used in a different way. The REPLICATION case is used when we have distinct groups of data and we want to generate separate percent point plots for each group. Highlighting is used when we have a single group of data, but we want to draw some of the points with different attributes. For example, we may want to emphasize the extreme points in the plot.
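
    The difference between replication and highlighting can be sketched in Python. This is one reading of the paragraph above, not Dataplot's implementation: the function names are hypothetical, the data are made up, and percentages are assumed to be 100*i/N within whatever set of points forms a curve.

```python
def percent_points(values):
    """Return (percent, value) pairs for one curve of sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [(100.0 * i / n, y) for i, y in enumerate(ordered, start=1)]


def replicated(y, group):
    """One curve per group; percentages are computed within each group."""
    curves = {}
    for g in sorted(set(group)):
        curves[g] = percent_points([v for v, gi in zip(y, group) if gi == g])
    return curves


def highlighted(y, tag):
    """A single curve over all the data; each point carries its tag."""
    ordered = sorted(zip(y, tag))
    n = len(ordered)
    return [(100.0 * i / n, v, t) for i, (v, t) in enumerate(ordered, start=1)]


y = [5.0, 1.0, 9.0, 3.0]
tag = [1, 1, 2, 2]
print(replicated(y, tag))   # two separate curves, each ending at 100 percent
print(highlighted(y, tag))  # one curve; the tag only selects plot attributes
```

    In the replicated case each group reaches 100 percent on its own, while in the highlighted case the tag changes only how a point is drawn, not where it is plotted.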

Examples:
    PERCENT POINT PLOT Y
    PERCENT POINT PLOT Y X
    PERCENT POINT PLOT Y XLOW XHIGH
    HIGHLIGHTED PERCENT POINT PLOT Y TAG
    MULTIPLE PERCENT POINT PLOT Y1 Y2 Y3
    PERCENT POINT PLOT Y X SUBSET X > 2
Note:
    When raw data is binned, Dataplot divides the raw data into classes in the same manner as it does for a histogram or frequency polygon. The percent points are calculated at the mid-points of these histogram classes. The defaults are the same as for histograms (the class width is 0.3*standard deviation, 6 classes above and 6 classes below the mean). You can specify your own binning with the CLASS LOWER, CLASS UPPER, and CLASS WIDTH commands. This is demonstrated in the sample program below.

    The SET HISTOGRAM CLASS WIDTH command can be used to select one of several other algorithms for binning the data (enter HELP HISTOGRAM CLASS WIDTH for details). The SET HISTOGRAM OUTLIERS command also applies to the PERCENT POINT PLOT if raw data is being binned.

Note:
    Percent point plots are also referred to as quantile plots in the statistical literature.
Note:
    The attributes of the plot can be set by the first setting of the LINE, CHARACTER, SPIKE, and BAR commands (and their corresponding attribute setting commands). This is demonstrated in the sample program below.
Default:
    None
Synonyms:
    None
Related Commands:
    QUAN-QUAN PLOT = Generates a quantile-quantile plot.
    HISTOGRAM = Generates a histogram.
    PIE CHART = Generates a pie chart.
    FREQUENCY PLOT = Generates a frequency plot.
    PROBABILITY PLOT = Generates a probability plot.
    PPCC PLOT = Generates a probability plot correlation coefficient plot.
    PLOT = Generates a data or function plot.
    CLASS LOWER = Set the lower class minimum for histograms, frequency plots, and pie charts.
    CLASS UPPER = Set the upper class maximum for histograms, frequency plots, and pie charts.
    CLASS WIDTH = Set the class width for histograms, frequency plots, and pie charts.
    HISTOGRAM CLASS WIDTH = Specify alternative default class width algorithms for histograms.
Applications:
    Distributional Analysis
Reference:
    Chambers, Cleveland, Kleiner, and Tukey (1983), "Graphical Methods for Data Analysis", Wadsworth.
Implementation Date:
    Pre-1987
    1998/09: Support for SET PERCENT POINT PLOT command.
    2011/02: Support for REPLICATION and MULTIPLE options.
    2011/02: Support for HIGHLIGHT option.
Program 1:
     
    SKIP 25
    READ SUNSPOT2.DAT Y
    .
    LET ALOW = MINIMUM Y
    LET AHIGH = MAXIMUM Y
    CLASS LOWER ALOW
    CLASS UPPER AHIGH
    CLASS WIDTH 1.0
    CHARACTER CIRCLE
    CHARACTER FILL ON
    CHARACTER SIZE 1.2
    X1LABEL PERCENT
    Y1LABEL DATA VALUE
    TITLE AUTOMATIC
    .
    PERCENT POINT PLOT Y
        
    plot generated by sample program
Program 2:
     
    let y1 = norm rand numb for i = 1 1 100
    .
    title case asis
    title offset 2
    title automatic
    label case asis
    tic mark offset units screen
    tic mark offset 3 3
    .
    char circle
    char fill on
    char hw 0.5 0.375
    line blank
    .
    multiplot corner coordinates 5 5 95 95
    multiplot scale factor 2
    multiplot 2 2
    .
    set percent point plot unbinned
    set histogram outliers on
    set histogram empty bins off
    title Unbinned Data
    percent point plot y1
    .
    set percent point plot binned
    title Data Binned by Command
    percent point plot y1
    .
    title User Created Bins: Equi-Spaced Bins
    let z2 x2 = binned y1
    percent point plot z2 x2
    .
    let minsize = 5
    let z3 xlow xhigh = combine frequency table z2 x2
    title User Created Bins: Unequal-Spaced Bins
    percent point plot z3 xlow xhigh
    .
    end of multiplot
    justification center
    move 50 97
    text Percent Point Plots for 100 Normal Random Numbers
    move 50 5
    text Percentile
    direction vertical
    move 3 50
    text Response Value
        
    plot generated by sample program
Program 3:
     
    dimension 500 rows
    skip 25
    read iris.dat y1 y2 y3 y4
    let m = create matrix y1 y2 y3 y4
    .
    title case asis
    title offset 2
    label case asis
    .
    char circle all
    char color black
    char fill on all
    char hw 0.5 0.375 all
    line blank all
    .
    y1label Response Value
    x1label Percentile
    title IRIS Data (all variables combined)
    .
    set percent point plot unbinned
    set histogram outliers on
    set histogram empty bins off
    percent point plot m
    .
    char color red blue cyan green
    title IRIS Data (variables plotted separately)
    multiple percent point plot y1 to y4
        
    plot generated by sample program

    plot generated by sample program

Program 4:
     
    skip 25
    read gear.dat y x
    .
    title case asis
    title offset 2
    label case asis
    tic mark offset units screen
    tic mark offset 5 5
    .
    char circle all
    char color black red blue green cyan grey brown magenta dgreen orange
    char fill on all
    char hw 0.5 0.375 all
    line blank all
    .
    title Percent Point Plots for GEAR.DAT
    y1label Response Value
    x1label Percentile
    .
    set percent point plot unbinned
    set histogram outliers on
    set histogram empty bins off
    replicated percent point plot y x
        
    plot generated by sample program
Date created: 06/04/2016
Last updated: 12/04/2023

Please email comments on this WWW page to alan.heckert@nist.gov.