PPCC PLOT

Name:
    ... PPCC PLOT
Type:
    Graphics Command
Purpose:
    Generates a probability plot correlation coefficient (PPCC) plot.
Description:
    A PPCC plot is a graphical data analysis technique for determining that member of the specified distributional family which provides a "best" distributional fit to the data.

    The PPCC plot is based on the following two ideas:

    1. The "straightness" of the probability plot is a good measure of distributional fit. That is, the "best" distributional fit is the one with the most linear probability plot.

    2. The correlation coefficient of the points on the probability plot is a good measure of the "straightness" (i.e., linearity) of the probability plot.

    The PPCC plot is formed by selecting a value of the shape parameter, generating the probability plot (this probability plot is not actually graphed), and then computing the correlation coefficient of the resulting probability plot. The PPCC plot then consists of:

    Vertical axis = probability plot correlation coefficient value for the given value of the shape parameter;
    Horizontal axis = distributional family parameter value (i.e., the value of the shape parameter).

    The value of the distributional parameter (on the horizontal axis) which corresponds to the maximum of the PPCC plot curve (on the vertical axis) is, of course, of interest since it indicates the best-fit member of the family.

    Some advantages of the PPCC plot as a fitting technique are:

    1. The PPCC plot is invariant with respect to location and scale. Once we determine the optimal value of the shape parameter from the PPCC plot, we can generate the corresponding probability plot. The intercept and slope of the line fit to the probability plot provide valid estimates of location and scale (the Dataplot probability plot is designed in such a way that this is true).

    2. The probability plot, and thus the PPCC plot, only depends on the percent point function. That is, if we know how to compute the percent point function, we can use the PPCC plot/probability plot to estimate the parameters of the distribution.

    3. The PPCC plot can show the sensitivity of the shape parameter. That is, it can show what neighborhood of the parameter estimate is likely to produce a reasonably straight probability plot.

    4. The PPCC plot can be applied to binned data.

    5. The PPCC plot can be applied to censored data.

      A censored PPCC plot is generated by finding the value of the shape parameter that results in the maximum correlation coefficient of the censored probability plot. For details on how the censored probability plot is generated, enter the command

        HELP PROBABILITY PLOT

      The censoring variable should contain a 1 to indicate a failure time and a 0 to indicate a censoring time. The censored PPCC plot is not supported for binned data.
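
    As a minimal sketch of advantage 1 above, the following sequence first estimates the shape parameter and then generates the probability plot at that value. The simulated Weibull data and the names LOCEST and SCALEST are purely illustrative. PPA0 and PPA1 are, to the best of our understanding, the parameters in which the PROBABILITY PLOT command saves the intercept and slope of the fitted line; confirm this with HELP PROBABILITY PLOT before relying on them.

      . Simulated data, for illustration only
      LET GAMMA = 1.6
      LET Y = WEIBULL RANDOM NUMBERS FOR I = 1 1 100
      . Estimate the shape parameter (saved as SHAPE)
      WEIBULL PPCC PLOT Y
      . Probability plot at the estimated shape
      LET GAMMA = SHAPE
      WEIBULL PROBABILITY PLOT Y
      . Assumed names: PPA0 = intercept (location), PPA1 = slope (scale)
      LET LOCEST = PPA0
      LET SCALEST = PPA1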

    Some disadvantages of the PPCC plot as a fitting technique are:

    1. If the percent point function is expensive to compute (e.g., if it involves the numerical inversion of a rather complicated cumulative distribution function), the PPCC plot can be slow to generate. These types of percent point functions may also have convergence problems.

      In these cases, the SET PPCC PLOT DATA POINTS command may be helpful in reducing the computational burden. See the Note section below.

    2. The PPCC plot does not produce interval estimates for the parameters.

      The bootstrap provides a method for generating these interval estimates.

    3. Heavy-tailed distributions may have very high variability in the extremes of the data. This can sometimes lead to poor discrimination in the plot.

      In these cases, the KS PLOT provides an alternative measure of goodness of fit that may perform better.

    4. If a shape parameter behaves much like a scale or location parameter, the PPCC plot may not discriminate well.

      The KS PLOT has the option of fixing the values of the location and scale parameters. This can sometimes be useful in these cases.

    5. The PPCC plot does not extend well to more than one shape parameter.

      Dataplot has extended the PPCC plot to distributions with two shape parameters. Note that for the two shape parameter case, you may want to investigate the KS PLOT. This uses the value of the Kolmogorov-Smirnov goodness of fit statistic as the measure of distributional fit. The KS PLOT seems to work better for at least some distributions with two shape parameters.

      Dataplot supports two formats for the PPCC plot with two shape parameters:

      1. As in the one shape parameter case, the Y axis will contain the value of the correlation coefficient. The X axis will contain the value of the second shape parameter. Each value of the first shape parameter will be represented by a separate trace (i.e., curve) on the plot.

        To change the order of the shape parameters in the above format, enter the command

          SET PPCC PLOT AXIS ORDER REVERSE

        To restore the default order, enter the command

          SET PPCC PLOT AXIS ORDER DEFAULT

      2. Alternatively, you can generate a 3D wireframe plot.

        You can specify which format to use with the command

          SET PPCC FORMAT <TRACE/3D>
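
      A minimal sketch of switching between the two formats, assuming Y already holds the data. The inverse Gaussian family is used only because it is one of the families with two shape parameters; any other such family could be substituted.

        . Trace format with the order of the shape parameters reversed
        SET PPCC FORMAT TRACE
        SET PPCC PLOT AXIS ORDER REVERSE
        INVERSE GAUSSIAN PPCC PLOT Y
        . Restore the default order and draw a 3D wireframe instead
        SET PPCC PLOT AXIS ORDER DEFAULT
        SET PPCC FORMAT 3D
        INVERSE GAUSSIAN PPCC PLOT Y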

    The PPCC plot now supports two different types of grouping.

    1. Some data sets are collected in binned format. That is, the values for the data are split into intervals and the number of occurrences of the data within each interval is counted.

      Dataplot supports either equal sized bins (the bin variable contains the mid-point of the bin) or unequal size bins (two bin variables are specified: one contains the lower limit for the bins and the other contains the upper limits for the bins).

    2. The ppcc plot also supports the case where there are multiple batches of data. In this case, a separate ppcc curve is drawn for each batch of data (for unbinned data a curve will also be drawn for the full data set). We refer to this as the "replication" case below. Replication can be used for either the raw data case or the binned data case.

      This form is useful for the case where we want to know if different batches of data can be modeled with a common shape parameter. One example of this is accelerated testing, where Weibull models should have a common shape parameter at different stress levels if a linear acceleration model is valid.

    PPCC plots are available for the following continuous distributional families (with the distributional parameter in parentheses) with one shape parameter:

    1. Weibull (gamma)
    2. double weibull (gamma)
    3. inverted weibull (gamma)
    4. gamma (gamma)
    5. double gamma (gamma)
    6. log gamma (gamma)
    7. inverted gamma (gamma)
    8. Wald (gamma)
    9. fatigue life (gamma)
    10. Pareto (gamma)
    11. Pareto second kind (gamma)
    12. generalized Pareto (gamma)
    13. generalized half logistic (gamma)
    14. extreme value type 2 (gamma)
    15. generalized extreme value (gamma)
    16. extreme value (gamma, combines Weibull, extreme value type 2)
    17. geometric extreme exponential (gamma)
    18. Tukey lambda (lambda)
    19. skew normal (lambda)
    20. skew double exponential (lambda)
    21. t (nu)
    22. folded t (nu)
    23. chi-squared (nu)
    24. chi (nu)
    25. generalized logistic (alpha)
    26. log double exponential (alpha)
    27. error (alpha)
    28. lognormal (sd)
    29. power-normal (p)
    30. Von Mises (b)
    31. reciprocal (b)
    32. log-logistic (delta)
    33. wrapped cauchy (c)
    34. Bradford (beta)
    35. asymmetric double exponential (k)

    PPCC plots are available for the following continuous distributional families (with the distributional parameter in parentheses) with two shape parameters:

    1. inverse Gaussian (gamma, mu)
    2. reciprocal inverse gaussian (gamma, mu)
    3. generalized gamma (gamma, c)
    4. exponentiated Weibull (gamma, theta)
    5. exponential power (alpha, beta)
    6. Beta (alpha, beta)
    7. inverted beta (alpha, beta)
    8. two-sided power (theta, n)
    9. Johnson SU (alpha1, alpha2)
    10. Johnson SB (alpha1, alpha2)
    11. alpha (alpha1, alpha2)
    12. Gompertz (c, b)
    13. g and h (g, h)
    14. F (nu1, nu2)
    15. log skew normal (lambda, sd)
    16. power lognormal (nu, sd)
    17. folded normal (mu, sd)
    18. folded Cauchy (loc, scale)
    19. skew t (nu, lambda)
    20. noncentral t (nu, lambda)
    21. noncentral chi-square (nu, lambda)
    22. truncated exponential (m, sd, assume truncation point, X0, is known)

    PPCC plots are available for the following discrete distributional families (with the distributional parameter in parentheses):

    1. geometric (p)
    2. Yule (p)
    3. Poisson (lambda)
    4. logarithmic series (theta)
    5. binomial (p, assume n known)
    6. negative binomial (p, assume k known)
    7. Beta-Binomial (alpha, beta, assume n known)
    8. Hermite (alpha, beta)

    The use of the PPCC plot for discrete distributions is still experimental (see the Note below).

    The percent point function for the discrete distributions is a step function (since X is restricted to integer values). This can result in non-smooth ppcc and probability plots. For discrete distributions, the KS PLOT (which will plot the minimum value of the chi-square statistic) is recommended over the PPCC PLOT as long as the sample size is reasonably large.

Syntax 1:
    <family> PPCC PLOT <y>             <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of raw data values under analysis;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the raw data case.

Syntax 2:
    <family> CENSORED PPCC PLOT <y> <x>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of raw data values under analysis;
                <x> is the censoring variable;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the raw data case where there is censoring.
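
    A hedged sketch of this syntax with simulated Type I right censoring. The cutoff of 2 and the variable names are arbitrary, and the use of LET with a SUBSET qualification to recode values is assumed to behave here as it does elsewhere in Dataplot.

      LET GAMMA = 1.6
      LET Y = WEIBULL RANDOM NUMBERS FOR I = 1 1 100
      . X = 1 marks a failure time, X = 0 marks a censoring time
      LET X = 1 FOR I = 1 1 100
      LET X = 0 SUBSET Y > 2
      . Censored observations are recorded at the cutoff
      LET Y = 2 SUBSET Y > 2
      WEIBULL CENSORED PPCC PLOT Y X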

Syntax 3:
    <family> REPLICATED PPCC PLOT <y> <groupid>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of raw data values under analysis;
                <groupid> is a group id variable;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the raw data case where there is grouped data (in the sense of batches).
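
    A hedged sketch with two simulated batches that share a shape parameter but differ in scale. The variable name TAG and the scale factor of 3 are arbitrary choices for illustration.

      LET GAMMA = 1.6
      LET Y = WEIBULL RANDOM NUMBERS FOR I = 1 1 100
      . Rows 1 to 50 form batch 1, rows 51 to 100 form batch 2
      LET TAG = 1 FOR I = 1 1 100
      LET TAG = 2 FOR I = 51 1 100
      . Rescale batch 2; the PPCC plot is invariant to location and scale
      LET Y = 3*Y SUBSET TAG = 2
      WEIBULL REPLICATED PPCC PLOT Y TAG

    Because only the scale differs between the batches, the curves for the two batches would be expected to peak at similar values of the shape parameter, up to sampling variability.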

Syntax 4:
    <family> CENSORED REPLICATED PPCC PLOT <y> <x> <groupid>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of raw data values under analysis;
                <x> is the censoring variable;
                <groupid> is a group id variable;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the raw data case where there is both grouped data (in the sense of batches) and censoring.

Syntax 5:
    <family> PPCC PLOT <y> <x>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of pre-computed frequencies;
                <x> is the variable of distinct values for the variable under analysis;
                <family> is one of the families listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the binned data case where the bins are defined by the mid-points of each bin.

Syntax 6:
    <family> PPCC PLOT <y> <xlow> <xhigh>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of pre-computed frequencies;
                <xlow> is the variable containing the lower limits for the bins;
                <xhigh> is the variable containing the upper limits for the bins;
                <family> is one of the families listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the binned data case where the bins are defined by the lower and upper limits of the bins (i.e., the bins can be of unequal width).

Syntax 7:
    <family> REPLICATED PPCC PLOT <y> <x> <groupid>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of pre-computed frequencies;
                <x> is the variable of distinct values for the variable under analysis;
                <groupid> is a group id variable;
                <family> is one of the families listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the binned data case where there are multiple batches of data and the bins are defined by the mid-points of each bin.

Syntax 8:
    <family> REPLICATED PPCC PLOT <y> <xlow> <xhigh> <groupid>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of pre-computed frequencies;
                <xlow> is the variable containing the lower limits for the bins;
                <xhigh> is the variable containing the upper limits for the bins;
                <groupid> is a group id variable;
                <family> is one of the families listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the binned data case where there are multiple batches of data. The bins are defined by the lower and upper limits of the bins (i.e., the bins can be of unequal width).

Examples:
    LAMBDA PPCC PLOT X
    T PPCC PLOT X
    EXTREME VALUE TYPE 2 PPCC PLOT X
    POISSON PPCC PLOT X
    LAMBDA PPCC PLOT F X
    T PPCC PLOT F X
    EXTREME VALUE TYPE 2 PPCC PLOT F X
    POISSON PPCC PLOT F X
Note:
    The range of the parameter is determined automatically. However, if you wish to restrict the range, you can specify the lower and upper limits by appending a 1 or 2 to the parameter name and assigning a value. For example, to restrict a Weibull ppcc plot to values between 0.5 and 20, do the following:

      LET GAMMA1 = 0.5
      LET GAMMA2 = 20
      WEIBULL PPCC PLOT Y

    A common use of this is to obtain a refinement of the estimate of the shape parameter. That is, an initial iteration (typically just the default values of the parameter) is used to identify the appropriate neighborhood of the optimal value of the shape parameter. Then a second iteration of the PPCC PLOT is generated with the parameter restricted to a much narrower range of values. Although this iteration can be repeated as many times as you like, for practical purposes two iterations are typically sufficient.
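
    A minimal sketch of the two-iteration refinement, using the SHAPE parameter that the PPCC PLOT saves. The half-width of 0.5 is an arbitrary choice; pick it from the width of the peak on the first plot, and keep the lower limit inside the valid range of the parameter (positive for the Weibull).

      . First pass over the automatically determined range
      WEIBULL PPCC PLOT Y
      . Second pass restricted to a neighborhood of the first estimate
      LET GAMMA1 = SHAPE - 0.5
      LET GAMMA2 = SHAPE + 0.5
      WEIBULL PPCC PLOT Y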

Note:
    The PPCC PLOT automatically saves several parameters. The MAXPPCC parameter contains the maximum correlation that was computed and the SHAPE parameter contains the value of the estimated distributional parameter (e.g., GAMMA for the Weibull distribution) that corresponds to MAXPPCC.

    In the case of two shape parameters, these are saved as SHAPE1 and SHAPE2.

Note:
    For the truncated exponential distribution, we assume that the truncation parameter, X0, is known. To set this value, enter

      LET X0 = <value>

    before generating the ppcc plot.

    For the noncentral t and noncentral chi-square distributions, we can fix the value of the degrees of freedom parameter to a single value. In this case, the ppcc plot reverts to a one shape parameter plot. Enter the commands

      LET NU1 = <value>
      LET NU2 = <value>

    where <value> is the same for NU1 and NU2.

Note:
    The SET MINMAX command can be used to specify the minimum or maximum form for the following distributions:

    • Weibull
    • Frechet (extreme value type 2)
    • generalized extreme value

    A value of 1 or MIN specifies the minimum form of the distribution and a value of 2 or MAX specifies the maximum form of the distribution.

    Although earlier versions of Dataplot required that this parameter be explicitly entered, Dataplot will now choose a default form of the distribution if it has not been specified. For the Weibull, the minimum form is the default. For the Frechet and generalized extreme value distributions, the maximum form is the default. Note that if you enter an explicit SET MINMAX command, it applies to all 3 distributions.

Note:
    When the percent point function is expensive to compute, the PPCC plot can take a long time to generate. Two approaches to this are:

    1. You can bin the data before generating the PPCC plot.

    2. As an alternative to binning, you can use the command

        SET PPCC PLOT DATA POINTS <value>

      With this command, Dataplot will generate <value> equally spaced percentiles of the data. The PPCC plot is then generated on these percentiles.

      If the number of data points in the response variable is less than <value> then the full data set is used.

      The minimum number for <value> is 25. Numbers in the range 50 to 200 are typically used.

    One problem with binning data is that the optimal bin width is dependent on the underlying distribution (which is what we are trying to determine). Given a choice, we would recommend using the second form when possible. However, in practice many data sets are collected in binned format. In this case, the first form is the best we can do.

    For distributions that have percent point functions that can be computed with simple closed form formulas or that have relatively simple approximations, there is little to be gained by thinning the data since the ppcc plot in these cases will still be quite fast even for very large data sets. However, there are a number of distributions where the percent point function is computed by numerically inverting a cumulative distribution function (which may in turn be computed via a numerical integration). In these cases, using one of the binning techniques can make the method practical (although you will likely not obtain as accurate an estimate as the full data set would produce).
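
    A hedged sketch of the second approach. The T family stands in for whichever family has a slow percent point function in your application, and the sample size of 2000 and the value of 100 percentiles are illustrative only.

      LET NU = 4
      LET Y = T RANDOM NUMBERS FOR I = 1 1 2000
      . Base the PPCC plot on 100 equally spaced percentiles of Y
      SET PPCC PLOT DATA POINTS 100
      T PPCC PLOT Y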

Note:
    In the one shape parameter case, 50 values of the shape parameter are used. For the two shape parameter case, the number of values used is dependent on the specific distribution. Typically, between 25 and 50 values are used in each direction.

    You can modify the number of values used for the shape parameters by entering the command

      SET PPCC PLOT AXIS POINTS <val1> <val2>

    where <val1> is the number of values for the first shape parameter and <val2> is the number of values for the second shape parameter.

    There are two typical uses for this command:

    1. For distributions with a fast percent point function (e.g., the Weibull), you can increase the number of values in order to generate a more accurate estimate. This is an alternative to performing two iterations of the ppcc plot. Again, for distributions with relatively simple percent point functions, we can generate a fairly large number of points on the plot and still have quite good performance.

    2. For distributions with slow percent point functions, you might want to decrease the number of points in order to increase the speed of the PPCC plot.
Note:
    For discrete distributions, the data will typically consist of integers. In this case, it is helpful to group the data based on these integer values. The following code shows the recommended way of doing this:

      LET YLOW = MINIMUM Y
      LET YUPP = MAXIMUM Y
      LET YLOW = YLOW - 0.5
      CLASS LOWER YLOW
      LET YUPP = YUPP + 0.5
      CLASS UPPER YUPP
      CLASS WIDTH = 1
      LET Y2 X2 = BINNED Y
      POISSON PPCC PLOT Y2 X2
      POISSON KS PLOT Y2 X2

    This will center the bins around the integer values and will cover the first and last class.

    In this case, the KS PLOT syntax will generate a plot that shows the minimum value of the chi-square statistic. It is usually recommended that the minimum bin size be at least 5 in order for the chi-square goodness of fit to generate accurate critical values. You can automatically combine bins with the command

      LET MINSIZE = <value>
      LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2

    Although the ppcc plot can also accept the unequal bin width syntax, there is typically less reason to use it for the ppcc plot. The primary reason to do so is that you want to compare the ppcc plot with the chi-square plot and want comparable bins for both methods. Also, some data sets may be provided in a format with unequal bin widths (this is usually done to combine bins in the tails that contain few points).

Default:
    None
Synonyms:
    FRECHET and EV2 are synonyms for EXTREME VALUE TYPE 2.

    LAMBDA PPCC PLOT and TUKEY PPCC PLOT are synonyms for TUKEY LAMBDA PPCC PLOT.

    STUDENT T PPCC PLOT is a synonym for T PPCC PLOT.

    The CHISQUARE term can be specified as CHISQUARE or CHI SQUARE.

    FL PPCC PLOT, BRIN SAUNDERS PPCC PLOT, and SAUNDERS BRIN are synonyms for FATIGUE LIFE PPCC PLOT.

    IG PPCC PLOT is a synonym for INVERSE GAUSSIAN PPCC PLOT.

    RIG PPCC PLOT is a synonym for RECIPROCAL INVERSE GAUSSIAN PPCC PLOT.

    GEP PPCC PLOT and GP PPCC PLOT are synonyms for GENERALIZED PARETO PPCC PLOT.

    LOGNORMAL PPCC PLOT and LOG-NORMAL PPCC PLOT are synonyms for LOG NORMAL PPCC PLOT.

    POWER LOG-NORMAL PPCC PLOT and POWER LOGNORMAL PPCC PLOT are synonyms for POWER LOG NORMAL PPCC PLOT.

    VONMISES PPCC PLOT and VON-MISES PPCC PLOT are synonyms for VON MISES PPCC PLOT.

    LOGLOGISTIC PPCC PLOT and LOG-LOGISTIC PPCC PLOT are synonyms for LOG LOGISTIC PPCC PLOT.

    SKEW LAPLACE PPCC PLOT is a synonym for SKEW DOUBLE EXPONENTIAL PPCC PLOT.

    ASYMMETRIC LAPLACE PPCC PLOT is a synonym for ASYMMETRIC DOUBLE EXPONENTIAL PPCC PLOT.

Related Commands:
Reference:
    James J. Filliben (1975), "The Probability Plot Correlation Coefficient Test for Normality", Technometrics, Vol. 17, No. 1.
Applications:
    Distributional Modeling
Implementation Date:
    Pre-1987: Original implementation
    1990/5: Implemented IG, WALD, RIG, FL distributions.
    1993/12: Implemented GENERALIZED PARETO distribution.
    1995/5: Implemented LOGNORMAL, POWER NORMAL,
      POWER LOGNORMAL, POWER FUNCTION, CHI, VON MISES, and LOG LOGISTIC distributions
    2001/10: Implemented a number of 2 shape parameter distributions.
    2002/5: Implemented TWO-SIDED POWER distribution.
    2003/5: Implemented ERROR distribution.
    2004/1: Implemented FOLDED T, SKEWED T, SKEWED NORMAL,
      G AND H, INVERTED BETA distributions.
    2004/1: Support for additional two shape parameter distributions.
    2004/5: Added support for the SET PPCC FORMAT command.
    2004/5: Fixed a number of bugs in various distributions.
    2004/6: Implemented SKEW DOUBLE EXPONENTIAL,
      ASYMMETRIC DOUBLE EXPONENTIAL, MAXWELL
    2004/7: Implemented Meeker re-parametrization for GOMPERTZ MAKEHAM
    2004/9: Implemented GENERALIZED ASYMMETRIC LAPLACE,
      BINOMIAL, MCLEISH, GENERALIZED MCLEISH
    2004/9: Implemented SET PPCC PLOT DATA POINTS
    2004/9: Implemented SET PPCC PLOT AXIS POINTS
    2004/9: Implemented SET PPCC PLOT AXIS ORDER
    2004/10: Implemented CENSORED case
    2005/5: Implemented REPLICATION case
    2005/5: Implemented binned case where bins are
      specified by the lower and upper limits (i.e., unequal width bins)
Program:
     
    MULTIPLOT 2 2
    MULTIPLOT CORNER COORDINATES 0 0 100 100
    MULTIPLOT SCALE FACTOR 1.5
    TITLE AUTOMATIC
    X1LABEL THEORETICAL VALUE
    Y1LABEL DATA VALUE
    TITLE OFFSET 2
    X1LABEL DISPLACEMENT 10
    Y1LABEL DISPLACEMENT 14
    CHAR X
    LINE BLANK
    JUSTIFICATION RIGHT
    .
    LET LAMBDA = 1.5
    LET Y = TUKEY LAMBDA RANDOM NUMBERS FOR I = 1 1 100
    TUKEY LAMBDA PPCC PLOT Y
    MOVE 82 30
    TEXT LAMBDA = ^SHAPE
    MOVE 82 25
    TEXT PPCC = ^MAXPPCC
    .
    LET NU = 4
    LET Y = T RANDOM NUMBERS FOR I = 1 1 100
    T PPCC PLOT Y
    MOVE 82 30
    TEXT NU = ^SHAPE
    MOVE 82 25
    TEXT PPCC = ^MAXPPCC
    .
    LET GAMMA = 2.3
    LET Y = WALD RANDOM NUMBERS FOR I = 1 1 100
    WALD PPCC PLOT Y
    MOVE 82 30
    TEXT GAMMA = ^SHAPE
    MOVE 82 25
    TEXT PPCC = ^MAXPPCC
    .
    LET GAMMA = 1.6
    LET Y = WEIBULL RANDOM NUMBERS FOR I = 1 1 100
    SET PPCC PLOT AXIS POINTS 200
    LET GAMMA1 = 0.2
    LET GAMMA2 = 25
    LINE SOLID
    CHARACTER BLANK
    WEIBULL PPCC PLOT Y
    MOVE 82 30
    TEXT GAMMA = ^SHAPE
    MOVE 82 25
    TEXT PPCC = ^MAXPPCC
    .
    END OF MULTIPLOT
        

    plot generated by sample program


NIST is an agency of the U.S. Commerce Department.

Date created: 8/30/2005
Last updated: 10/14/2015

Please email comments on this WWW page to alan.heckert@nist.gov.