
PPCC PLOT

Name:
    ... PPCC PLOT
    ... KOLMOGOROV SMIRNOV PLOT
    ... ANDERSON DARLING PLOT
    ... CHI-SQUARE PLOT
Type:
    Graphics Command
Purpose:
    Generates a probability plot correlation coefficient (PPCC) plot. Alternatively, the plot can be based on the Anderson-Darling, Kolmogorov-Smirnov, or chi-square goodness of fit statistics.
Description:
    A PPCC plot is a graphical data analysis technique for determining that member of the specified distributional family which provides a "best" distributional fit to the data.

    The PPCC plot is based on the following two ideas:

    1. The "straightness" of the probability plot is a good measure of distributional fit. That is, the "best" distributional fit is the one with the most linear probability plot.

    2. The correlation coefficient of the points on the probability plot is a good measure of the "straightness" (i.e., linearity) of the probability plot.

    The PPCC plot is formed by selecting a value of the shape parameter, generating the probability plot (this probability plot is not actually graphed), and then computing the correlation coefficient of the resulting probability plot. The PPCC plot then consists of:

    Vertical axis = probability plot correlation coefficient value for the given value of the shape parameter;
    Horizontal axis = distributional family parameter value (i.e., the value of the shape parameter).

    The value of the distributional parameter (on the horizontal axis) which corresponds to the maximum of the PPCC plot curve (on the vertical axis) is, of course, of interest since it indicates the best-fit member of the family.
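
    The construction described above can be sketched in a few lines of Python. This is an illustrative sketch rather than Dataplot's implementation: it assumes the Weibull family (minimum form), uses Filliben's approximation to the uniform order statistic medians as plotting positions, and all function names, the simulated sample, and the shape grid are hypothetical choices.

```python
import math
import random


def uniform_order_medians(n):
    """Approximate medians of the n uniform order statistics (Filliben, 1975)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def weibull_ppf(p, gamma):
    """Percent point function of the standard Weibull (loc = 0, scale = 1)."""
    return (-math.log(1.0 - p)) ** (1.0 / gamma)


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weibull_ppcc(data, gamma):
    """Correlation of the (ungraphed) Weibull probability plot for one shape value."""
    ordered = sorted(data)
    theoretical = [weibull_ppf(p, gamma) for p in uniform_order_medians(len(ordered))]
    return correlation(theoretical, ordered)


def ppcc_curve(data, shapes):
    """Points of the PPCC plot: (shape value, correlation coefficient)."""
    return [(g, weibull_ppcc(data, g)) for g in shapes]


# Simulated data: Weibull with shape 2, location 10, scale 3 (assumed values).
random.seed(0)
sample = [10.0 + 3.0 * random.weibullvariate(1.0, 2.0) for _ in range(500)]
shapes = [0.5 + 0.1 * k for k in range(56)]  # grid from 0.5 to 6.0
curve = ppcc_curve(sample, shapes)
best_shape, max_ppcc = max(curve, key=lambda point: point[1])
print(f"best shape = {best_shape:.2f}, max PPCC = {max_ppcc:.4f}")
```

    The horizontal coordinate of the maximum of `curve` plays the role of the best-fit shape value read off the PPCC plot.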

    The PPCC PLOT has been extended to support the following additional goodness of fit statistics:

    1. the Kolmogorov-Smirnov goodness of fit statistic;

    2. the Anderson-Darling goodness of fit statistic;

    3. the chi-square goodness of fit statistic.

    For these alternative measures of goodness of fit, we follow a similar procedure. That is, we fix a value of the shape parameter, generate the corresponding probability plot in the background to obtain estimates for location and scale, and then compute the goodness of fit statistic based on these parameters. For these goodness of fit statistics, we are looking for the minimum value of the statistic rather than the maximum value of the statistic.
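
    As a rough illustration of this procedure for the Kolmogorov-Smirnov variant, the Python sketch below fixes a Weibull shape value, fits a least squares line to the background probability plot to estimate location and scale, and then evaluates the Kolmogorov-Smirnov statistic. The function names, the plotting positions (Filliben's uniform order statistic medians), and the synthetic data are assumptions made for the example; Dataplot's internal computation may differ in detail.

```python
import math


def uniform_order_medians(n):
    """Approximate medians of the n uniform order statistics (Filliben, 1975)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def weibull_ppf(p, gamma):
    """Percent point function of the standard Weibull (loc = 0, scale = 1)."""
    return (-math.log(1.0 - p)) ** (1.0 / gamma)


def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def weibull_cdf(x, gamma, loc, scale):
    """Cumulative distribution function of the Weibull (minimum form)."""
    z = (x - loc) / scale
    return 1.0 - math.exp(-(z ** gamma)) if z > 0.0 else 0.0


def ks_for_shape(data, gamma):
    """KS statistic with loc/scale taken from the background probability plot."""
    ordered = sorted(data)
    n = len(ordered)
    theoretical = [weibull_ppf(p, gamma) for p in uniform_order_medians(n)]
    loc, scale = least_squares_line(theoretical, ordered)
    d = 0.0
    for i, value in enumerate(ordered):
        f = weibull_cdf(value, gamma, loc, scale)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Synthetic data: exact Weibull quantiles with shape 2, loc 10, scale 3.
data = [10.0 + 3.0 * weibull_ppf(p, 2.0) for p in uniform_order_medians(50)]
shapes = [0.5 + 0.1 * k for k in range(56)]
best_shape = min(shapes, key=lambda g: ks_for_shape(data, g))
print(best_shape, ks_for_shape(data, best_shape))
```

    Note that the best shape value is selected with `min`, not `max`, since a smaller Kolmogorov-Smirnov statistic indicates a better fit.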

    Some advantages of the PPCC plot as a fitting technique are:

    1. The PPCC plot is invariant with respect to location and scale. This means that the fundamental linearity of the probability plot does not depend on the values of the location and scale parameters (i.e., we could plug in any arbitrary values for them and the probability plot would still have the same linearity as measured by the PPCC statistic). The property follows from the fact that

        G(p;loc,scale,shape) = loc + scale*G(p;0,1,shape)

      where G denotes the percent point function of the specified distribution. So for the probability plot, using different values for loc and scale will change the scale on the x-axis, but not the linearity.

      Once we determine the optimal value of the shape parameter from the PPCC plot, we can generate the corresponding probability plot. The intercept and slope of the line fitted to the probability plot provide valid estimates of location and scale (the Dataplot probability plot is designed in such a way that this is true).

      Note: the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants are based on the cumulative distribution function and do not share this invariance property. However, we can still use the underlying probability plot to obtain estimates of location and scale for a given value of the shape parameter.

    2. The probability plot, and thus the PPCC plot, only depends on the percent point function. That is, if we know how to compute the percent point function, we can use the PPCC plot/probability plot to estimate the parameters of the distribution.

      The Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants also depend on computing the cumulative distribution function.

    3. The PPCC plot can show the sensitivity of the shape parameter. That is, it can show what neighborhood of the parameter estimate is likely to produce a reasonably straight probability plot.

    4. The PPCC plot can be applied to binned data.

      The chi-square variant can also be applied to binned data. Currently, the Anderson-Darling and Kolmogorov-Smirnov variants cannot be applied to binned data.

    5. The PPCC plot can be applied to censored data.

      A censored PPCC plot is generated by finding the value of the shape parameter that results in the maximum correlation coefficient of the censored probability plot. For details on how the censored probability plot is generated, enter the command

        HELP PROBABILITY PLOT

      The censoring variable should contain a 1 to indicate a failure time and a 0 to indicate a censoring time. The censored PPCC plot is not supported for binned data.

      The censoring option is not currently supported by the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot.
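
    The location and scale invariance listed as the first advantage can be checked numerically. The Python sketch below is illustrative only: it assumes the Weibull family, Filliben's uniform order statistic medians as plotting positions, and artificially perturbed data, and the helper names are hypothetical. It shows that the correlation is unchanged when arbitrary location and scale values are plugged into the percent point function, and that the intercept and slope of the fitted line estimate location and scale.

```python
import math


def uniform_order_medians(n):
    """Approximate medians of the n uniform order statistics (Filliben, 1975)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def weibull_ppf(p, gamma):
    """G(p; 0, 1, shape) for the Weibull (minimum form)."""
    return (-math.log(1.0 - p)) ** (1.0 / gamma)


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


gamma = 1.5
standard = [weibull_ppf(p, gamma) for p in uniform_order_medians(40)]

# Artificial ordered data: loc = 5, scale = 2, plus a small deterministic wiggle.
ordered = sorted(5.0 + 2.0 * g + 0.05 * math.sin(7.0 * k)
                 for k, g in enumerate(standard))

# Same percent points with arbitrary loc = 100 and scale = 25 plugged in.
shifted = [100.0 + 25.0 * g for g in standard]

r_standard = correlation(standard, ordered)
r_shifted = correlation(shifted, ordered)
intercept, slope = least_squares_line(standard, ordered)
print(r_standard, r_shifted)   # the two correlations agree
print(intercept, slope)        # estimates of location and scale
```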

    Some disadvantages of the PPCC plot as a fitting technique are:

    1. The PPCC plot (and its variants) do not have the mathematical optimality properties that analytic methods such as maximum likelihood have.

    2. If the percent point function is expensive to compute (e.g., if it involves the numerical inversion of a rather complicated cumulative distribution function), the ppcc plot can be slow to generate. These types of percent point functions may also have convergence problems.

      In these cases, the SET PPCC PLOT DATA POINTS command may be helpful in reducing the computational burden. See the Note section below.

    3. The PPCC plot does not produce interval estimates for the parameters.

      The bootstrap provides a method for generating these interval estimates. For details, enter

    4. Heavy-tailed distributions may have very high variability in the extremes of the data. This can sometimes lead to poor discrimination in the plot.

      In our experience, the Anderson-Darling and Kolmogorov-Smirnov variants of the plot may perform better for these cases.

    5. If a shape parameter behaves much like a scale or location parameter, the PPCC plot may not discriminate well.

      The Anderson-Darling and Kolmogorov-Smirnov variants have the option of fixing the values of the location and scale parameters. This can sometimes be useful in these cases.

    6. The PPCC plot does not generate smooth curves for discrete distributions due to the discreteness of the percent point function. For discrete distributions, the chi-square variant of the plot typically produces smoother plots.

    7. The PPCC plot does not extend well to more than one shape parameter.

      Dataplot has extended the PPCC plot to distributions with two shape parameters. Dataplot supports two formats for the PPCC plot with two shape parameters:

      1. As in the one shape parameter case, the Y axis will contain the value of the correlation coefficient. The X axis will contain the value of the second shape parameter. Each value of the first shape parameter will be represented by a separate trace (i.e., curve) on the plot.

        To change the order of the shape parameters in the above format, enter the command

          SET PPCC PLOT AXIS ORDER REVERSE

        To restore the default order, enter the command

          SET PPCC PLOT AXIS ORDER DEFAULT

      2. Alternatively, you can generate a 3D wireframe plot.

        You can specify which format to use with the command

          SET PPCC FORMAT <TRACE/3D>

    The PPCC plot now supports two different types of grouping.

    1. Some data sets are collected in binned format. That is, the values for the data are split into intervals and the number of occurrences of the data within each interval is counted.

      Dataplot supports either equal sized bins (the bin variable contains the mid-point of the bin) or unequal size bins (two bin variables are specified: one contains the lower limit for the bins and the other contains the upper limits for the bins).

    2. The ppcc plot also supports the case where there are multiple batches of data. In this case, a separate ppcc curve is drawn for each batch of data (for unbinned data a curve will also be drawn for the full data set). We refer to this as the "replication" case below. Replication can be used for either the raw data case or the binned data case.

      This form is useful for the case where we want to know if different batches of data can be modeled with a common shape parameter. One example of this is accelerated testing where Weibull models should have a common shape parameter at different stress levels if a linear acceleration model is valid.

    PPCC plots are available for the following continuous distributional families (with the distributional parameter in parentheses) with one shape parameter:

    1. Weibull (gamma)
    2. double weibull (gamma)
    3. inverted weibull (gamma)
    4. gamma (gamma)
    5. double gamma (gamma)
    6. log gamma (gamma)
    7. inverted gamma (gamma)
    8. Wald (gamma)
    9. fatigue life (gamma)
    10. Pareto (gamma)
    11. Pareto second kind (gamma)
    12. generalized Pareto (gamma)
    13. generalized half logistic (gamma)
    14. extreme value type 2 (gamma)
    15. generalized extreme value (gamma)
    16. extreme value (gamma, combines Weibull, extreme value type 2)
    17. geometric extreme exponential (gamma)
    18. Tukey lambda (lambda)
    19. skew normal (lambda)
    20. skew double exponential (lambda)
    21. t (nu)
    22. folded t (nu)
    23. chi-squared (nu)
    24. chi (nu)
    25. generalized logistic (alpha)
    26. log double exponential (alpha)
    27. error (alpha)
    28. lognormal (sd)
    29. power-normal (p)
    30. Von Mises (b)
    31. reciprocal (b)
    32. log-logistic (delta)
    33. wrapped cauchy (c)
    34. Bradford (beta)
    35. asymmetric double exponential (k)

    PPCC plots are available for the following continuous distributional families (with the distributional parameter in parentheses) with two shape parameters:

    1. inverse Gaussian (gamma, mu)
    2. reciprocal inverse gaussian (gamma, mu)
    3. generalized gamma (gamma, c)
    4. exponentiated Weibull (gamma, theta)
    5. exponential power (alpha, beta)
    6. Beta (alpha, beta)
    7. inverted beta (alpha, beta)
    8. two-sided power (theta, n)
    9. Johnson SU (alpha1, alpha2)
    10. Johnson SB (alpha1, alpha2)
    11. alpha (alpha1, alpha2)
    12. Gompertz (c, b)
    13. g and h (g, h)
    14. F (nu1, nu2)
    15. log skew normal (lambda, sd)
    16. power lognormal (nu, sd)
    17. folded normal (mu, sd)
    18. folded Cauchy (loc, scale)
    19. skew t (nu, lambda)
    20. noncentral t (nu, lambda)
    21. noncentral chi-square (nu, lambda)
    22. truncated exponential (m, sd, assume truncation point, X0, is known)

    PPCC plots are available for the following discrete distributional families (with the distributional parameter in parentheses):

    1. geometric (p)
    2. Yule (p)
    3. Poisson (lambda)
    4. logarithmic series (theta)
    5. binomial (p, assume n known)
    6. negative binomial (p, assume k known)
    7. Beta-Binomial (alpha, beta, assume n known)
    8. Hermite (alpha, beta)

    The use of the PPCC plot for discrete distributions is still experimental (see the Note below).

    The percent point function for the discrete distributions is a step function (since X is restricted to integer values). This can result in non-smooth PPCC and probability plots. For discrete distributions, the KS PLOT (which will plot the minimum value of the chi-square statistic) is recommended over the PPCC PLOT as long as the sample size is reasonably large.

Syntax 1:
    <family> PPCC PLOT <y>             <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of raw data values under analysis;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the raw data case.

    The syntax PPCC PLOT can be replaced with ANDERSON DARLING PLOT, KOLMOGOROV SMIRNOV PLOT, or CHI-SQUARE PLOT to generate the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot, respectively.

Syntax 2:
    <family> PPCC PLOT <y> <x>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of pre-computed frequencies;
                <x> is the variable of distinct values for the variable under analysis;
                <family> is one of the families listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the binned data case where the bins are defined by the mid-points of each bin.

    The syntax PPCC PLOT can be replaced with CHI-SQUARE PLOT to generate the chi-square variant of the plot. This syntax is not supported for the Anderson-Darling and Kolmogorov-Smirnov variants of the plot.

Syntax 3:
    <family> PPCC PLOT <y> <xlow> <xhigh>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of pre-computed frequencies;
                <xlow> is the variable containing the lower limits for the bins;
                <xhigh> is the variable containing the upper limits for the bins;
                <family> is one of the families listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the binned data case where the bins are defined by the lower and upper limits of the bins (i.e., the bins can be of unequal width).

    The syntax PPCC PLOT can be replaced with CHI-SQUARE PLOT to generate the chi-square variant of the plot. This syntax is not supported for the Anderson-Darling and Kolmogorov-Smirnov variants of the plot.

Syntax 4:
    <family> CENSORED PPCC PLOT <y> <x>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of raw data values under analysis;
                <x> is the censoring variable;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the raw data case where there is censoring.

    Censoring is not supported for discrete distributions or grouped data. It is also not supported for the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot.

Syntax 5:
    <family> CENSORED PPCC PLOT <y> <censor> <x>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of raw data values under analysis;
                <censor> is the censoring variable;
                <x> is the variable of distinct values for the variable under analysis;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where we have frequency (binned) data with censoring. The bins are defined by their mid-points. When a particular bin has both censored and uncensored data, there will be 2 rows with the same value for <x>.

    A value of 1 indicates a failure time and a value of 0 indicates a censoring time.

    Censoring is not supported for discrete distributions or grouped data. It is also not supported for the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot.

Syntax 6:
    <family> CENSORED PPCC PLOT <y> <censor> <xlow> <xhigh>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of raw data values under analysis;
                <censor> is the censoring variable;
                <xlow> is the variable containing the lower limits for the bins;
                <xhigh> is the variable containing the upper limits for the bins;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where we have frequency (binned) data with censoring. The bins are defined by their lower and upper limits. This syntax allows bins with unequal widths. When a particular bin has both censored and uncensored data, there will be 2 rows with the same values for <xlow> and <xhigh>.

    A value of 1 indicates a failure time and a value of 0 indicates a censoring time for the censoring variable.

    Censoring is not supported for discrete distributions or grouped data. It is also not supported for the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot.

Syntax 7:
    <family> REPLICATED PPCC PLOT <y> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of raw data values under analysis;
                <x1> ... <xk> is a list of one to two group id variables;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    The group-id variables are cross-tabulated and a ppcc plot will be generated for each distinct combination of values for the group-id variables. These plots will be overlaid on the same plot.

    The syntax PPCC PLOT can be replaced with ANDERSON DARLING PLOT, KOLMOGOROV SMIRNOV PLOT, or CHI-SQUARE PLOT to generate the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot, respectively.

Syntax 8:
    <family> CENSORED REPLICATED PPCC PLOT <y> <x> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of raw data values under analysis;
                <x> is the censoring variable;
                <x1> ... <xk> is a list of one to two group id variables;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    The group-id variables are cross-tabulated and a ppcc plot will be generated for each distinct combination of values for the group-id variables. These plots will be overlaid on the same plot.

    Censoring is not supported for discrete distributions or grouped data. It is also not supported for the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot.

Syntax 9:
    <family> REPLICATED GROUPED PPCC PLOT <y> <x> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of pre-computed frequencies;
                <x> is the variable of distinct values for the variable under analysis;
                <x1> ... <xk> is a list of one to two group id variables;
                <family> is one of the families listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the binned data case where there are multiple batches of data. The bins are defined by the mid-points of each bin and there are multiple batches of data.

    The syntax PPCC PLOT can be replaced with CHI-SQUARE PLOT to generate the chi-square variant of the plot. This syntax is not supported for the Anderson-Darling and Kolmogorov-Smirnov variants of the plot.

Syntax 10:
    <family> REPLICATED UNEQUAL GROUPED PPCC PLOT <y>
                            <xlow> <xhigh> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y> is the variable of pre-computed frequencies;
                <xlow> is the variable containing the lower limits for the bins;
                <xhigh> is the variable containing the upper limits for the bins;
                <x1> ... <xk> is a list of one to two group id variables;
                <family> is one of the families listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the binned data case where there are multiple batches of data. The bins are defined by the lower and upper limits of the bins (i.e., the bins can be of unequal width).

    The syntax PPCC PLOT can be replaced with CHI-SQUARE PLOT to generate the chi-square variant of the plot. This syntax is not supported for the Anderson-Darling and Kolmogorov-Smirnov variants of the plot.

Syntax 11:
    <family> MULTIPLE PPCC PLOT <y1> ... <yk>
                            <SUBSET/EXCEPT/FOR/qualification>
    where <y1> ... <yk> is a list of response variables;
                <family> is one of the distributions listed above;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    Note that the response variables can also be matrices. If a matrix name is encountered, a ppcc plot will be drawn for all the values in the matrix. For multiple response variables, the ppcc plots will be overlaid on the same plot.

    The syntax PPCC PLOT can be replaced with ANDERSON DARLING PLOT, KOLMOGOROV SMIRNOV PLOT, or CHI-SQUARE PLOT to generate the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot, respectively.

Examples:
    LAMBDA PPCC PLOT X
    T PPCC PLOT X
    EXTREME VALUE TYPE 2 PPCC PLOT X
    POISSON PPCC PLOT X
    LAMBDA PPCC PLOT F X
    T PPCC PLOT F X
    EXTREME VALUE TYPE 2 PPCC PLOT F X
    POISSON PPCC PLOT F X
Note:
    The PPCC is not the only goodness of fit criterion that can be used. The following additional options are available:

      KOLMOGOROV-SMIRNOV PLOT (or KS PLOT)
      ANDERSON DARLING PLOT (or AD PLOT)
      CHI-SQUARE PLOT (or CHISQUARE PLOT)

    Currently, these alternatives are limited to the uncensored case. In addition, the KS PLOT and AD PLOT are restricted to the raw data case and the CHI-SQUARE PLOT is restricted to the binned data case.

    Note that the PPCC method is invariant to location and scale. This basically means that we can use the underlying probability plot to estimate the location and scale parameters.

    These other methods are not invariant to location and scale. By default, we still use the estimates from the underlying probability plot to estimate location and scale. Although these estimates may not be "optimal", they should at least be reasonable. However, you can fix the estimates of location and scale by entering the commands

      LET KSLOC = <value>
      LET KSSCALE = <value>

    These apply to the Kolmogorov-Smirnov, Anderson-Darling, and chi-square variants of the plot.

Note:
    For information on how the Kolmogorov-Smirnov, Anderson-Darling, and chi-square goodness of fit statistics are computed, enter

Note:
    The chi-square variant of the plot is most frequently used when the data are received in pre-binned form (for raw data, the PPCC, Anderson-Darling or Kolmogorov-Smirnov variants are typically preferred). However, you can use the chi-square test for raw data (you typically will want to have a reasonably large data set before doing this). For raw data, you can specify the binning with the commands CLASS WIDTH, CLASS LOWER, and CLASS UPPER. The default class width is 0.3 times the sample standard deviation. To specify other default algorithms, enter HELP HISTOGRAM CLASS WIDTH.

    This test is sensitive to the choice of bins. There is no optimal choice for the bin width (since the optimal bin width depends on the distribution). Most reasonable choices should produce similar, but not identical, results.

    For the chi-square approximation to be valid, the expected frequency should be at least 5. The chi-square approximation may not be valid for small samples, and if some of the counts are less than five, you may need to combine some bins in the tails.

Note:
    The range of the shape parameter is determined automatically. However, if you wish to restrict the range, you can specify the lower and upper limits by appending a 1 or 2 to the parameter name and assigning a value. For example, to restrict a Weibull PPCC plot to values between 0.5 and 20, do the following:

      LET GAMMA1 = 0.5
      LET GAMMA2 = 20
      WEIBULL PPCC PLOT Y

    A common use of this is to obtain a refinement of the estimate of the shape parameter. That is, an initial iteration (typically just the default values of the parameter) is used to identify the appropriate neighborhood of the optimal value of the shape parameter. Then a second iteration of the PPCC PLOT is generated with the parameter restricted to a much narrower range of values. Although this iteration can be repeated as many times as you like, for practical purposes two iterations are typically sufficient.
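
    The two-iteration refinement can be pictured with the following Python sketch. It is a hypothetical illustration, not Dataplot code: the Weibull family, the plotting positions, the synthetic data, and the rule of narrowing the range to the grid points on either side of the current maximum are all assumptions made for the example.

```python
import math


def uniform_order_medians(n):
    """Approximate medians of the n uniform order statistics (Filliben, 1975)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def weibull_ppf(p, gamma):
    """Percent point function of the standard Weibull (loc = 0, scale = 1)."""
    return (-math.log(1.0 - p)) ** (1.0 / gamma)


def weibull_ppcc(data, gamma):
    """Correlation coefficient of the Weibull probability plot for one shape."""
    y = sorted(data)
    x = [weibull_ppf(p, gamma) for p in uniform_order_medians(len(y))]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def refine_shape(data, lower, upper, points=50, passes=2):
    """Repeat the grid search, narrowing the range around the current maximum."""
    best = None
    for _ in range(passes):
        step = (upper - lower) / (points - 1)
        shapes = [lower + k * step for k in range(points)]
        values = [weibull_ppcc(data, g) for g in shapes]
        k = max(range(points), key=values.__getitem__)
        best = (shapes[k], values[k])
        # Next pass: restrict to the neighbors of the best grid point.
        lower = shapes[max(k - 1, 0)]
        upper = shapes[min(k + 1, points - 1)]
    return best


# Synthetic data: exact Weibull quantiles with shape 2.3 (assumed value).
data = [weibull_ppf(p, 2.3) for p in uniform_order_medians(100)]
shape, max_ppcc = refine_shape(data, 0.5, 20.0)  # same limits as GAMMA1, GAMMA2 above
print(f"refined shape = {shape:.3f}, PPCC = {max_ppcc:.6f}")
```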

Note:
    The PPCC PLOT automatically saves several parameters. The MAXPPCC parameter contains the maximum correlation that was computed and the SHAPE parameter contains the value of the estimated distributional parameter (e.g., GAMMA for the Weibull distribution) that corresponds to MAXPPCC.

    In the case of two shape parameters, these are saved as SHAPE1 and SHAPE2.

Note:
    For the truncated exponential distribution, we assume that the truncation parameter, X0, is known. To set this value, enter

      LET X0 = <value>

    before generating the ppcc plot.

    For the noncentral t and noncentral chi-square distributions, we can fix the value of the degrees of freedom parameter to a single value. In this case, the ppcc plot reverts to a one shape parameter plot. Enter the commands

      LET NU1 = <value>
      LET NU2 = <value>

    where <value> is the same for NU1 and NU2.

Note:
    The SET MINMAX command can be used to specify the minimum or maximum form for the following distributions:

    • Weibull
    • Frechet (extreme value type 2)
    • generalized extreme value

    A value of 1 or MIN specifies the minimum form of the distribution and a value of 2 or MAX specifies the maximum form of the distribution.

    Although earlier versions of Dataplot required that this parameter be explicitly entered, Dataplot will now choose a default form of the distribution if it has not been specified. For the Weibull, the minimum form is the default. For the Frechet and generalized extreme value distributions, the maximum form is the default. Note that if you enter an explicit SET MINMAX command, it applies to all 3 distributions.

Note:
    When the percent point function is expensive to compute, the PPCC plot can take a long time to generate. Two approaches to this are:

    1. You can bin the data before generating the PPCC plot.

    2. As an alternative to binning, you can use the command

        SET PPCC PLOT DATA POINTS <value>

      With this command, Dataplot will generate <value> equally spaced percentiles of the data. The PPCC plot is then generated on these percentiles.

      If the number of data points in the response variable is less than <value> then the full data set is used.

      The minimum number for <value> is 25. Numbers in the range 50 to 200 are typically used.
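
    The effect of the SET PPCC PLOT DATA POINTS command can be approximated by the following Python sketch. The exact percentile positions Dataplot uses are not stated here, so the choice of `statistics.quantiles` with its default method, and the function name `thin_to_percentiles`, are assumptions for illustration only.

```python
import statistics


def thin_to_percentiles(data, points):
    """Return `points` equally spaced sample percentiles of the data.

    If the data set has no more than `points` values, the full (sorted)
    data set is returned instead.
    """
    if len(data) <= points:
        return sorted(data)
    # n = points + 1 intervals gives exactly `points` cut points.
    return statistics.quantiles(data, n=points + 1)


data = [float(v) for v in range(1, 1001)]
thinned = thin_to_percentiles(data, 50)
print(len(thinned))
```

    The PPCC computation would then be carried out on `thinned` in place of the full response variable.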

    One problem with binning data is that the optimal bin width is dependent on the underlying distribution (which is what we are trying to determine). Given a choice, we would recommend using the second form when possible. However, in practice many data sets are collected in binned format. In this case, the first form is the best we can do.

    For distributions that have percent point functions that can be computed with simple closed form formulas or that have relatively simple approximations, there is little to be gained by thinning the data since the ppcc plot in these cases will still be quite fast even for very large data sets. However, there are a number of distributions where the percent point function is computed by numerically inverting a cumulative distribution function (which may in turn be computed via a numerical integration). In these cases, using one of the binning techniques can make the method practical (although you will likely not obtain as accurate an estimate as the full data set would produce).

Note:
    In the one shape parameter case, 50 values of the shape parameter are used. For the two shape parameter case, the number of values used is dependent on the specific distribution. Typically, between 25 and 50 values are used in each direction.

    You can modify the number of values used for the shape parameters by entering the command

      SET PPCC PLOT AXIS POINTS <val1> <val2>

    where <val1> is the number of values for the first shape parameter and <val2> is the number of values for the second shape parameter.

    There are two typical uses for this command:

    1. For distributions with a fast percent point function (e.g., the Weibull), you can increase the number of values in order to generate a more accurate estimate. This is an alternative to performing two iterations of the PPCC plot. Again, for distributions with relatively simple percent point functions, we can generate a fairly large number of points on the plot and still have quite good performance.

    2. For distributions with slow percent point functions, you might want to decrease the number of points in order to increase the speed of the PPCC plot.
Note:
    For discrete distributions, the data will typically consist of integers. In this case, it is helpful to group the data based on these integer values. The following code shows the recommended way for doing this:

      LET YLOW = MINIMUM Y
      LET YUPP = MAXIMUM Y
      LET YLOW = YLOW - 0.5
      CLASS LOWER YLOW
      LET YUPP = YUPP + 0.5
      CLASS UPPER YUPP
      CLASS WIDTH = 1
      LET Y2 X2 = BINNED Y
      POISSON PPCC PLOT Y2 X2
      POISSON KS PLOT Y2 X2

    This will center the bins around the integer values and will cover the first and last class.

    In this case, the KS PLOT syntax will generate a plot that shows the minimum value of the chi-square statistic. It is usually recommended that the minimum bin size be at least 5 in order for the chi-square goodness of fit to generate accurate critical values. You can automatically combine bins with the command

      LET MINSIZE = <value>
      LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2

    Although the PPCC plot can also accept the unequal bin width syntax, there is typically less reason to do this for the PPCC plot. The primary reason to do so is that you want to compare the PPCC plot with the chi-square plot and you want to have comparable bins for both methods. Also, some data sets may be provided in a format with unequal bin widths (this is usually to combine bins in the tails with few points).
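
    The integer-centered binning set up by the CLASS LOWER, CLASS UPPER, and CLASS WIDTH commands in this note can be pictured with a short Python sketch. The function name `bin_integers` and the sample values are hypothetical, and the sketch assumes that classes with no observations are kept with a frequency of zero, which may not match what the BINNED command returns.

```python
from collections import Counter


def bin_integers(values):
    """Bin integer data into unit-width classes centered on each integer."""
    counts = Counter(values)
    midpoints = list(range(min(values), max(values) + 1))
    lower = [m - 0.5 for m in midpoints]   # first class starts at min - 0.5
    upper = [m + 0.5 for m in midpoints]   # last class ends at max + 0.5
    freq = [counts.get(m, 0) for m in midpoints]
    return freq, midpoints, lower, upper


y = [0, 1, 1, 2, 2, 2, 3, 5]
y2, x2, xlow, xhigh = bin_integers(y)
print(list(zip(x2, y2)))
# [(0, 1), (1, 2), (2, 3), (3, 1), (4, 0), (5, 1)]
```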

Note:
    The chi-square variant of the plot can sometimes produce very large numbers. To improve the plot resolution for the smaller values of the chi-square statistic (which is the area of interest), you can provide a truncation value (i.e., values of the chi-square statistic greater than the truncation value will be set to the truncation value). To do so, enter the command

      SET CHI-SQUARE LIMIT <value>
Note:
    For the PPCC PLOT command, Dataplot fits a least squares line to the points on the underlying probability plot. For the AD PLOT, KS PLOT, and CHI-SQUARE PLOT variants, this least squares fit is used to obtain estimates for the location and scale parameters.

    Alternatively, you can specify that Dataplot fit a robust regression using the biweight method by entering the command

      SET PPCC PLOT LOCATION SCALE BIWEIGHT

    To reset the default of non-robust least squares, enter

      SET PPCC PLOT LOCATION SCALE DEFAULT

    In our experience, this option can be useful for heavy tailed distributions such as the SLASH and CAUCHY distributions.

Default:
    None
Synonyms:
    AD PLOT is a synonym for ANDERSON DARLING PLOT.
    KS PLOT is a synonym for KOLMOGOROV SMIRNOV PLOT.

    FRECHET and EV2 are synonyms for EXTREME VALUE TYPE 2.

    LAMBDA PPCC PLOT and TUKEY PPCC PLOT are synonyms for TUKEY LAMBDA PPCC PLOT.

    STUDENT T PPCC PLOT is a synonym for T PPCC PLOT.

    The CHISQUARE term can be specified as CHISQUARE or CHI SQUARE.

    FL PPCC PLOT, BRIN SAUNDERS PPCC PLOT, and SAUNDERS BRIN are synonyms for FATIGUE LIFE PPCC PLOT.

    IG PPCC PLOT is a synonym for INVERSE GAUSSIAN PPCC PLOT.

    RIG PPCC PLOT is a synonym for RECIPROCAL INVERSE GAUSSIAN PPCC PLOT.

    GEP PPCC PLOT and GP PPCC PLOT are synonyms for GENERALIZED PARETO PPCC PLOT.

    LOGNORMAL PPCC PLOT and LOG-NORMAL PPCC PLOT are synonyms for LOG NORMAL PPCC PLOT.

    POWER LOG-NORMAL PPCC PLOT and POWER LOGNORMAL PPCC PLOT are synonyms for POWER LOG NORMAL PPCC PLOT.

    VONMISES PPCC PLOT and VON-MISES PPCC PLOT are synonyms for VON MISES PPCC PLOT.

    LOGLOGISTIC PPCC PLOT and LOG-LOGISTIC PPCC PLOT are synonyms for LOG LOGISTIC PPCC PLOT.

    SKEW LAPLACE PPCC PLOT is a synonym for SKEW DOUBLE EXPONENTIAL PPCC PLOT.

    ASYMMETRIC LAPLACE PPCC PLOT is a synonym for ASYMMETRIC DOUBLE EXPONENTIAL PPCC PLOT.

Related Commands:
Reference:
    James J. Filliben (1975), "The Probability Plot Correlation Coefficient Test for Normality," Technometrics, Vol. 17, No. 1.
Applications:
    Distributional Modeling
Implementation Date:
    Pre-1987: Original implementation
    1990/5: Implemented IG, WALD, RIG, FL distributions.
    1993/12: Implemented GENERALIZED PARETO distribution.
    1995/5: Implemented LOGNORMAL, POWER NORMAL,
      POWER LOGNORMAL, POWER FUNCTION, CHI, VON MISES, and LOG LOGISTIC distributions
    2001/10: Implemented a number of 2 shape parameter distributions.
    2002/5: Implemented TWO-SIDED POWER distribution.
    2003/5: Implemented ERROR distribution.
    2004/1: Implemented FOLDED T, SKEWED T, SKEWED NORMAL,
      G AND H, INVERTED BETA distributions.
    2004/1: Support for additional two shape parameter distributions.
    2004/5: Added support for the SET PPCC FORMAT command.
    2004/5: Fixed a number of bugs in various distributions.
    2004/6: Implemented SKEW DOUBLE EXPONENTIAL,
      ASYMMETRIC DOUBLE EXPONENTIAL, MAXWELL
    2004/7: Implemented Meeker re-parametrization for GOMPERTZ MAKEHAM
    2004/9: Implemented GENERALIZED ASYMMETRIC LAPLACE,
      BINOMIAL, MCLEISH, GENERALIZED MCLEISH
    2004/9: Implemented SET PPCC PLOT DATA POINTS
    2004/9: Implemented SET PPCC PLOT AXIS POINTS
    2004/9: Implemented SET PPCC PLOT AXIS ORDER
    2004/10: Implemented CENSORED case
    2005/5: Implemented REPLICATION case
    2005/5: Implemented binned case where bins are
      specified by the lower and upper limits (i.e., unequal width bins)
Program 1:
     
    MULTIPLOT 2 2
    MULTIPLOT CORNER COORDINATES 0 0 100 100
    MULTIPLOT SCALE FACTOR 1.5
    TITLE AUTOMATIC
    X1LABEL THEORETICAL VALUE
    Y1LABEL DATA VALUE
    TITLE OFFSET 2
    X1LABEL DISPLACEMENT 10
    Y1LABEL DISPLACEMENT 14
    CHAR X
    LINE BLANK
    JUSTIFICATION RIGHT
    . Panel 1: Tukey-Lambda data generated with LAMBDA = 1.5
    LET LAMBDA = 1.5
    LET Y = TUKEY LAMBDA RANDOM NUMBERS FOR I = 1 1 100
    TUKEY LAMBDA PPCC PLOT Y
    MOVE 82 30
    TEXT LAMBDA = ^SHAPE
    MOVE 82 25
    TEXT PPCC = ^MAXPPCC
    . Panel 2: t data generated with NU = 4
    LET NU = 4
    LET Y = T RANDOM NUMBERS FOR I = 1 1 100
    T PPCC PLOT Y
    MOVE 82 30
    TEXT NU = ^SHAPE
    MOVE 82 25
    TEXT PPCC = ^MAXPPCC
    . Panel 3: Wald data generated with GAMMA = 2.3
    LET GAMMA = 2.3
    LET Y = WALD RANDOM NUMBERS FOR I = 1 1 100
    WALD PPCC PLOT Y
    MOVE 82 30
    TEXT GAMMA = ^SHAPE
    MOVE 82 25
    TEXT PPCC = ^MAXPPCC
    . Panel 4: Weibull data generated with GAMMA = 1.6, drawn as a solid line
    LET GAMMA = 1.6
    LET Y = WEIBULL RANDOM NUMBERS FOR I = 1 1 100
    SET PPCC PLOT AXIS POINTS 200
    LET GAMMA1 = 0.2
    LET GAMMA2 = 25
    LINE SOLID
    CHARACTER BLANK
    WEIBULL PPCC PLOT Y
    MOVE 82 30
    TEXT GAMMA = ^SHAPE
    MOVE 82 25
    TEXT PPCC = ^MAXPPCC
    .
    END OF MULTIPLOT
        

    plot generated by sample program

Program 2:
     
    let gamma = 5.1
    let y = weibull rand numb for i = 1 1 200
    .
    let gamma1 = 0.5
    let gamma2 = 50
    set ppcc plot axis points 449
    .
    multiplot corner coordinates 2 2 98 98
    multiplot scale factor 2
    multiplot 2 2
    title automatic
    title offset 2
    justification center
    height 1.7
    tic mark offset units screen
    ytic mark offset 3 0
    . Panel 1: PPCC plot, annotated with the rounded shape estimate and maximum PPCC
    weibull ppcc plot y
    let shape = round(shape,1)
    let maxppcc2 = round(maxppcc,3)
    move 50 5
    text Shape: ^shape, Max PPCC: ^maxppcc2
    . Panel 2: Anderson-Darling plot, annotated with the minimum AD statistic
    weibull anderson darling plot y
    let shape = round(shape,1)
    let minad2 = round(minad,3)
    move 50 5
    text Shape: ^shape, Min AD: ^minad2
    . Panel 3: Kolmogorov-Smirnov plot, annotated with the minimum KS statistic
    weibull ks plot y
    let shape = round(shape,1)
    let minks = round(minks,3)
    move 50 5
    text Shape: ^shape, Min KS: ^minks
    . Panel 4: chi-square plot, with chi-square values truncated at 100
    set chisquare limit 100
    weibull chi-square plot y
    let shape = round(shape,1)
    let minchsq = round(minchisq,3)
    move 50 5
    text Shape: ^shape, Min Chi-Square: ^minchsq
    .
    end of multiplot
        
    plot generated by sample program



Date created: 08/30/2005
Last updated: 05/18/2023

Please email comments on this WWW page to alan.heckert@nist.gov.