PPCC PLOT

Name:

Type:

Graphics Command

Purpose:

Generates a probability plot correlation coefficient (PPCC) plot. Alternatively, base the plot on the Anderson-Darling, Kolmogorov-Smirnov, or chi-square goodness of fit statistics.

Description:

The PPCC plot is based on the following two ideas:

The "straightness" of the probability plot is a good measure of distributional fit. That is, the "best" distributional fit is the one with the most linear probability plot.
The correlation coefficient of the points on the probability plot is a good measure of the "straightness" (i.e., linearity) of the probability plot.

The PPCC plot is formed by selecting a value of the shape parameter, generating the probability plot (this probability plot is not actually graphed), and then computing the correlation coefficient of the resulting probability plot. The PPCC plot then consists of:

Vertical axis = probability plot correlation coefficient value for the given value of the shape parameter;

Horizontal axis = distributional family parameter value (i.e., the value of the shape parameter).

The value of the distributional parameter (on the horizontal axis) which corresponds to the maximum of the PPCC plot curve (on the vertical axis) is, of course, of interest since it indicates the best-fit member of the family.
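The construction described above can be sketched outside Dataplot. The following Python sketch is illustrative only. It assumes the standard (minimum) Weibull percent point function and Filliben's approximation to the uniform order statistic medians as plotting positions; the function names are hypothetical and nothing here is taken from Dataplot's internal code.

```python
import math
import random

def filliben_medians(n):
    """Approximate uniform order statistic medians (assumed plotting positions)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m

def weibull_ppf(p, gamma):
    """Percent point function of the standard Weibull with shape gamma."""
    return (-math.log(1.0 - p)) ** (1.0 / gamma)

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weibull_ppcc(data, gamma):
    """Correlation of the (ungraphed) probability plot for one shape value."""
    y = sorted(data)
    x = [weibull_ppf(p, gamma) for p in filliben_medians(len(y))]
    return correlation(x, y)

def ppcc_curve(data, shapes):
    """Points of the PPCC plot: (shape value, correlation coefficient)."""
    return [(g, weibull_ppcc(data, g)) for g in shapes]

# Simulated Weibull data with shape 2 (scale 1), fixed seed for repeatability.
rng = random.Random(12345)
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(200)]

shapes = [0.5 + 0.1 * k for k in range(96)]   # 0.5, 0.6, ..., 10.0
curve = ppcc_curve(sample, shapes)
best_shape, best_ppcc = max(curve, key=lambda t: t[1])
print("shape at maximum PPCC:", best_shape, " maximum PPCC:", best_ppcc)
```

Plotting the pairs in curve (shape on the horizontal axis, correlation on the vertical axis) gives the PPCC plot; the pair with the largest correlation identifies the best-fit member of the family.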

The PPCC PLOT has been extended to support the following additional goodness of fit statistics:

the Kolmogorov-Smirnov goodness of fit statistic;
the Anderson-Darling goodness of fit statistic;
the chi-square goodness of fit statistic.

For these alternative measures of goodness of fit, we follow a similar procedure. That is, we fix a value of the shape parameter, generate the corresponding probability plot in the background to obtain estimates for location and scale, and then compute the goodness of fit statistic based on these parameters. For these goodness of fit statistics, we are looking for the minimum value of the statistic rather than the maximum value of the statistic.
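As a rough illustration of this procedure for the Kolmogorov-Smirnov statistic, the Python sketch below fixes a shape value, fits a least squares line to the background probability plot to obtain location and scale, and then evaluates the statistic. It assumes a Weibull family and Filliben plotting positions; the helper names are hypothetical and the details of Dataplot's own computation may differ.

```python
import math
import random

def filliben_medians(n):
    """Approximate uniform order statistic medians (assumed plotting positions)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m

def weibull_ppf(p, gamma):
    """Percent point function of the standard Weibull with shape gamma."""
    return (-math.log(1.0 - p)) ** (1.0 / gamma)

def weibull_cdf(z, gamma):
    """Cumulative distribution function of the standard Weibull."""
    return 1.0 - math.exp(-(z ** gamma)) if z > 0.0 else 0.0

def line_fit(x, y):
    """Least squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def ks_for_shape(data, gamma):
    """KS statistic for one shape value, with location and scale taken
    from the intercept and slope of the background probability plot."""
    y = sorted(data)
    n = len(y)
    x = [weibull_ppf(p, gamma) for p in filliben_medians(n)]
    loc, scale = line_fit(x, y)
    d = 0.0
    for i, v in enumerate(y):
        f = weibull_cdf((v - loc) / scale, gamma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d, loc, scale

# Simulated Weibull data: shape 2, location 10, scale 3.
rng = random.Random(2024)
sample = [10.0 + 3.0 * rng.weibullvariate(1.0, 2.0) for _ in range(200)]

shapes = [0.5 + 0.1 * k for k in range(96)]
results = [(g,) + ks_for_shape(sample, g) for g in shapes]
# For this variant we look for the MINIMUM of the statistic.
best_shape, min_ks, loc_hat, scale_hat = min(results, key=lambda t: t[1])
print(best_shape, min_ks, loc_hat, scale_hat)
```

The Anderson-Darling and chi-square variants would follow the same pattern, replacing only the statistic computed inside ks_for_shape.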

Some advantages of the PPCC plot as a fitting technique are:

The PPCC plot is invariant with respect to location and scale. This means that the fundamental linearity of the probability plot does not depend on the values of the location and scale parameters (i.e., we could plug in any arbitrary values for them and the probability plot would still have the same linearity as measured by the ppcc statistic). The property follows from the fact that
    G(p; loc, scale) = loc + scale*G(p; 0, 1)
where G denotes the percent point function of the specified distribution. So for the probability plot, using different values for loc and scale will change the scale on the x-axis, but not the linearity.
Once we determine the optimal value of the shape parameter from the PPCC plot, we can generate the corresponding probability plot. The intercept and slope of the line fit to the probability plot provide valid estimates of location and scale (the Dataplot probability plot is designed in such a way that this is true).
Note: the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants are based on the cumulative distribution function and do not share this invariance property. However, we can still use the underlying probability plot to obtain estimates of location and scale for a given value of the shape parameter.
The probability plot, and thus the PPCC plot, only depends on the percent point function. That is, if we know how to compute the percent point function, we can use the PPCC plot/probability plot to estimate the parameters of the distribution.
The Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants also depend on computing the cumulative distribution function.
The PPCC plot can show the sensitivity of the shape parameter. That is, it can show what neighborhood of the parameter estimate is likely to produce a reasonably straight probability plot.
The PPCC plot can be applied to binned data.
The chi-square variant can also be applied to binned data. Currently, the Anderson-Darling and Kolmogorov-Smirnov variants cannot be applied to binned data.
The PPCC plot can be applied to censored data.
A censored PPCC plot is generated by finding the value of the shape parameter that results in the maximum correlation coefficient of the censored probability plot. For details on how the censored probability plot is generated, enter the command
The censoring variable should contain a 1 to indicate a failure time and a 0 to indicate a censoring time. The censored PPCC plot is not supported for binned data.
The censoring option is not currently supported by the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot.
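The location and scale invariance listed among the advantages above can be checked numerically. The Python sketch below is illustrative only: it assumes a Weibull family with a fixed shape value and Filliben plotting positions, and the function names are hypothetical. It shows that an arbitrary shift and rescaling of the data leaves the correlation unchanged, while the intercept and slope of the fitted line change in exactly the way location and scale estimates should.

```python
import math
import random

def filliben_medians(n):
    """Approximate uniform order statistic medians (assumed plotting positions)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m

def weibull_ppf(p, gamma):
    """Percent point function of the standard Weibull with shape gamma."""
    return (-math.log(1.0 - p)) ** (1.0 / gamma)

def probability_plot_fit(data, gamma):
    """Return (ppcc, intercept, slope) for sorted data against the
    theoretical quantiles of the standard Weibull with shape gamma."""
    y = sorted(data)
    n = len(y)
    x = [weibull_ppf(p, gamma) for p in filliben_medians(n)]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), my - slope * mx, slope

rng = random.Random(7)
raw = [rng.weibullvariate(1.0, 2.0) for _ in range(100)]
shifted = [5.0 + 3.0 * v for v in raw]      # new location 5, new scale 3

r_raw, a_raw, b_raw = probability_plot_fit(raw, 2.0)
r_new, a_new, b_new = probability_plot_fit(shifted, 2.0)

print("ppcc:     ", r_raw, r_new)                # same up to rounding
print("intercept:", a_new, 5.0 + 3.0 * a_raw)    # location estimate shifts
print("slope:    ", b_new, 3.0 * b_raw)          # scale estimate rescales
```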

Some disadvantages of the PPCC plot as a fitting technique are:

The PPCC plot (and its variants) do not have the mathematical optimality properties that analytic methods such as maximum likelihood have.
If the percent point function is expensive to compute (e.g., if it involves the numerical inversion of a rather complicated cumulative distribution function), the ppcc plot can be slow to generate. These types of percent point functions may also have convergence problems.
In these cases, the SET PPCC PLOT DATA POINTS command may be helpful in reducing the computational burden. See the Note section below.
The PPCC plot does not produce interval estimates for the parameters.
The bootstrap provides a method for generating these interval estimates. For details, enter
Heavy-tailed distributions may have very high variability in the extremes of the data. This can sometimes lead to poor discrimination in the plot.
In our experience, the Anderson-Darling and Kolmogorov-Smirnov variants of the plot may perform better for these cases.
If a shape parameter behaves much like a scale or location parameter, the PPCC plot may not discriminate well.
The Anderson-Darling and Kolmogorov-Smirnov variants have the option of fixing the values of the location and scale parameters. This can sometimes be useful in these cases.
The PPCC plot does not generate smooth curves for discrete distributions due to the discreteness of the percent point function. For discrete distributions, the chi-square variant of the plot typically produces smoother plots.
The PPCC plot does not extend well to more than one shape parameter.
Dataplot has extended the PPCC plot to distributions with two shape parameters. Dataplot supports two formats for the PPCC plot with two shape parameters:
1. As in the one shape parameter case, the Y axis will contain the value of the correlation coefficient. The X axis will contain the value of the second shape parameter. Each value of the first shape parameter will be represented by a separate trace (i.e., curve) on the plot.
  To change the order of the shape parameters in the above format, enter the command
  To restore the default order, enter the command
2. Alternatively, you can generate a 3D wireframe plot.
  You can specify which format to use with the command

Some data sets are collected in binned format. That is, the values for the data are split into intervals and the number of occurrences of the data within each interval is counted.
Dataplot supports either equal sized bins (the bin variable contains the mid-point of the bin) or unequal size bins (two bin variables are specified: one contains the lower limit for the bins and the other contains the upper limits for the bins).
The ppcc plot also supports the case where there are multiple batches of data. In this case, a separate ppcc curve is drawn for each batch of data (for unbinned data a curve will also be drawn for the full data set). We refer to this as the "replication" case below. Replication can be used for either the raw data case or the binned data case.
This form is useful for the case where we want to know if different batches of data can be modeled with a common shape parameter. One example of this is accelerated testing where Weibull models should have a common shape parameter at different stress levels if a linear acceleration model is valid.

PPCC plots are available for the following continuous distributional families (with the distributional parameter in parentheses) with one shape parameter:

Weibull (gamma)
double weibull (gamma)
inverted weibull (gamma)
gamma (gamma)
double gamma (gamma)
log gamma (gamma)
inverted gamma (gamma)
Wald (gamma)
fatigue life (gamma)
Pareto (gamma)
Pareto second kind (gamma)
generalized Pareto (gamma)
generalized half logistic (gamma)
extreme value type 2 (gamma)
generalized extreme value (gamma)
extreme value (gamma, combines Weibull, extreme value type 2)
geometric extreme exponential (gamma)
Tukey lambda (lambda)
skew normal (lambda)
skew double exponential (lambda)
t (nu)
folded t (nu)
chi-squared (nu)
chi (nu)
generalized logistic (alpha)
log double exponential (alpha)
error (alpha)
lognormal (sd)
power-normal (p)
Von Mises (b)
reciprocal (b)
log-logistic (delta)
wrapped cauchy (c)
Bradford (beta)
asymmetric double exponential (k)

PPCC plots are available for the following continuous distributional families (with the distributional parameter in parentheses) with two shape parameters:

inverse Gaussian (gamma, mu)
reciprocal inverse gaussian (gamma, mu)
generalized gamma (gamma, c)
exponentiated Weibull (gamma, theta)
exponential power (alpha, beta)
Beta (alpha, beta)
inverted beta (alpha, beta)
two-sided power (theta, n)
Johnson SU (alpha1, alpha2)
Johnson SB (alpha1, alpha2)
alpha (alpha1, alpha2)
Gompertz (c, b)
g and h (g, h)
F (nu1, nu2)
log skew normal (lambda, sd)
power lognormal (nu, sd)
folded normal (mu, sd)
folded Cauchy (loc, scale)
skew t (nu, lambda)
noncentral t (nu, lambda)
noncentral chi-square (nu, lambda)
truncated exponential (m, sd, assume truncation point, X0, is known)

PPCC plots are available for the following discrete distributional families (with the distributional parameter in parentheses):

geometric (p)
Yule (p)
Poisson (lambda)
logarithmic series (theta)
binomial (p, assume n known)
negative binomial (p, assume k known)
Beta-Binomial (alpha, beta, assume n known)
Hermite (alpha, beta)

The use of the PPCC plot for discrete distributions is still experimental (see the Note below).

The percent point function for the discrete distributions is a step function (since X is restricted to integer values). This can result in non-smooth ppcc and probability plots. For discrete distributions, the KS PLOT (which will plot the minimum value of the chi-square statistic) is recommended over the PPCC PLOT as long as the sample size is reasonably large.

Syntax 1:

This syntax is used for the raw data case.

The syntax PPCC PLOT can be replaced with ANDERSON DARLING PLOT, KOLMOGOROV SMIRNOV PLOT, or CHI-SQUARE PLOT to generate the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot, respectively.

Syntax 2:

is optional.

This syntax is used for the binned data case where the bins are defined by the mid-points of each bin.

The syntax PPCC PLOT can be replaced with CHI-SQUARE PLOT to generate the chi-square variant of the plot. This syntax is not supported for the Anderson-Darling and Kolmogorov-Smirnov variants of the plot.

Syntax 3:

This syntax is used for the binned data case where the bins are defined by the lower and upper limits of the bins (i.e., the bins can be of unequal width).

Syntax 4:

This syntax is used for the raw data case where there is censoring.

Censoring is not supported for discrete distributions or grouped data. It is also not supported for the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot.

Syntax 5:

This syntax is used for the case where we have frequency (binned) data with censoring. The bins are defined by their mid-points. When a particular bin has both censored and uncensored data, there will be 2 rows with the same value for .

A value of 1 indicates a failure time and a value of 0 indicates a censoring time.

Censoring is not supported for discrete distributions or grouped data. It is also not supported for the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot.

Syntax 6:

This syntax is used for the case where we have frequency (binned) data with censoring. The bins are defined by their lower and upper limits. This syntax allows bins with unequal widths. When a particular bin has both censored and uncensored data, there will be 2 rows with the same values for <xlow> and <xhigh>.

A value of 1 indicates a failure time and a value of 0 indicates a censoring time for the censoring variable.

Censoring is not supported for discrete distributions or grouped data. It is also not supported for the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot.

Syntax 7:

The group-id variables are cross-tabulated and a ppcc plot will be generated for each distinct combination of values for the group-id variables. These plots will be overlaid on the same plot.

Syntax 8:

The group-id variables are cross-tabulated and a ppcc plot will be generated for each distinct combination of values for the group-id variables. These plots will be overlaid on the same plot.

Censoring is not supported for discrete distributions or grouped data. It is also not supported for the Anderson-Darling, Kolmogorov-Smirnov, and chi-square variants of the plot.

Syntax 9:

This syntax is used for the binned data case where there are multiple batches of data. The bins are defined by the mid-points of each bin.

Syntax 10:

This syntax is used for the binned data case where there are multiple batches of data. The bins are defined by the lower and upper limits of the bins (i.e., the bins can be of unequal width).

Syntax 11:

Note that the response variables can also be matrices. If a matrix name is encountered, a ppcc plot will be drawn for all the values in the matrix. For multiple response variables, the ppcc plots will be overlaid on the same plot.

Examples:

Note:

Currently, these alternatives are limited to the uncensored case. In addition, the KS PLOT and AD PLOT are restricted to the raw data case and the CHI-SQUARE PLOT is restricted to the binned data case.

Note that the PPCC method is invariant to location and scale. This basically means that we can use the underlying probability plot to estimate the location and scale parameters.

These other methods are not invariant to location and scale. By default, we still use the estimates from the underlying probability plot to estimate location and scale. Although these estimates may not be "optimal", they should at least be reasonable. However, you can fix the estimates of location and scale by entering the commands

These apply to the Kolmogorov-Smirnov, Anderson-Darling, and chi-square variants of the plot.

Note:

Note:

HELP HISTOGRAM CLASS WIDTH

This test is sensitive to the choice of bins. There is no optimal choice for the bin width (since the optimal bin width depends on the distribution). Most reasonable choices should produce similar, but not identical, results.

For the chi-square approximation to be valid, the expected frequency should be at least 5. The chi-square approximation may not be valid for small samples, and if some of the counts are less than five, you may need to combine some bins in the tails.
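One simple way to combine sparse tail bins is sketched below in Python. This is only an illustration of the idea, not Dataplot's algorithm: the function name is hypothetical, the threshold of 5 is applied to the observed counts, and only the two tails are merged (interior bins are left alone).

```python
def combine_tail_bins(lower, upper, counts, min_count=5):
    """Merge bins in each tail into their inner neighbour until the
    outermost bin on that side holds at least min_count observations.

    lower, upper : bin limits (unequal widths are allowed)
    counts       : observed frequency in each bin
    Returns new (lower, upper, counts) lists; the inputs are not modified.
    """
    lo, hi, ct = list(lower), list(upper), list(counts)

    # Lower tail: fold the first bin into the second while it is too small.
    while len(ct) > 1 and ct[0] < min_count:
        ct[1] += ct[0]
        lo[1] = lo[0]
        del lo[0], hi[0], ct[0]

    # Upper tail: fold the last bin into the one before it.
    while len(ct) > 1 and ct[-1] < min_count:
        ct[-2] += ct[-1]
        hi[-2] = hi[-1]
        del lo[-1], hi[-1], ct[-1]

    return lo, hi, ct

# Seven equal-width bins with sparse tails.
lower = [0, 1, 2, 3, 4, 5, 6]
upper = [1, 2, 3, 4, 5, 6, 7]
counts = [1, 3, 12, 20, 9, 4, 1]

merged = combine_tail_bins(lower, upper, counts)
print(merged)   # ([0, 3, 4, 5], [3, 4, 5, 7], [16, 20, 9, 5])
```

The merged bins have unequal widths, so they would be supplied using lower and upper bin limits rather than bin mid-points.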

Note:

A common use of this is to obtain a refinement of the estimate of the shape parameter. That is, an initial iteration (typically just the default values of the parameter) is used to identify the appropriate neighborhood of the optimal value of the shape parameter. Then a second iteration of the PPCC PLOT is generated with the parameter restricted to a much narrower range of values. Although this iteration can be repeated as many times as you like, for practical purposes two iterations are typically sufficient.
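The two-iteration refinement can be sketched in Python as a coarse grid search followed by a narrower one centered on the first maximum. This is an illustration under stated assumptions (Weibull family, Filliben plotting positions, hypothetical function names and grid sizes), not a description of how Dataplot chooses its grids.

```python
import math
import random

def filliben_medians(n):
    """Approximate uniform order statistic medians (assumed plotting positions)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m

def weibull_ppf(p, gamma):
    """Percent point function of the standard Weibull with shape gamma."""
    return (-math.log(1.0 - p)) ** (1.0 / gamma)

def weibull_ppcc(data, gamma):
    """Probability plot correlation coefficient for one shape value."""
    y = sorted(data)
    n = len(y)
    x = [weibull_ppf(p, gamma) for p in filliben_medians(n)]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def grid(lo, hi, npts):
    """npts equally spaced shape values from lo to hi inclusive."""
    return [lo + k * (hi - lo) / (npts - 1) for k in range(npts)]

def best_shape(data, shapes):
    """Shape value on the grid with the largest PPCC."""
    return max(shapes, key=lambda g: weibull_ppcc(data, g))

def refine(data, lo, hi, npts, shrink=0.1):
    """First pass over [lo, hi]; second pass over a window whose width is
    shrink*(hi - lo), centered on the first-pass maximum."""
    first = best_shape(data, grid(lo, hi, npts))
    half = shrink * (hi - lo) / 2.0
    second = best_shape(data, grid(max(lo, first - half),
                                   min(hi, first + half), npts))
    return first, second

rng = random.Random(99)
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(200)]
coarse, fine = refine(sample, 0.5, 10.5, 51)
print("first iteration:", coarse, " second iteration:", fine)
```

With 51 points per pass, the first iteration resolves the shape to about 0.2 and the second to about 0.02, at roughly twice the cost of a single pass.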

Note:

In the case of two shape parameters, these are saved as SHAPE1 and SHAPE2.

Note:

LET X0 = <value>

before generating the ppcc plot.

For the noncentral t and noncentral chi-square distributions, we can fix the value of the degrees of freedom parameter to a single value. In this case, the ppcc plot reverts to a one shape parameter plot. Enter the commands

where <value> is the same for NU1 and NU2.

Note:

Weibull
Frechet (extreme value type 2)
generalized extreme value

A value of 1 or MIN specifies the minimum form of the distribution and a value of 2 or MAX specifies the maximum form of the distribution.

Although earlier versions of Dataplot required that this parameter be explicitly entered, Dataplot will now choose a default form of the distribution if it has not been specified. For the Weibull, the minimum form is the default. For the Frechet and generalized extreme value distributions, the maximum form is the default. Note that if you enter an explicit SET MINMAX command, it applies to all 3 distributions.

Note:

You can bin the data before generating the PPCC plot.
As an alternative to binning, you can use the command
With this command, Dataplot will generate <value> equally spaced percentiles of the data. The PPCC plot is then generated on these percentiles.
If the number of data points in the response variable is less than <value> then the full data set is used.
The minimum number for <value> is 25. Numbers in the range 50 to 200 are typically used.

For distributions that have percent point functions that can be computed with simple closed form formulas or that have relatively simple approximations, there is little to be gained by thinning the data since the ppcc plot in these cases will still be quite fast even for very large data sets. However, there are a number of distributions where the percent point function is computed by numerically inverting a cumulative distribution function (which may in turn be computed via a numerical integration). In these cases, using one of the binning techniques can make the method practical (although you will likely not obtain as accurate an estimate as the full data set would produce).
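The percentile-based thinning described in this note can be illustrated with a short Python sketch. The exact percentile definition Dataplot uses is not stated here, so the sketch makes an assumption: percentiles are taken at the mid-points of equal probability intervals and computed by linear interpolation between order statistics. The function name is hypothetical.

```python
import math
import random

def thin_to_percentiles(data, npoints):
    """Reduce data to npoints equally spaced percentiles.

    If the data set has fewer than npoints values, the full (sorted)
    data set is returned. A minimum of 25 points is enforced, matching
    the minimum stated for <value>.
    """
    if npoints < 25:
        raise ValueError("npoints must be at least 25")
    y = sorted(data)
    n = len(y)
    if n < npoints:
        return y
    out = []
    for k in range(1, npoints + 1):
        p = (k - 0.5) / npoints        # assumed percentile positions
        h = p * (n - 1)                # fractional index into sorted data
        i = int(math.floor(h))
        frac = h - i
        upper = y[min(i + 1, n - 1)]
        out.append(y[i] + frac * (upper - y[i]))
    return out

rng = random.Random(1)
big = [rng.weibullvariate(1.0, 2.0) for _ in range(5000)]
thinned = thin_to_percentiles(big, 100)
print(len(big), "->", len(thinned))
```

The thinned values are already sorted and can be passed to any probability plot or PPCC computation in place of the full data set, at the cost of some accuracy in the resulting estimate.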

Note:

You can modify the number of values used for the shape parameters by entering the command

SET PPCC PLOT AXIS POINTS <val1> <val2>

where <val1> is the number of values for the first shape parameter and <val2> is the number of values for the second shape parameter.

There are two typical uses for this command:

For distributions with a fast percent point function (e.g., the Weibull), you can increase the number of values in order to generate a more accurate estimate. This is an alternative to performing two iterations of the ppcc plot. Again, for distributions with relatively simple percent point functions, we can generate a fairly large number of points on the plot and still have quite good performance.
For distributions with slow percent point functions, you might want to decrease the number of points in order to increase the speed of the PPCC plot.

Note:

This will center the bins around the integer values and will cover the first and last class.

In this case, the KS PLOT syntax will generate a plot that shows the minimum value of the chi-square statistic. It is usually recommended that the minimum bin size be at least 5 in order for the chi-square goodness of fit to generate accurate critical values. You can automatically combine bins with the command

Although the ppcc plot can also accept the unequal bin width syntax, there is typically less reason to do this for the ppcc plot. The primary reason is that you may want to compare the ppcc plot with the chi-square plot and have comparable bins for both methods. Also, some data sets may be provided in a format with unequal bin widths (this is usually to combine bins in the tails with few points).

Note:

SET CHI-SQUARE LIMIT <value>

Note:

Alternatively, you can specify that Dataplot fit a robust regression using the biweight method by entering the command

SET PPCC PLOT LOCATION SCALE BIWEIGHT

To reset the default of non-robust least squares, enter

SET PPCC PLOT LOCATION SCALE DEFAULT

In our experience, this option can be useful for heavy tailed distributions such as the SLASH and CAUCHY distributions.

Default:

None

Synonyms:

FRECHET and EV2 are synonyms for EXTREME VALUE TYPE 2.

LAMBDA PPCC PLOT and TUKEY PPCC PLOT are synonyms for TUKEY LAMBDA PPCC PLOT.

STUDENT T PPCC PLOT is a synonym for T PPCC PLOT.

The CHISQUARE term can be specified as CHISQUARE or CHI SQUARE.

FL PPCC PLOT, BRIN SAUNDERS PPCC PLOT, and SAUNDERS BRIN are synonyms for FATIGUE LIFE PPCC PLOT.

IG PPCC PLOT is a synonym for INVERSE GAUSSIAN PPCC PLOT.

RIG PPCC PLOT is a synonym for RECIPROCAL INVERSE GAUSSIAN PPCC PLOT.

GEP PPCC PLOT and GP PPCC PLOT are synonyms for GENERALIZED PARETO PLOT.

LOGNORMAL PPCC PLOT and LOG-NORMAL PPCC PLOT are synonyms for LOG NORMAL PPCC PLOT.

POWER LOG-NORMAL PPCC PLOT and POWER LOGNORMAL PPCC PLOT are synonyms for POWER LOG NORMAL PPCC PLOT.

VONMISES PPCC PLOT and VON-MISES PPCC PLOT are synonyms for VON MISES PPCC PLOT.

LOGLOGISTIC PPCC PLOT and LOG-LOGISTIC PPCC PLOT are synonyms for LOG LOGISTIC PPCC PLOT.

SKEW LAPLACE PPCC PLOT is a synonym for SKEW DOUBLE EXPONENTIAL PPCC PLOT.

ASYMMETRIC LAPLACE PPCC PLOT is a synonym for ASYMMETRIC DOUBLE EXPONENTIAL PPCC PLOT.

Related Commands:

KS PLOT	= Generates a Kolmogorov-Smirnov goodness of fit plot.
PROBABILITY PLOT	= Generates a probability plot.
FREQUENCY PLOT	= Generates a frequency plot.
HISTOGRAM	= Generates a histogram.
PIE CHART	= Generates a pie chart.
PERCENT POINT PLOT	= Generates a percent point plot.
PLOT	= Generates a data or function plot.

Reference:

Technometrics

Applications:

Distributional Modeling

Implementation Date:

POWER LOGNORMAL, POWER FUNCTION, CHI, VON MISES, and LOG LOGISTIC distributions

G AND H, INVERTED BETA distributions.

ASYMMETRIC DOUBLE, EXPONENTIAL, MAXWELL

BINOMIAL, MCLEISH, GENERALIZED MCLEISH

specified by the lower and upper limits (i.e., unequal width bins)

Program 1:

 
MULTIPLOT 2 2
MULTIPLOT CORNER COORDINATES 0 0 100 100
MULTIPLOT SCALE FACTOR 1.5
TITLE AUTOMATIC
X1LABEL THEORETICAL VALUE
Y1LABEL DATA VALUE
TITLE OFFSET 2
X1LABEL DISPLACEMENT 10
Y1LABEL DISPLACEMENT 14
CHAR X
LINE BLANK
JUSTIFICATION RIGHT
.
LET LAMBDA = 1.5
LET Y = TUKEY LAMBDA RANDOM NUMBERS FOR I = 1 1 100
TUKEY LAMBDA PPCC PLOT Y
MOVE 82 30
TEXT LAMBDA = ^SHAPE
MOVE 82 25
TEXT PPCC = ^MAXPPCC
.
LET NU = 4
LET Y = T RANDOM NUMBERS FOR I = 1 1 100
T PPCC PLOT Y
MOVE 82 30
TEXT NU = ^SHAPE
MOVE 82 25
TEXT PPCC = ^MAXPPCC
.
LET GAMMA = 2.3
LET Y = WALD RANDOM NUMBERS FOR I = 1 1 100
WALD PPCC PLOT Y
MOVE 82 30
TEXT GAMMA = ^SHAPE
MOVE 82 25
TEXT PPCC = ^MAXPPCC
.
LET GAMMA = 1.6
LET Y = WEIBULL RANDOM NUMBERS FOR I = 1 1 100
SET PPCC PLOT AXIS POINTS 200
LET GAMMA1 = 0.2
LET GAMMA2 = 25
LINE SOLID
CHARACTER BLANK
WEIBULL PPCC PLOT Y
MOVE 82 30
TEXT GAMMA = ^SHAPE
MOVE 82 25
TEXT PPCC = ^MAXPPCC
.
END OF MULTIPLOT

Program 2:

 
let gamma = 5.1
let y = weibull rand numb for i = 1 1 200
.
let gamma1 = 0.5
let gamma2 = 50
set ppcc plot axis points 449
.
multiplot corner coordinates 2 2 98 98
multiplot scale factor 2
multiplot 2 2
title automatic
title offset 2
justification center
height 1.7
tic mark offset units screen
ytic mark offset 3 0
.
weibull ppcc plot y
let shape = round(shape,1)
let maxppcc2 = round(maxppcc,3)
move 50 5
text Shape: ^shape, Max PPCC: ^maxppcc2
.
weibull anderson darling plot y
let shape = round(shape,1)
let minad2 = round(minad,3)
move 50 5
text Shape: ^shape, Min AD: ^minad2
.
weibull ks plot y
let shape = round(shape,1)
let minks = round(minks,3)
move 50 5
text Shape: ^shape, Min KS: ^minks
.
set chisquare limit 100
weibull chi-square plot y
let shape = round(shape,1)
let minchsq = round(minchisq,3)
move 50 5
text Shape: ^shape, Min Chi-Square: ^minchsq
.
end of multiplot