PROBABILITY PLOT

Name:

... PROBABILITY PLOT

Type:

Graphics Command

Purpose:

Generates a probability plot for one of 90+ distributions.

Description:

The probability plot consists of:

Vertical axis	=	ordered observations;
Horizontal axis	=	percent point function of the order statistic medians.

This is essentially a plot of the data percentiles versus the percentiles of the theoretical distribution. Dataplot computes the percent point function of the uniform order statistic medians to compute the percentiles of the theoretical distribution.
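The two axes described above can be sketched in a few lines of Python for the normal case. This is an illustration only, not Dataplot's code: the function names are hypothetical, and the order statistic medians use the approximation commonly attributed to Filliben, which is assumed here rather than taken from this page.

```python
from statistics import NormalDist


def uniform_order_statistic_medians(n):
    """Approximate medians of the n uniform order statistics.

    Assumption: Filliben's approximation, with the exact median
    0.5**(1/n) for the largest order statistic and its mirror image
    for the smallest.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)
    medians[0] = 1.0 - medians[-1]
    return medians


def normal_probability_plot_points(data):
    """Return (horizontal, vertical) coordinates of a normal probability plot."""
    ordered = sorted(data)                      # vertical axis
    ppf = NormalDist().inv_cdf                  # standard normal percent point function
    medians = uniform_order_statistic_medians(len(ordered))
    theoretical = [ppf(m) for m in medians]     # horizontal axis
    return theoretical, ordered
```

For another distribution, only the percent point function changes; the ordered data and the uniform medians are computed the same way.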

Dataplot has extensive probability plot capabilities (90+ distributions/distributional families are available). When a distributional family is specified, the LET command is used before the PROBABILITY PLOT command to specify which member of the family is desired. For example, the program in the Program section below enters LET NU = 5 before CHI-SQUARE PROBABILITY PLOT Y to select the chi-square distribution with 5 degrees of freedom.

The name of the distributional parameter for families is given in the list below.

Probability plots serve two primary uses.

Distributional Modeling
The intercept and slope of the line fit to the probability plot are estimates of the location and scale parameters of the distribution, respectively.
The following provides one possible approach to distributional modeling.
- If the distribution has one or two shape parameters, use the PPCC PLOT or KS PLOT to obtain estimates for the shape parameters (HELP PPCC PLOT or HELP KS PLOT for details).
- Once the shape parameters (if any) have been estimated, generate the probability plot to obtain estimates for the location and scale parameters.
- The bootstrap can be used to obtain confidence intervals for the distribution parameters and selected quantiles. Enter HELP DISTRIBUTIONAL BOOTSTRAP for details.
Goodness of Fit
The probability plot provides a graphical assessment of goodness of fit. The straighter the probability plot, the better the fit. One advantage of the graphical approach over quantitative measures (e.g., the Kolmogorov-Smirnov test) is that it shows how the distribution fails to fit, not merely that it does. This can provide guidance toward a better distributional model.
The correlation coefficient of the points on the probability plot provides a numerical measure of the straightness of the probability plot. Dataplot automatically saves this value in the internal parameter PPCC. The PPCC values provide a useful ranking measure when comparing different distributional models.
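As a rough sketch of the quantities discussed above, the following Python function fits an ordinary least squares line to the probability plot points and returns the intercept (location estimate), the slope (scale estimate), and the correlation coefficient (the PPCC). The function name is hypothetical and the code is an illustration under these assumptions, not Dataplot's implementation.

```python
from math import sqrt


def fit_probability_plot(theoretical, ordered):
    """Least squares line through the probability plot points.

    theoretical: horizontal coordinates (percent point function values)
    ordered:     vertical coordinates (sorted observations)
    Returns (intercept, slope, ppcc).
    """
    n = len(theoretical)
    if n != len(ordered) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x = sum(theoretical) / n
    mean_y = sum(ordered) / n
    sxx = sum((x - mean_x) ** 2 for x in theoretical)
    syy = sum((y - mean_y) ** 2 for y in ordered)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(theoretical, ordered))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("points must vary in both coordinates")
    slope = sxy / sxx                      # estimate of the scale parameter
    intercept = mean_y - slope * mean_x    # estimate of the location parameter
    ppcc = sxy / sqrt(sxx * syy)           # straightness of the plot
    return intercept, slope, ppcc
```

When several candidate distributions are tried on the same data, the one with the largest PPCC gives the straightest plot.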

Syntax 1:

NORMAL
HALFNORMAL
SLASH
COSINE
LOGISTIC
HALF LOGISTIC
HYPERBOLIC SECANT
CAUCHY
HALF CAUCHY
DOUBLE EXPONENTIAL
EXPONENTIAL
EXTREME VALUE TYPE 1 (or GUMBEL)
UNIFORM
SEMI-CIRCULAR
ANGLIT
ARCSIN
RAYLEIGH
MAXWELL
WEIBULL (GAMMA)
DOUBLE WEIBULL (GAMMA)
INVERTED WEIBULL (GAMMA)
GAMMA (GAMMA)
LOG GAMMA (GAMMA)
DOUBLE GAMMA (GAMMA)
INVERTED GAMMA (GAMMA)
WALD (GAMMA)
FATIGUE LIFE (GAMMA)
EXTREME VALUE TYPE 2 (GAMMA)
GENERALIZED EXTREME VALUE (GAMMA)
PARETO (GAMMA)
PARETO SECOND KIND (GAMMA)
GENERALIZED PARETO (GAMMA)
GENERALIZED HALF LOGISTIC (GAMMA)
TUKEY LAMBDA (LAMBDA)
SKEWED NORMAL (LAMBDA)
SKEW DOUBLE EXPONENTIAL (LAMBDA)
POISSON (LAMBDA)
T (NU)
FOLDED T (NU)
CHI-SQUARED (NU)
CHI (NU)
LOGNORMAL (SD)
LOG DOUBLE EXPONENTIAL (ALPHA)
ERROR (ALPHA)
GENERALIZED LOGISTIC (ALPHA)
WRAPPED CAUCHY (C)
POWER FUNCTION (C)
TRIANGULAR (C)
LOG LOGISTIC (DELTA)
VON-MISES (B)
DISCRETE UNIFORM (N)
GEOMETRIC (P)
YULE (P)
LOGARITHMIC SERIES (THETA)
RECIPROCAL (B)
BRADFORD (BETA)
ASYMMETRIC DOUBLE EXPO (K)
POWER-NORMAL (P, SD)
POWER-LOGNORMAL (P, SD)
FOLDED NORMAL (M, SD)
FOLDED CAUCHY (M, SD)
SKEWED T (LAMBDA, NU)
NONCENTRAL T (NU, LAMBDA)
NONCENTRAL CHISQUARE (NU, LAMBDA)
LOG SKEWED NORMAL (LAMBDA)
BETA (ALPHA, BETA)
INVERTED BETA (ALPHA, BETA)
BETA BINOMIAL (ALPHA, BETA)
HERMITE (ALPHA, BETA)
EXPONENTIAL POWER (ALPHA, BETA)
ALPHA (ALPHA, BETA)
G AND H (G, H)
JOHNSON SB (ALPHA1, ALPHA2)
JOHNSON SU (ALPHA1, ALPHA2)
EXPONENTIATED WEIBULL (GAMMA, THETA)
GENERALIZED GAMMA (GAMMA, C)
INVERSE GAUSSIAN (GAMMA, MU)
RECIPROCAL INVERSE GAUSSIAN (GAMMA, MU)
F (NU1, NU2)
TWO-SIDED POWER (THETA, N)
BINOMIAL (N, P)
GOMPERTZ (C, B)
GENERALIZED MCLEISH (ALPHA, A)
NEGATIVE BINOMIAL (K, P)
LOG SKEWED T (LAMBDA, NU, SD)
DOUBLY NONCENTRAL T (NU, LAMBDA1, LAMBDA2)
NONCENTRAL F (NU1, NU2, LAMBDA)
NONCENTRAL BETA (ALPHA, BETA, LAMBDA)
TRUNCATED EXPONENTIAL (X0, M, SD)
GENERALIZED EXPONENTIAL (LAMBDA1, LAMBDA2, S)
GOMPERTZ-MAKEHAM (XI, LAMBDA, THETA)
MIELKE BETA-KAPPA (BETA, THETA, K)
HYPERGEOMETRIC (K, N, M)
GENERALIZED INVERSE GAUSS (CHI, LAMBDA, THETA)
BESSEL I (SIGMA1SQ, SIGMA2SQ, NU)
(B, C, M)
DOUBLY NONCENTRAL F (NU1, NU2, LAMBDA1, LAMBDA2)
TRUNCATED NORMAL (A, B, M, SD)
TRAPEZOID (A, B, C, D)
NORMAL MIXTURE (MU1, SD1, MU2, SD2, P)
BI-WEIBULL (SCALE1, GAMMA1, LOC2, SCALE2, GAMMA2)
GENERALIZED TRAPEZOID (A, B, C, D, NU1, NU3, ALPHA)

and where the is optional.

This syntax is used for the case where we have raw data.

Syntax 2:

This syntax is used for the case where we have censored data. In the censoring variable, a value of 1 indicates a failure time and a value of 0 indicates a censoring time.

Censoring is not supported for discrete distributions or grouped data.

Syntax 3:

This syntax is used for the case where we have frequency (binned) data. The bins are defined by their mid-points.

Syntax 4:

This syntax is used for the case where we have frequency (binned) data. The bins are defined by their lower and upper limits. This syntax allows bins with unequal widths.

Examples:

Note:

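Dataplot automatically saves the following internal parameters when a probability plot is generated:

PPCC - the correlation coefficient of the points on the probability plot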
PPA0 - the intercept of the line fitted to the probability plot (estimate of the location parameter)
PPA1 - the slope of the line fitted to the probability plot (estimate of the scale parameter)
SDPPA0 - the standard deviation of PPA0
SDPPA1 - the standard deviation of PPA1
PPRESSD - the residual standard deviation of the line fitted to the probability plot
PPRESDF - the residual degrees of freedom of the line fitted to the probability plot
PPA0BW - the intercept of the line fitted to the probability plot with biweight weighting of the residuals
PPA1BW - the slope of the line fitted to the probability plot with biweight weighting of the residuals

The PPCC value provides a measure of the linearity of the probability plot.

PPA0 and PPA1 provide estimates of the location and scale parameters, respectively.

For some distributions with heavy tails (e.g., Cauchy, slash), there can be extreme variability in the first few and last few points in the probability plot. This can distort the estimates of location and scale. To reduce this effect, two iterations of biweight weighting of the residuals are applied to obtain PPA0BW and PPA1BW. In most cases, PPA0 and PPA1 are preferred. However, where there is extreme non-linearity in the tails, PPA0BW and PPA1BW may be preferred as the location and scale estimates.
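The idea of biweight weighting can be sketched as follows. This is a generic illustration, not Dataplot's algorithm: the function names are hypothetical, and both the tuning constant (6) and the scale estimate (the median absolute residual) are assumptions, since this page does not state them.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least squares line; returns (intercept, slope)."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    if sxx == 0.0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def biweight_line_fit(x, y, iterations=2, c=6.0):
    """Start from the unweighted fit, then reweight the residuals twice."""
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(iterations):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        s = median(abs(r) for r in resid)   # assumed scale estimate
        if s == 0.0:
            break                           # majority of points already on the line
        # Tukey biweight: weight falls smoothly to zero at |r| = c * s
        w = [(1.0 - (r / (c * s)) ** 2) ** 2 if abs(r) < c * s else 0.0
             for r in resid]
        intercept, slope = weighted_line_fit(x, y, w)
    return intercept, slope
```

Points far from the current line receive little or no weight, so a few wild values in the tails pull the fitted slope and intercept much less than they do in the unweighted fit.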

Note:

For singly censored data (i.e., all the censored data have the same censoring time), we can use the N from the full sample to compute the uniform order statistic medians. However, we only plot the failure times.

An alternative that works with both singly and multiply censored data (for multiply censored data, the censoring times are not necessarily the same) is to base the plotting positions on the Kaplan-Meier statistic. That is,

\( p_{i} = \frac{n + 0.7}{n + 0.4} \prod_{k=1}^{i}{\frac{n - k + 0.7}{n - k + 1.7}} \)

with n denoting the full sample size. Again, only plotting positions corresponding to failure times are plotted. The percent point function is computed on the \( p_{i} \) values.
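The expression above can be checked numerically. The sketch below evaluates it exactly as printed, with the product running over every k from 1 to i; the function name is hypothetical. In that form the product telescopes to (n - i + 0.7)/(n + 0.4), which is one minus the familiar median-rank value (i - 0.3)/(n + 0.4). How the product is restricted for multiply censored samples is not spelled out on this page, so the sketch does not attempt it.

```python
def km_expression(n, i):
    """Evaluate the printed expression for index i (1-based), full sample size n."""
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")
    value = (n + 0.7) / (n + 0.4)
    for k in range(1, i + 1):
        value *= (n - k + 0.7) / (n - k + 1.7)
    return value


# With every index included, the product telescopes:
n = 10
for i in range(1, n + 1):
    closed_form = (n - i + 0.7) / (n + 0.4)
    assert abs(km_expression(n, i) - closed_form) < 1e-12
```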

This method for censored probability plots is discussed in more detail on pp. 43-46 of the Bury book (see the References section below).

To specify which method to use, enter the command

<KAPLAN-MEIER/UNIFORM ORDER STATISTIC MEDIANS>

Note:

SET PROBABILITY PLOT DATA POINTS <value>

When this command is entered, Dataplot will compute <value> equally spaced percentiles and compute the probability plot on these percentiles. This option can be useful when generating probability plots on large data sets for distributions with expensive percent point functions.
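The reduction step can be pictured with a short Python sketch. The function name, the choice of probabilities j/(m + 1), and the linear interpolation rule are all assumptions made for illustration; this page does not say how Dataplot defines the percentiles.

```python
def equally_spaced_percentiles(data, m):
    """Reduce data to m equally spaced percentiles (assumed interpolation rule)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0 or m < 1:
        raise ValueError("need non-empty data and m >= 1")
    result = []
    for j in range(1, m + 1):
        p = j / (m + 1)            # assumed probability for the j-th percentile
        h = p * (n - 1)            # fractional position in the sorted data
        lo = int(h)
        hi = min(lo + 1, n - 1)
        result.append(ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo]))
    return result
```

The m reduced values would then be plotted in place of the full data set, so the percent point function is evaluated m times rather than once per observation.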

Note:

This will center the bins around the integer values and will cover the first and last class.

Default:

None

Synonyms:

Related Commands:

FREQUENCY PLOT	= Generates a frequency plot.
HISTOGRAM	= Generates a histogram.
PIE CHART	= Generates a pie chart.
PERCENT POINT PLOT	= Generates a percent point plot.
PPCC PLOT	= Generates a probability plot correlation coefficient plot.
PLOT	= Generates a data or function plot.

References:

Technometrics

Chambers, Cleveland, Kleiner, and Tukey (1983), "Graphical Methods of Data Analysis", Wadsworth.

Karl Bury (1999), "Statistical Distributions in Engineering", Cambridge University Press.

Applications:

Distributional Analysis

Implementation Date:

WRAPPED UP CAUCHY, EXPONENTIATED WEIBULL, TRUNCATED EXPONENTIAL, GENERALIZED LOGISTIC, EXPONENTIAL POWER

JOHNSON SB, INVERTED WEIBULL, LOG DOUBLE EXPONENTIAL

SKEWED T, SKEWED NORMAL, SLASH, INVERTED BETA, G AND H

biweight fit (PPA0BW and PPA1BW)

ASYMMETRIC DOUBLE EXPONENTIAL, MAXWELL,

GENERALIZED ASYMMETRIC LAPLACE, GENERALIZED INVERSE GAUSSIAN

BESSEL K FUNCTION

Program:

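    . Arrange four probability plots on one page
    MULTIPLOT 2 2
    MULTIPLOT CORNER COORDINATES 0 0 100 100
    MULTIPLOT SCALE FACTOR 1.5
    TITLE AUTOMATIC
    X1LABEL THEORETICAL VALUE
    Y1LABEL DATA VALUE
    TITLE OFFSET 2
    X1LABEL DISPLACEMENT 10
    Y1LABEL DISPLACEMENT 14
    . Draw the points as X characters with no connecting line
    CHAR X
    LINE BLANK
    . Normal data, normal probability plot
    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 100
    NORMAL PROBABILITY PLOT Y
    . Chi-square data: the shape parameter NU is set with LET first
    LET NU = 5
    LET Y = CHI-SQUARE RANDOM NUMBERS FOR I = 1 1 100
    CHI-SQUARE PROBABILITY PLOT Y
    . Exponential data, exponential probability plot
    LET Y = EXPONENTIAL RANDOM NUMBERS FOR I = 1 1 100
    EXPONENTIAL PROBABILITY PLOT Y
    . Cauchy data (heavy tails), Cauchy probability plot
    LET Y = CAUCHY RANDOM NUMBERS FOR I = 1 1 1000
    CAUCHY PROBABILITY PLOT Y
    END OF MULTIPLOT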