
# DISTRIBUTIONAL BOOTSTRAP

Name:
BOOTSTRAP PLOT
Type:
Graphics Command
Purpose:
Generates a bootstrap plot for a given probability distribution.
Description:
The PPCC PLOT and KS PLOT commands provide graphical methods for estimating the shape parameter of a probability distribution. The PROBABILITY PLOT can then be used to estimate the location and scale parameters.

One limitation of these graphical methods is that they do not provide uncertainty intervals for the estimates. To address this, we have extended the BOOTSTRAP PLOT command to support a number of probability distributions.

The bootstrap is a non-parametric method for calculating a sampling distribution for a statistic. The statistic is computed for each of N bootstrap samples, where each bootstrap sample is drawn from the original data with replacement.

To apply the bootstrap to the univariate distributional modeling problem, we do the following:

1. We have a univariate dataset containing n points.

2. We draw a bootstrap sample from the original data.

3. We perform the estimation for the bootstrap sample.

• For location/scale distributions, we estimate the parameters from a probability plot.

The PPCC value is also computed for the bootstrap sample.

• For distributions with either one or two shape parameters, we generate a PPCC plot (alternatively, we can generate a KS plot) to estimate the shape parameters. We then generate a probability plot to estimate the location and scale parameters.

The PPCC value is also computed for the bootstrap sample. If a KS plot is being used, the value of Kolmogorov-Smirnov statistic is computed instead.

• Bootstrapping is also supported for maximum likelihood estimation for a number of distributions. In this case, we perform the maximum likelihood estimation on the bootstrap sample.

In this case, no PPCC or Kolmogorov-Smirnov statistic is computed.
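The steps above can be sketched in Python. This is a minimal illustration, not Dataplot's implementation: the function name `bootstrap_estimates` is hypothetical, and the sample mean and standard deviation are assumed stand-ins for the probability plot or maximum likelihood estimates of a normal location and scale.

```python
import random
import statistics

def bootstrap_estimates(data, n_boot=100, seed=0):
    """Return a list of (location, scale) estimates, one per bootstrap sample."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Step 2: draw n points from the original data with replacement.
        sample = [rng.choice(data) for _ in range(n)]
        # Step 3: estimate the parameters for this bootstrap sample.
        # Assumption: mean and standard deviation stand in for Dataplot's estimators.
        loc = statistics.fmean(sample)
        scale = statistics.stdev(sample)
        estimates.append((loc, scale))
    return estimates

# Step 1: a univariate dataset containing n points (synthetic here).
data_rng = random.Random(1)
y = [data_rng.gauss(10.0, 2.0) for _ in range(50)]
est = bootstrap_estimates(y, n_boot=100)
print(len(est), est[0])
```

Each tuple in `est` corresponds to one point on each curve of the bootstrap plot.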

For the bootstrap plot, there will be a separate curve drawn for each parameter estimated. In addition, there will be a curve drawn for the PPCC value (or the Kolmogorov-Smirnov statistic). The vertical axis contains the computed value of these estimated parameters and the horizontal axis contains the sample number (for k = 1, 2, ..., N).

The bootstrap plot is typically followed by some type of distributional plot, such as a histogram, for each estimated parameter. This is demonstrated in the sample program below.

Dataplot also supports bootstrap computations for the case when there is one group variable. In this case, the horizontal axis is the group id and the vertical axis contains the computed values of the estimated parameters for that group (the parameters are offset horizontally). The requested number of bootstrap samples is applied to each group. For example, if the requested number of bootstrap samples is 100, then 100 bootstrap samples are drawn for each group.
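The grouped case can be illustrated with a short Python sketch. It is not Dataplot's code: the function name `grouped_bootstrap` is hypothetical, and the group mean is an assumed stand-in for the distributional parameter estimates.

```python
import random
import statistics

def grouped_bootstrap(values, groups, n_boot=100, seed=0):
    """Bootstrap the mean within each group; returns {group_id: [estimates]}."""
    rng = random.Random(seed)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    result = {}
    for g, vals in sorted(by_group.items()):
        # Every group receives the full n_boot resamples of its own data.
        result[g] = [
            statistics.fmean(rng.choices(vals, k=len(vals)))
            for _ in range(n_boot)
        ]
    return result

values = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]
groups = [1, 1, 1, 1, 2, 2, 2, 2]
out = grouped_bootstrap(values, groups, n_boot=100)
print({g: len(e) for g, e in out.items()})
```

With 100 requested bootstrap samples and two groups, the sketch produces 100 estimates for each group, 200 in total.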

Syntax 1:
BOOTSTRAP <dist> PLOT <y>       <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<dist> is one of the following distributions:
ANGLIT
ARCSINE
CAUCHY
COSINE
EXPONENTIAL
GUMBEL (EXTREME VALUE TYPE 1)
HALF CAUCHY
HALF LOGISTIC
HALF NORMAL
HYPERBOLIC SECANT
LAPLACE (DOUBLE EXPONENTIAL)
LOGISTIC
NORMAL
RAYLEIGH
SEMI-CIRCULAR
SLASH
UNIFORM
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the location and scale parameters for each bootstrap sample using a probability plot.

Syntax 2:
BOOTSTRAP <dist> CENSORED PLOT <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x> is the censoring variable;
<dist> is one of the distributions given for Syntax 1;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the location and scale parameters for each bootstrap sample using a censored probability plot. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 3:
BOOTSTRAP <dist> PLOT <y>       <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<dist> is one of the following distributions:
ASYMMETRIC LAPLACE (ASYMMETRIC DOUBLE EXPONENTIAL)
CHI
CHI-SQUARE
DOUBLE GAMMA
DOUBLE WEIBULL
ERROR (SUBBOTIN)
FATIGUE LIFE
FOLDED T
FRECHET
GAMMA
GENERALIZED EXTREME VALUE
GENERALIZED HALF LOGISTIC
GENERALIZED LOGISTIC
GENERALIZED PARETO
GEOMETRIC EXTREME EXPONENTIAL
INVERTED GAMMA
INVERTED WEIBULL
LOG DOUBLE EXPONENTIAL (LOG LAPLACE)
LOG GAMMA
LOG LOGISTIC
LOGNORMAL
PARETO
PARETO SECOND KIND
POWER
POWER NORMAL
RECIPROCAL
SKEW LAPLACE (SKEW DOUBLE EXPONENTIAL)
T
TUKEY-LAMBDA
VON MISES
WALD
WRAPPED CAUCHY
WEIBULL
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the shape parameter using a PPCC plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.

Syntax 4:
BOOTSTRAP <dist> CENSORED PLOT <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x> is the censoring variable;
<dist> is one of the distributions given for syntax 3;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the shape parameter using a censored PPCC plot and then estimates the location and scale parameters using a censored probability plot for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 5:
BOOTSTRAP <dist> KS PLOT <y>    <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<dist> is one of the distributions given for syntax 3;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the shape parameter using a KS plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.

Syntax 6:
BOOTSTRAP <dist> PLOT <y>       <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<dist> is one of the following distributions:
BETA
F
FOLDED NORMAL
G-AND-H
INVERSE GAUSSIAN
GENERALIZED GAMMA
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the two shape parameters using a PPCC plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.

Syntax 7:
BOOTSTRAP <dist> CENSORED PLOT <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x> is the censoring variable;
<dist> is one of the distributions given in syntax 6;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the two shape parameters using a censored PPCC plot and then estimates the location and scale parameters using a censored probability plot for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 8:
BOOTSTRAP <dist> MLE PLOT <y>   <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<dist> is one of the following distributions:
CAUCHY
EXPONENTIAL
FOLDED NORMAL
GUMBEL (EXTREME VALUE TYPE 1)
LAPLACE (DOUBLE EXPONENTIAL)
LOGISTIC
NORMAL
RAYLEIGH
UNIFORM
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample.

Syntax 9:
BOOTSTRAP <dist> CENSORED MLE PLOT <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x> is the censoring variable;
<dist> is one of the following distributions:
NORMAL
EXPONENTIAL
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 10:
BOOTSTRAP <dist> MLE PLOT <y>    <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<dist> is one of the following distributions:
BETA
FATIGUE LIFE
GAMMA
GENERALIZED PARETO
GEOMETRIC EXTREME EXPONENTIAL
INVERSE GAUSSIAN
LOGNORMAL
PARETO
WEIBULL
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the shape and scale parameters using maximum likelihood for each bootstrap sample. For the beta, Pareto, and inverse Gaussian distributions, the two shape parameters are estimated but no scale parameter.

Syntax 11:
BOOTSTRAP <dist> CENSORED MLE PLOT <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x> is the censoring variable;
<dist> is one of the following distributions:
CENSORED GAMMA
CENSORED LOGNORMAL
CENSORED WEIBULL
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 12:
BOOTSTRAP <dist> MLE PLOT <y>   <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<dist> is one of the following distributions:
ASYMMETRIC LAPLACE
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax estimates the shape, location, and scale parameters using maximum likelihood for each bootstrap sample.

Examples:
BOOTSTRAP NORMAL PLOT Y
BOOTSTRAP NORMAL MLE PLOT Y
BOOTSTRAP WEIBULL PLOT Y
BOOTSTRAP WEIBULL KS PLOT Y
BOOTSTRAP WEIBULL CENSORED PLOT Y X
Note:
The BOOTSTRAP PLOT command generates the estimates for the bootstrap samples. Typically, these values are processed further for a complete bootstrap analysis. To simplify this, Dataplot writes information to files.

dpst1f.dat

The estimates for each bootstrap sample are written to file dpst1f.dat. You can read the variables written to dpst1f.dat to generate histograms and to compute selected percentiles.

The order is:

group id
location parameter
scale parameter
first shape parameter
second shape parameter
ppcc (or ks) value

If a particular syntax does not generate one or more of these values, then they are omitted (e.g., the normal distribution does not generate estimates for any shape parameters).

The following example generates a Weibull bootstrap plot and then reads the bootstrap estimates from dpst1f.dat.

BOOTSTRAP WEIBULL PLOT Y
SKIP 0
READ DPST1F.DAT ALOC ASCALE AGAMMA APPCC
MULTIPLOT 2 2
RELATIVE HISTOGRAM ALOC
RELATIVE HISTOGRAM ASCALE
RELATIVE HISTOGRAM AGAMMA
RELATIVE HISTOGRAM APPCC
END OF MULTIPLOT

dpst2f.dat

Selected percentiles are written to dpst2f.dat. The order is:

Group id - omitted for ungrouped data
Parameter number - parameters are ordered as they are in dpst1f.dat
Mean
Standard Deviation
Median
2.5 percentile
97.5 percentile
5.0 percentile
95.0 percentile
0.5 percentile
99.5 percentile
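As a rough illustration of how these summary values relate to the bootstrap estimates, the following Python sketch computes the same list of statistics for one parameter. The function names are hypothetical, and the linear-interpolation percentile rule is an assumption; Dataplot's percentile definition may differ.

```python
import statistics

def percentile(sorted_vals, p):
    """Percentile p (0-100) by linear interpolation between order statistics."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (k - lo) * (sorted_vals[hi] - sorted_vals[lo])

def summarize(estimates):
    """Summary row for one parameter, in the order listed for dpst2f.dat."""
    s = sorted(estimates)
    row = {
        "mean": statistics.fmean(s),
        "sd": statistics.stdev(s),
        "median": statistics.median(s),
    }
    for p in (2.5, 97.5, 5.0, 95.0, 0.5, 99.5):
        row[p] = percentile(s, p)
    return row

# Stand-in "bootstrap estimates": 0, 1, ..., 100.
summary = summarize([float(i) for i in range(101)])
print(summary)
```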
Note:
You can optionally specify that bootstrap estimates of selected percentiles be generated. This option is off by default.

If you enter the command

SET MAXIMUM LIKELIHOOD PERCENTILES DEFAULT

you will get bootstrap estimates for the following percentiles:

0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 95.0, 97.5, 99.0, 99.5

If you would like to specify the specific percentiles to estimate, enter the command

SET MAXIMUM LIKELIHOOD PERCENTILES YPERC

with YPERC denoting a variable that contains the desired percentiles.

This is demonstrated in the sample program below.
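Conceptually, a bootstrap estimate of a percentile is obtained by evaluating the fitted distribution's percent point function for each bootstrap sample. The Python sketch below shows this for a Weibull distribution. It assumes the three-parameter (minimum case) Weibull form; the function name, the parameter values, and the `yperc` list are hypothetical illustrations rather than Dataplot output.

```python
import math

def weibull_percentile(p, shape, loc=0.0, scale=1.0):
    """Percent point function of a 3-parameter Weibull; p is given in percent."""
    q = p / 100.0
    return loc + scale * (-math.log(1.0 - q)) ** (1.0 / shape)

# Hypothetical (location, scale, shape) estimates from three bootstrap samples.
boot_params = [(0.0, 1.0, 2.0), (0.1, 0.9, 2.2), (-0.1, 1.1, 1.8)]
yperc = [5.0, 50.0, 95.0]

# One column of bootstrap percentile estimates per requested percentile.
columns = {
    p: [weibull_percentile(p, shape, loc, scale) for loc, scale, shape in boot_params]
    for p in yperc
}
print(columns[50.0])
```

Confidence limits for a given percentile are then taken from the spread of its column across the bootstrap samples.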

By default, two-sided confidence intervals are generated for the percentiles. The following commands can be used to generate either lower one-sided or upper one-sided intervals:

SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES LOWER
SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES UPPER

To turn off the computation of the percentile confidence intervals, enter

SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES NONE

To reset the default of two sided intervals, enter

SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES TWOSIDED
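The difference between the interval types can be sketched in Python as follows. This is an illustration under stated assumptions, not Dataplot's algorithm: the function name is hypothetical, "lower" is taken to mean a lower confidence bound only, "upper" an upper bound only, and the percentile rule is simple linear interpolation.

```python
def percentile_interval(estimates, alpha=0.05, kind="twosided"):
    """Percentile-method limits from a list of bootstrap estimates."""
    s = sorted(estimates)
    n = len(s)

    def quantile(frac):
        # Linear interpolation between adjacent order statistics.
        k = (n - 1) * frac
        lo = int(k)
        hi = min(lo + 1, n - 1)
        return s[lo] + (k - lo) * (s[hi] - s[lo])

    if kind == "twosided":
        # alpha is split evenly between the two tails.
        return quantile(alpha / 2), quantile(1 - alpha / 2)
    if kind == "lower":
        # All of alpha is placed in the lower tail; no upper limit.
        return quantile(alpha), None
    if kind == "upper":
        # All of alpha is placed in the upper tail; no lower limit.
        return None, quantile(1 - alpha)
    raise ValueError("kind must be 'twosided', 'lower', or 'upper'")

ests = [float(i) for i in range(101)]
print(percentile_interval(ests, 0.05, "twosided"))
print(percentile_interval(ests, 0.05, "lower"))
print(percentile_interval(ests, 0.05, "upper"))
```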
Note:
As in the PPCC and KS plots, you can specify the range for the shape parameter. For example, to restrict the estimate of the shape parameter of a Weibull distribution to values between 0.5 and 10, enter the commands

LET GAMMA1 = 0.5
LET GAMMA2 = 10
BOOTSTRAP WEIBULL PLOT Y

One recommendation is to generate the bootstrap plot for a relatively small number of samples (e.g., 50) and use that to determine a reasonable range for the shape parameter.

Enter HELP PPCC PLOT to see the relevant parameter for the desired distribution.
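To show why the range matters, the following Python sketch performs a PPCC search for the Weibull shape parameter over a grid between two limits that play the role of GAMMA1 and GAMMA2. It is a simplified illustration, not Dataplot's algorithm: the function names, the evenly spaced grid, and the use of Filliben's approximation for the uniform order statistic medians are assumptions.

```python
import math
import random

def uniform_order_medians(n):
    """Filliben's approximation to the medians of uniform order statistics."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weibull_ppcc_shape(data, gamma1, gamma2, steps=50):
    """Return (best_shape, max_ppcc) over an evenly spaced grid on [gamma1, gamma2]."""
    ys = sorted(data)
    medians = uniform_order_medians(len(ys))
    best = (None, -1.0)
    for j in range(steps + 1):
        gamma = gamma1 + j * (gamma2 - gamma1) / steps
        # Standard Weibull percent point function at the order statistic medians.
        theo = [(-math.log(1.0 - m)) ** (1.0 / gamma) for m in medians]
        r = correlation(ys, theo)
        if r > best[1]:
            best = (gamma, r)
    return best

rng = random.Random(0)
data = [rng.weibullvariate(1.0, 2.3) for _ in range(100)]  # scale 1, shape 2.3
print(weibull_ppcc_shape(data, 0.5, 10.0, steps=95))
```

A narrower range gives a finer grid for the same number of steps, which is one reason to tune the limits with a small pilot run before the full bootstrap.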

Note:
The BOOTSTRAP PLOT command supports ungrouped data, one group variable, or two group variables. The distributional bootstrap plots do not currently support two group variables.
Note:
Dataplot supports the BCA BOOTSTRAP PLOT option to generate more accurate confidence intervals. BCa is an abbreviation for "bias-corrected and accelerated". It provides second-order accuracy (as opposed to the first-order accuracy of the confidence intervals generated from the percentiles of the bootstrap samples).

The BCA option is not currently supported for the bootstrap distributional plots.

Note:
Some analysts prefer a different method for generating the bootstrap samples.

The full sample is used to generate the parameter estimates of the distribution. Then the bootstrap samples are generated by generating random numbers from the specified distribution using the parameters estimated from the full sample.

The following command will specify this alternate method of bootstrapping be used:

SET DISTRIBUTIONAL BOOTSTRAP PARAMETRIC

To restore the default method of bootstrapping the data values, enter the command

SET DISTRIBUTIONAL BOOTSTRAP NONPARAMETRIC

The parametric option is still being tested and may not work correctly for some of the distributions.
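The parametric approach can be sketched in Python for a normal distribution. This is a minimal illustration of the idea, not Dataplot's implementation: the function name is hypothetical, and the sample mean and standard deviation are assumed as the fitting method.

```python
import random
import statistics

def parametric_bootstrap_normal(data, n_boot=100, seed=0):
    """Return (location, scale) estimates from samples simulated from the fitted normal."""
    rng = random.Random(seed)
    # Fit the distribution once, using the full sample.
    loc_hat = statistics.fmean(data)
    scale_hat = statistics.stdev(data)
    estimates = []
    for _ in range(n_boot):
        # Generate random numbers from the fitted distribution
        # instead of resampling the observed data values.
        sample = [rng.gauss(loc_hat, scale_hat) for _ in range(len(data))]
        estimates.append((statistics.fmean(sample), statistics.stdev(sample)))
    return estimates

data_rng = random.Random(2)
obs = [data_rng.gauss(5.0, 1.5) for _ in range(60)]
par_est = parametric_bootstrap_normal(obs, n_boot=100)
print(len(par_est), par_est[0])
```

Unlike the default method, the simulated samples can contain values that never occurred in the original data.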

Note:
The BOOTSTRAP SAMPLE command can be used to specify the desired number of bootstrap samples. Recommended values are between 100 and 1,000. The default is 100.
Default:
None
Synonyms:
KOLMOGOROV SMIRNOV is a synonym for KS
MAXIMUM LIKELIHOOD is a synonym for MLE
Related Commands:
BOOTSTRAP PLOT = Generate a bootstrap plot.
PPCC PLOT = Generate a ppcc plot.
KS PLOT = Generate a ks plot.
PROBABILITY PLOT = Generate a probability plot.
JACKNIFE PLOT = Generate a jacknife plot.
BOOTSTRAP SAMPLE = Set the sample size for the bootstrap.
BOOTSTRAP FIT = Compute a bootstrap linear/multilinear fit.
HISTOGRAM = Generate a histogram.
PLOT = Generate a data/function plot.
Reference:
Efron and Gong (1983), "A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation", The American Statistician, February, 1983.

Efron and Tibshirani (1993), "An Introduction to the Bootstrap", Chapman & Hall.

Applications:
Distributional Modeling
Implementation Date:
2005/4
Program:
```
.  Following Sample Macro demonstrates the use of the
.  bootstrap with a Weibull distribution.
.
.  Step 0: Create some sample Weibull data
.
dimension 50 columns
.
let gamma = 2.3
let y = weibull random numbers for i = 1 1 100
.
. Step 1: Perform PPCC/Probability Plot Analysis,
.         Perform K-S goodness of fit test
.
set ipl1na distboo1.ps
device 2 postscript
device 2 color on
.
multiplot 2 2
multiplot corner coordinates 0 0 100 100
multiplot scale factor 1.5
y1label displacement 12
.
title displacement 2
x1label displacement 12
title Weibull PPCC Plot
x1label Shape Parameter (gamma)
y1label Correlation
weibull ppcc plot y
justification left
height 3.5
move 25 28
text Max PPCC: ^maxppcc
move 25 21
text Shape: ^shape
let gamma1 = shape - 2
if gamma1 <= 0
let gamma1 = 0.1
end of if
let gamma2 = shape + 2
title Weibull PPCC Plot
weibull ppcc plot y
move 25 28
text Max PPCC: ^maxppcc
move 25 23
text Shape: ^shape
let gamma = shape
if n <= 200
character x
line blank
else
line solid
character blank
end of if
.
title Weibull Probability Plot
x1label displacement
x1label Theoretical
y1label Data
weibull probability plot y
justification center
move 50 2
text Location:  ^ppa0, Scale:  ^ppa1
let iplot = 3
multiplot 2 2 iplot
line solid
character blank
limits freeze
pre-erase off
let function f = ppa0 + ppa1*x
let zmin = minimum xplot
let zmax = maximum xplot
let ainc2 = (zmax - zmin)/10
plot f for x = zmin ainc2 zmax
limits
pre-erase on
let iplot = iplot + 1
multiplot 2 2 iplot
title Histogram with Overlaid Weibull
x1label Data Units
y1label Density
relative histogram y
multiplot 2 2 iplot
limits freeze
pre-erase off
let loc2=ppa0
let scale2=ppa1
line solid
char blank
line color blue
let amin = minimum y
let amax = maximum y
let ainc = 0.1
plot weipdf(x,shape,loc2,scale2) for x = amin ainc amax
limits
pre-erase on
line color black
delete gamma gamma1 gamma2
end of multiplot
device 2 close
.
. Step 2: Now perform a bootstrap analysis
.
feedback off
capture distboot.out
write " "
write "BOOTSTRAP-BASED INTERVALS"
write " "
set maximum likelihood percentiles default
bootstrap samples 200
let gamma1 = 0.2
let gamma2 = 5
set ipl1na distboo2.ps
device 2 postscript
device 2 color on
multiplot 2 3
multiplot corner coordinates 0 0 100 100
multiplot scale factor 1.7
y1label Parameter Estimate
x1label
x2label Bootstrap Sample
title Bootstrap Plot
line color blue red green
limits
bootstrap weibull plot y
line color black all
.
delete aloc ascale ashape appcc
skip 0
read dpst1f.dat aloc ascale ashape appcc
y1label
x2label
x3label displacement 16
title Location Parameter
let amed = median aloc
let amean = mean aloc
let asd = sd aloc
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram aloc
title Scale Parameter
let amed = median ascale
let amean = mean ascale
let asd = sd ascale
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram ascale
title Shape Parameter
let amed = median ashape
let amean = mean ashape
let asd = sd ashape
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram ashape
title PPCC Value
let amed = median appcc
let amean = mean appcc
let asd = sd appcc
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram appcc
x3label displacement
.
device 2 close
.
let alpha = 0.05
let xqlow = alpha/2
let xqupp = 1 - alpha/2
.
write "Bootstrap-based Confidence Intervals"
write "alpha = ^alpha"
write " "
.
let xq = xqlow
let loc95low = xq quantile aloc
let xq = xqupp
let loc95upp = xq quantile aloc
let xq = xqlow
let sca95low = xq quantile ascale
let xq = xqupp
let sca95upp = xq quantile ascale
let xq = xqlow
let sha95low = xq quantile ashape
let xq = xqupp
let sha95upp = xq quantile ashape
write "Confidence Interval for Location: (^loc95low,^loc95upp)"
write "Confidence Interval for Scale:    (^sca95low,^sca95upp)"
write "Confidence Interval for Gamma:    (^sha95low,^sha95upp)"
.
.  Now generate confidence intervals for percentiles
.
serial read p
0.5 1 2.5 5 10 20 30 40 50 60 70 80 90 95 97.5 99 99.5
end of data
let nperc = size p
skip 1
write " "
loop for k = 1 1 nperc
let xqptemp = p(k)
let amed = median xqp^k
let xqpmed(k) = amed
let xq = xqlow
let atemp = xq quantile xqp^k
let xq95low(k) = atemp
let xq = xqupp
let atemp = xq quantile xqp^k
let xq95upp(k) = atemp
end of loop
set table title "Bootstrap Based Confidence Intervals for Percentiles"
set table spacing 15
set write decimals 7
write " "
write "Confidence Intervals for Percentiles"
write p xqpmed xq95low xq95upp
end of capture
delete xqp xqpmed xq95low xq95upp
delete gamma1 gamma2
```
```
BOOTSTRAP-BASED INTERVALS

Bootstrap-based Confidence Intervals
alpha = 0.05

Confidence Interval for Location: (-0.4112,0.272742)
Confidence Interval for Scale:    (0.744203,1.572763)
Confidence Interval for Gamma:    (1.738776,3.748062)

Confidence Intervals for Percentiles

   VARIABLES--P         XQPMED        XQ95LOW        XQ95UPP

      0.5000000      0.1427522     -0.0264724      0.3189068
      1.0000000      0.1783569      0.0333498      0.3433715
      2.5000000      0.2441821      0.1315464      0.3925191
      5.0000000      0.3210546      0.2229533      0.4490436
     10.0000000      0.4301926      0.3367559      0.5266151
     20.0000000      0.5761956      0.4959525      0.6575613
     30.0000000      0.6924722      0.6142470      0.7801363
     40.0000000      0.8017390      0.7214258      0.8862920
     50.0000000      0.9073263      0.8198442      0.9952904
     60.0000000      1.0129241      0.9191304      1.1054595
     70.0000000      1.1306920      1.0393872      1.2280047
     80.0000000      1.2808006      1.1713773      1.3832619
     90.0000000      1.4895294      1.3569634      1.5912455
     95.0000000      1.6622919      1.5139780      1.7756084
     97.5000000      1.8108935      1.6356139      1.9443940
     99.0000000      1.9902619      1.7803770      2.1564791
     99.5000000      2.1085887      1.8710965      2.3045762
```

NIST is an agency of the U.S. Commerce Department.

Date created: 4/20/2005
Last updated: 10/13/2015