
# BOOTSTRAP FIT

Name:
BOOTSTRAP FIT
Type:
Graphics Command
Purpose:
Performs a bootstrap linear or multilinear fit.
Description:
The standard solution for linear/multilinear fitting is to use ordinary least squares (OLS). OLS is based on several assumptions:

1. The residuals from the fit have constant location and variance.

2. The residuals from the fit are independent.

3. The residuals from the fit follow a common distribution, usually assumed to be the normal distribution.

When these assumptions are at least approximately satisfied, OLS provides the optimal estimates and uncertainty intervals for the fit coefficients. However, if the assumptions are not at least approximately satisfied, then the OLS estimates may no longer be optimal (and may in fact be quite wrong). Applying transformations and weighting are common approaches to fitting when the assumptions are not satisfied.

Bootstrap fitting provides an additional alternative. The bootstrap is a non-parametric method for calculating a sampling distribution for a statistic. The bootstrap computes the statistic on N different resamples of the data, each drawn with replacement. In the context of fitting, we are estimating the coefficients of the fit and providing bootstrap estimates of their uncertainty.

There are two approaches to bootstrapping for fitting.

1. In the first approach, the OLS fit is computed from the original data and the residuals from that fit are resampled. The resampled residuals are then added to the predicted values of the original fit to obtain a new Y vector. This new Y vector is then fit against the original X variables. We call this approach residual resampling (or the Efron approach).

2. In the second approach, rows of the original data (both the Y vector and the corresponding rows of the X variables) are resampled. The resampled data are then fit. We call this approach data resampling (or the Wu approach).
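The two resampling schemes can be illustrated outside Dataplot. The following is a minimal Python sketch, not Dataplot's implementation: it assumes a single X variable, and the helper names (`ols_line`, `residual_resample`, `data_resample`) and the simulated data are hypothetical, introduced only for illustration.

```python
import random

def ols_line(x, y):
    """Ordinary least squares fit of y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    return ybar - a1 * xbar, a1

def residual_resample(x, y, rng):
    """One residual-resampling (Efron) replicate: the X values stay fixed."""
    a0, a1 = ols_line(x, y)
    pred = [a0 + a1 * xi for xi in x]
    res = [yi - pi for yi, pi in zip(y, pred)]
    # Add residuals drawn with replacement to the original predicted values.
    new_y = [pi + rng.choice(res) for pi in pred]
    return ols_line(x, new_y)

def data_resample(x, y, rng):
    """One data-resampling (Wu) replicate: (x, y) rows are drawn together."""
    n = len(x)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        # Guard added for this sketch only: redraw if all sampled x are equal.
        if max(xs) > min(xs):
            return ols_line(xs, [y[i] for i in idx])

# Simulated data for illustration only (not the data of the Dataplot example).
rng = random.Random(12345)
x = [float(i) for i in range(1, 21)]
y = [5.0 + 0.7 * xi + rng.gauss(0.0, 1.0) for xi in x]

boot_residual = [residual_resample(x, y, rng) for _ in range(100)]
boot_data = [data_resample(x, y, rng) for _ in range(100)]
```

Each list holds 100 `(a0, a1)` pairs, one per bootstrap sample. The only difference between the two functions is what gets resampled: residuals around a fixed fit, or whole rows of the data.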

Hamilton (see Reference below) gives some guidance on the contrasts between these approaches.

1. Residual resampling assumes fixed X values and independent and identically distributed residuals (although the residuals are not assumed to be normally distributed).

2. Data resampling does not assume independent and identically distributed residuals.

Given the above, if the assumption of fixed X is realistic (that is, we could readily collect new Y's with the same X values), then residual resampling is justified. For example, this would be the case in a designed experiment. However, if this assumption is not realistic (i.e., the X values vary randomly as well as the Y's), then data resampling is preferred.

If the bootstrap methods produce substantially different results, this is an indication that the assumptions of fixed X and independent and identically distributed residuals may not be valid.

The BOOTSTRAP FIT command produces the following output:

1. A few lines listing the sample size, the number of bootstrap samples, and the bootstrap method used (residual resampling or data resampling).

2. A summary table where each row of the table corresponds to one of the fit coefficients. The first column identifies the parameter. Columns 2 and 3 list the coefficient estimate and standard deviation from the original fit. Columns 4 through 7 list the mean, the standard deviation, and the 2.5 and 97.5 percentiles of the bootstrap sample.

3. Typically, the analyst will want to further view and analyze the bootstrap samples. For example, a histogram of the bootstrap samples is often displayed.
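As an illustration of what columns 4 through 7 of the summary table contain, the sketch below summarizes a list of bootstrap estimates for one coefficient in Python. It is a sketch under assumptions: the `percentile` helper uses linear interpolation between order statistics, which may differ slightly from Dataplot's percentile definition, and the `slopes` values are made up.

```python
import statistics

def percentile(values, p):
    """p-th percentile (0 to 100) by linear interpolation between order statistics.

    Assumption of this sketch: Dataplot's percentile definition may differ.
    """
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def summarize(samples):
    """Mean, standard deviation, and 2.5/97.5 percentiles of bootstrap estimates."""
    return {
        "mean": statistics.mean(samples),
        "sd": statistics.stdev(samples),
        "p2.5": percentile(samples, 2.5),
        "p97.5": percentile(samples, 97.5),
    }

# Hypothetical bootstrap estimates of a slope coefficient.
slopes = [0.71, 0.74, 0.69, 0.75, 0.73, 0.72, 0.76, 0.70]
summary = summarize(slopes)
```

In practice the list would hold one value per bootstrap sample (100 by default) rather than the eight values used here.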

Dataplot writes the bootstrap samples to file. You can read these back into Dataplot to generate additional graphing and analysis of the bootstrap samples.

Specifically,

• The bootstrap samples for the coefficient estimates are written to the file "dpst1f.dat".

• The bootstrap samples for the coefficient standard deviation estimates are written to the file "dpst2f.dat".

• The bootstrap samples for the residual standard deviation estimates are written to the file "dpst3f.dat".

The program example below demonstrates the use of these files after the BOOTSTRAP FIT command.

Syntax 1:
BOOTSTRAP FIT <y> <x1> ... <xk>               <SUBSET/EXCEPT/FOR qualification>
where <y> is the response (dependent) variable;
<x1> .... <xk> is a list of one or more independent variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
BOOTSTRAP FIT Y X
BOOTSTRAP FIT Y X1 X2 X3 X4
BOOTSTRAP FIT Y X1 X2 X3 X4 SUBSET TAG > 1
Note:
Use the following command to specify which bootstrap method is used:

SET BOOTSTRAP FIT METHOD <RESIDUAL/DATA>

The default is RESIDUAL. You can use EFRON as a synonym for RESIDUAL and WU as a synonym for DATA.

Note:
Linear and quadratic calibration are special applications of fitting. The following commands can be used to perform a bootstrap analysis of linear and quadratic calibration, respectively:

LET Y0 = <value>
BOOTSTRAP LINEAR CALIBRATION PLOT

The SET BOOTSTRAP FIT METHOD command also applies to these commands.

Note:
The number of bootstrap samples can be specified with the command:

BOOTSTRAP SAMPLES <value>

The default is 100 bootstrap samples. Some sources recommend as many as 2,000 bootstrap samples for accurate confidence intervals for the parameter estimates.
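To see why more bootstrap samples may be needed for interval endpoints, the Python sketch below repeats a residual-resampling bootstrap several times and measures how much the 97.5 percentile of the slope varies from run to run, once with 100 and once with 2,000 bootstrap samples. It is an illustration only: the data are simulated, and the names `slope`, `upper_endpoint`, and `endpoint_spread` are hypothetical.

```python
import random
import statistics

def slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

def upper_endpoint(x, y, n_boot, rng):
    """97.5 percentile of the slope over n_boot residual-resampling replicates."""
    n = len(x)
    a1 = slope(x, y)
    a0 = sum(y) / n - a1 * sum(x) / n
    pred = [a0 + a1 * xi for xi in x]
    res = [yi - pi for yi, pi in zip(y, pred)]
    slopes = sorted(
        slope(x, [pi + rng.choice(res) for pi in pred]) for _ in range(n_boot)
    )
    # Nearest order statistic; a simplification chosen for this sketch.
    return slopes[round(0.975 * (n_boot - 1))]

# Simulated data for illustration only.
rng = random.Random(2002)
x = [float(i) for i in range(1, 21)]
y = [5.0 + 0.7 * xi + rng.gauss(0.0, 1.0) for xi in x]

def endpoint_spread(n_boot, repeats=20):
    """Standard deviation of the endpoint across repeated bootstrap runs."""
    return statistics.stdev(
        [upper_endpoint(x, y, n_boot, rng) for _ in range(repeats)]
    )

spread_100 = endpoint_spread(100)
spread_2000 = endpoint_spread(2000)
```

Comparing `spread_100` with `spread_2000` shows how much of the endpoint's variability comes from the resampling itself rather than from the data.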

Note:
Confidence intervals for the coefficients of the fit model can be obtained from the appropriate percentiles of the bootstrap samples (e.g., the 2.5% and 97.5% percentiles provided in the summary table). However, in some cases these percentiles may have less than the nominal coverage probability.

Some refinements to generate more accurate confidence intervals have been proposed. The issue of bootstrap confidence intervals for multilinear fitting is discussed on pages 319-325 of Hamilton.

Default:
Residual resampling is used with 100 bootstrap samples.
Synonyms:
None
Related Commands:
BOOTSTRAP SAMPLES = Specify the number of bootstrap samples to generate.
BOOTSTRAP METHOD = Specify whether residual or data resampling is used.
BOOTSTRAP PLOT = Generate a bootstrap plot.
JACKNIFE PLOT = Generate a jacknife plot.
HISTOGRAM = Generates a histogram.
KERNEL DENSITY PLOT = Generates a kernel density plot.
PLOT = Generates a data/function plot.
Reference:
Efron and Gong (1983), "A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation," The American Statistician.

Hamilton (1992), "Regression with Graphics: A Second Course in Applied Statistics," Duxbury Press.

Applications:
Fitting
Implementation Date:
2002/7
Program:
```
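. Skip the 25 header lines of the data file
. (the READ of y and x from the data file is not shown in this listing)
skip 25
.
. Run the bootstrap fit with data (Wu) resampling and 100 bootstrap samples
set write decimals 5
set bootstrap fit method data
bootstrap samples 100
bootstrap fit y x
.
. Read the bootstrap coefficient estimates and their standard deviations back in
delete a0 a1
skip 0
read dpst1f.dat a0 a1
read dpst2f.dat a0sd a1sd
.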
multiplot corner coordinates 0 0 100 100
multiplot scale factor 2
multiplot 2 2
.
title a0 estimate
let bmean = mean a0
let b025 = 2.5 percentile a0
let b975 = 97.5 percentile a0
let bmean = int(bmean*1000)/1000
let b025 = int(b025*1000)/1000
let b975 = int(b975*1000)/1000
x2label mean = ^bmean, b025 = ^b025, b975 = ^b975
histogram a0
title a1 estimate
let bmean = mean a1
let b025 = 2.5 percentile a1
let b975 = 97.5 percentile a1
let bmean = int(bmean*1000)/1000
let b025 = int(b025*1000)/1000
let b975 = int(b975*1000)/1000
x2label mean = ^bmean, b025 = ^b025, b975 = ^b975
histogram a1
title a0 standard deviation
let bmean = mean a0sd
let bmean = int(bmean*1000)/1000
x2label mean = ^bmean
histogram a0sd
title a1 standard deviation
let bmean = mean a1sd
let bmean = int(bmean*1000)/1000
x2label mean = ^bmean
histogram a1sd
.
end of multiplot
```
The BOOTSTRAP FIT command generates the following output:
```
            Bootstrap Linear/Multilinear Fit

Number of Observations:                             107
Number of Bootstrap Sample                          100
Bootstrap Method: Data (Wu)

Summary Table

-----------------------------------------------------------------------------------------------
Para-   Estimates From Original Fit                Estimates From Bootstrap Fit
meter           Coef             SD           Mean             SD            2.5           97.5
-----------------------------------------------------------------------------------------------
A0           4.99367        1.12565        4.87176        1.13198        2.77080        6.68491
A1           0.73111        0.02455        0.73568        0.02478        0.68731        0.80165
```
Histograms can be generated for the bootstrap samples of each of the parameter estimates and the standard deviations of the parameter estimates.

NIST is an agency of the U.S. Commerce Department.

Date created: 08/12/2002
Last updated: 12/15/2013