BOOTSTRAP FIT

Name:

BOOTSTRAP FIT

Type:

Graphics Command

Purpose:

Performs a bootstrap linear or multilinear fit.

Description:

Ordinary least squares (OLS) fitting rests on the following assumptions about the residuals:

The residuals from the fit have constant location and variance.
The residuals from the fit are independent.
The residuals from the fit follow a common distribution, usually assumed to be the normal distribution.

When these assumptions are at least approximately satisfied, OLS provides the optimal estimates and uncertainty intervals for the fit coefficients. However, if the assumptions are not at least approximately satisfied, then the OLS estimates may no longer be optimal (and may in fact be quite wrong). Applying transformations and weighting are common approaches to fitting when the assumptions are not satisfied.

Bootstrap fitting provides an additional alternative. The bootstrap is a non-parametric method for calculating a sampling distribution for a statistic. The statistic is calculated on each of N resamples drawn from the original data with replacement, and the N resulting values approximate its sampling distribution. In the context of fitting, the statistics are the coefficients of the fit, and the bootstrap provides estimates of their uncertainty.
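
The resampling step can be illustrated outside Dataplot. The following is a minimal Python sketch, not Dataplot code and not Dataplot's implementation. The function name bootstrap_statistic, the sample values, and the choice of the mean as the statistic are assumptions made for illustration. Each resample has the same size as the original sample, which is the usual bootstrap convention.

```python
import random
import statistics


def bootstrap_statistic(data, statistic, n_boot=100, seed=0):
    """Return n_boot values of `statistic`, one per resample of `data`."""
    rng = random.Random(seed)
    n = len(data)
    values = []
    for _ in range(n_boot):
        # Draw n observations from the original sample, with replacement.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        values.append(statistic(resample))
    return values


# Hypothetical sample, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
boot_means = bootstrap_statistic(sample, statistics.mean, n_boot=200)

# The spread of the bootstrap values estimates the uncertainty of the mean.
boot_se = statistics.stdev(boot_means)
```

The list boot_means plays the role of the bootstrap sampling distribution. Its standard deviation and percentiles serve as uncertainty estimates for the statistic.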

There are two approaches to bootstrapping for fitting.

In the first approach, the OLS fit is computed from the original data. The residuals from this fit are resampled with replacement and added to the predicted values of the original fit to obtain a new Y vector. This new Y vector is then fit against the original X variables. We call this approach residual resampling (or the Efron approach).
In the second approach, rows of the original data (both the Y vector and the corresponding rows of the X variables) are resampled. The resampled data are then fit. We call this approach data resampling (or the Wu approach).
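
The two approaches can be sketched for a straight-line fit with a single X variable. This is a hedged Python illustration, not Dataplot code. The function names ols_line, residual_resampling, and data_resampling and the data values are hypothetical. Skipping resamples in which all X values coincide is a choice made for this sketch, since the slope is undefined in that case.

```python
import random


def ols_line(x, y):
    """Least squares intercept a0 and slope a1 for y = a0 + a1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    return ybar - a1 * xbar, a1


def residual_resampling(x, y, n_boot, rng):
    """Resample residuals, add them to the predicted values, refit on x."""
    a0, a1 = ols_line(x, y)
    pred = [a0 + a1 * xi for xi in x]
    resid = [yi - pi for yi, pi in zip(y, pred)]
    coefs = []
    for _ in range(n_boot):
        new_y = [pi + rng.choice(resid) for pi in pred]
        coefs.append(ols_line(x, new_y))
    return coefs


def data_resampling(x, y, n_boot, rng):
    """Resample (x, y) rows together and refit each resample."""
    n = len(x)
    coefs = []
    while len(coefs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope undefined when every resampled x is the same
        coefs.append(ols_line(xs, [y[i] for i in idx]))
    return coefs


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.4, 3.1, 3.4, 4.2, 4.4, 5.1, 5.4, 6.1, 6.4, 7.1]

rng = random.Random(1)
resid_coefs = residual_resampling(x, y, 100, rng)
data_coefs = data_resampling(x, y, 100, rng)
```

Each function returns a list of (a0, a1) pairs, one per bootstrap sample. Residual resampling keeps the X values fixed in every refit, while data resampling lets them vary from one resample to the next.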

Hamilton (see Reference below) gives some guidance on the contrasts between these approaches.

Residual resampling assumes fixed X values and independent and identically distributed residuals (although the residuals are not assumed to be normally distributed).
Data resampling does not assume independent and identically distributed residuals.

Given the above, if the assumption of fixed X is realistic (that is, we could readily collect new Y's with the same X values), then residual resampling is justified. For example, this would be the case in a designed experiment. However, if this assumption is not realistic (i.e., the X values vary randomly as well as the Y's), then data resampling is preferred.

If the bootstrap methods produce substantially different results, this is an indication that the assumptions of fixed X and independent and identically distributed residuals may not be valid.

The BOOTSTRAP FIT command produces the following output:

A few lines listing the sample size, the number of bootstrap samples, and the bootstrap method used (residual resampling or data resampling).
A summary table where each row of the table corresponds to one of the fit coefficients. The first column identifies the parameter. Columns 2 and 3 list the coefficient estimate and standard deviation from the original fit. Columns 4 through 7 list the mean, the standard deviation, and the 2.5 and 97.5 percentiles of the bootstrap sample.
Typically, the analyst will want to perform further viewing and analysis of the bootstrap samples. For example, a histogram of the bootstrap samples is often displayed.
Dataplot writes the bootstrap samples to files. You can read these back into Dataplot for additional graphing and analysis of the bootstrap samples.
Specifically,
- The bootstrap samples for the coefficient estimates are written to the file "dpst1f.dat".
- The bootstrap samples for the coefficient standard deviation estimates are written to the file "dpst2f.dat".
- The bootstrap samples for the residual standard deviation estimates are written to the file "dpst3f.dat".
The program example below demonstrates the use of these files after the BOOTSTRAP FIT command.
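
The bootstrap columns of the summary table (mean, standard deviation, and the 2.5 and 97.5 percentiles) are simple summaries of the bootstrap sample for each coefficient. The following Python sketch, which is not Dataplot code, shows one way to compute them. The function names percentile and summarize and the slope values are hypothetical, and the linear-interpolation percentile rule is an assumption that may differ from the rule Dataplot uses.

```python
import statistics


def percentile(values, p):
    """p-th percentile (0 to 100) by linear interpolation of sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def summarize(boot_values):
    """Mean, standard deviation, and 2.5/97.5 percentiles of a bootstrap sample."""
    return {
        "mean": statistics.mean(boot_values),
        "sd": statistics.stdev(boot_values),
        "p2.5": percentile(boot_values, 2.5),
        "p97.5": percentile(boot_values, 97.5),
    }


# Hypothetical bootstrap slope estimates, for illustration only.
a1_boot = [0.71, 0.74, 0.73, 0.69, 0.76, 0.72, 0.75, 0.73, 0.70, 0.74]
row = summarize(a1_boot)
```

The interval from the 2.5 percentile to the 97.5 percentile gives a simple 95% percentile interval for the coefficient.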

Syntax 1:

Examples:

Note:

To specify the resampling method, enter the command

SET BOOTSTRAP FIT METHOD <RESIDUAL/DATA>

The default is RESIDUAL. You can use EFRON as a synonym for RESIDUAL and WU as a synonym for DATA.

Note:

The BOOTSTRAP FIT METHOD command also applies to these commands.

Note:

To specify the number of bootstrap samples, enter the command

BOOTSTRAP SAMPLES <value>

The default is 100 bootstrap samples. Some sources recommend as many as 2,000 bootstrap samples for accurate confidence intervals for the parameter estimates.

Note:

Some refinements to generate more accurate confidence intervals have been proposed. The issue of bootstrap confidence intervals for multilinear fitting is discussed on pages 319-325 of Hamilton.

Default:

Residual resampling is used with 100 bootstrap samples.

Synonyms:

None

Related Commands:

BOOTSTRAP SAMPLES	= Specify the number of bootstrap samples to generate.
BOOTSTRAP METHOD	= Specify whether residual or data resampling is used.
BOOTSTRAP PLOT	= Generate a bootstrap plot.
JACKNIFE PLOT	= Generate a jacknife plot.
HISTOGRAM	= Generates a histogram.
KERNEL DENSITY PLOT	= Generates a kernel density plot.
PLOT	= Generates a data/function plot.

Reference:

The American Statistician

Hamilton (1992), "Regression with Graphics: A Second Course in Applied Statistics," Duxbury Press,

Applications:

Fitting

Implementation Date:

2002/7

Program:

 
skip 25
read berger1.dat y x
. Run the bootstrap fit with data (Wu) resampling and 100 bootstrap samples
set write decimals 5
set bootstrap fit method data
bootstrap samples 100
bootstrap fit y x
. Read the bootstrap coefficient estimates and their standard deviations back in
delete a0 a1
skip 0
set read format 2e15.7
read dpst1f.dat a0 a1
read dpst2f.dat a0sd a1sd
.
multiplot corner coordinates 0 0 100 100
multiplot scale factor 2
multiplot 2 2
. Plot a histogram of each set of bootstrap estimates
title a0 estimate
let bmean = mean a0
let b025 = 2.5 percentile a0
let b975 = 97.5 percentile a0
let bmean = int(bmean*1000)/1000
let b025 = int(b025*1000)/1000
let b975 = int(b975*1000)/1000
x2label mean = ^bmean, b025 = ^b025, b975 = ^b975
histogram a0
title a1 estimate
let bmean = mean a1
let b025 = 2.5 percentile a1
let b975 = 97.5 percentile a1
let bmean = int(bmean*1000)/1000
let b025 = int(b025*1000)/1000
let b975 = int(b975*1000)/1000
x2label mean = ^bmean, b025 = ^b025, b975 = ^b975
histogram a1
title a0 standard deviation
let bmean = mean a0sd
let bmean = int(bmean*1000)/1000
x2label mean = ^bmean
histogram a0sd
title a1 standard deviation
let bmean = mean a1sd
let bmean = int(bmean*1000)/1000
x2label mean = ^bmean
histogram a1sd
.
end of multiplot

            Bootstrap Linear/Multilinear Fit
 
Number of Observations:                             107
Number of Bootstrap Sample                          100
Bootstrap Method: Data (Wu)
 
 
            Summary Table
 
--------------------------------------------------------------------------------------------------
   Para- Estimates From Original Fit                  Estimates From Bootstrap Fit
   meter           Coef             SD           Mean             SD            2.5           97.5
--------------------------------------------------------------------------------------------------
      A0        4.99367        1.12565        4.87176        1.13198        2.77080        6.68491
      A1        0.73111        0.02455        0.73568        0.02478        0.68731        0.80165