
BOOTSTRAP FIT

Name:
    BOOTSTRAP FIT
Type:
    Graphics Command
Purpose:
    Performs a bootstrap linear or multilinear fit.
Description:
    The standard solution for linear/multilinear fitting is to use ordinary least squares (OLS). OLS is based on several assumptions:

    1. The residuals from the fit have constant location and variance.

    2. The residuals from the fit are independent.

    3. The residuals from the fit follow a common distribution, usually assumed to be the normal distribution.

    When these assumptions are at least approximately satisfied, OLS provides the optimal estimates and uncertainty intervals for the fit coefficients. However, if the assumptions are not at least approximately satisfied, then the OLS estimates may no longer be optimal (and may in fact be quite wrong). Applying transformations and weighting are common approaches to fitting when the assumptions are not satisfied.

    Bootstrap fitting provides an additional alternative. The bootstrap is a non-parametric method for estimating the sampling distribution of a statistic. It recomputes the statistic on N resamples of the data, each drawn with replacement. In the context of fitting, the statistics are the coefficients of the fit, and the bootstrap provides estimates of their uncertainty.

    There are two approaches to bootstrapping for fitting.

    1. In the first approach, the OLS fit is computed from the original data. The residuals from this fit are resampled and added to the predicted values of the original fit to obtain a new Y vector. This new Y vector is then fit against the original X variables. We call this approach residual resampling (or the Efron approach).

    2. In the second approach, rows of the original data (both the Y vector and the corresponding rows of the X variables) are resampled. The resampled data are then fit. We call this approach data resampling (or the Wu approach).
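
    The two approaches can be sketched in a few lines of Python for the single-predictor case. This is an illustration only, not Dataplot's implementation: the function names (fit_line, residual_resample, data_resample) and the synthetic data are assumptions made for the example.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    return ybar - a1 * xbar, a1

def residual_resample(x, y, rng):
    """Residual (Efron) resampling: X stays fixed, OLS residuals are resampled."""
    a0, a1 = fit_line(x, y)
    pred = [a0 + a1 * xi for xi in x]
    res = [yi - pi for yi, pi in zip(y, pred)]
    # New Y = original predicted values + residuals drawn with replacement.
    new_y = [pi + rng.choice(res) for pi in pred]
    return fit_line(x, new_y)

def data_resample(x, y, rng):
    """Data (Wu) resampling: (x, y) rows are resampled together."""
    n = len(x)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        # Redraw in the rare case that every sampled x is identical,
        # since the slope is then undefined.
        if max(xs) > min(xs):
            return fit_line(xs, [y[i] for i in idx])

# Synthetic data, invented for this illustration only.
rng = random.Random(0)
x = [float(i) for i in range(1, 21)]
y = [5.0 + 0.7 * xi + rng.gauss(0.0, 1.0) for xi in x]

boot_resid = [residual_resample(x, y, rng) for _ in range(100)]
boot_data = [data_resample(x, y, rng) for _ in range(100)]
```

    Each list holds 100 (a0, a1) pairs, one per bootstrap sample. The only difference between the two functions is what gets resampled: residuals around a fixed design, or whole rows of data.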

    Hamilton (see Reference below) gives some guidance on the contrasts between these approaches.

    1. Residual resampling assumes fixed X values and independent and identically distributed residuals (although the residuals are not assumed to be normally distributed).

    2. Data resampling does not assume independent and identically distributed residuals.

    Given the above, if the assumption of fixed X is realistic (that is, we could readily collect new Y's with the same X values), then residual resampling is justified. For example, this would be the case in a designed experiment. However, if this assumption is not realistic (i.e., the X values vary randomly as well as the Y's), then data resampling is preferred.

    If the bootstrap methods produce substantially different results, this is an indication that the assumptions of fixed X and independent and identically distributed residuals may not be valid.

    The BOOTSTRAP FIT command produces the following output:

    1. A few lines listing the sample size, the number of bootstrap samples, and the bootstrap method used (residual resampling or data resampling).

    2. A summary table where each row of the table corresponds to one of the fit coefficients. The first column identifies the parameter. Columns 2 and 3 list the coefficient estimate and standard deviation from the original fit. Columns 4 through 7 list the mean, the standard deviation, and the 2.5 and 97.5 percentiles of the bootstrap sample.

    3. Typically, the analyst will want to view and analyze the bootstrap samples further. For example, a histogram of the bootstrap samples is often displayed.

      Dataplot writes the bootstrap samples to file. You can read these back into Dataplot to generate additional graphing and analysis of the bootstrap samples.

      Specifically,

      • The bootstrap samples for the coefficient estimates are written to the file "dpst1f.dat".

      • The bootstrap samples for the coefficient standard deviation estimates are written to the file "dpst2f.dat".

      • The bootstrap samples for the residual standard deviation estimates are written to the file "dpst3f.dat".

      The program example below demonstrates the use of these files after the BOOTSTRAP FIT command.
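
    As a rough illustration of what columns 4 through 7 of the summary table report, the following Python sketch computes the same four quantities from a list of bootstrap estimates for one coefficient. The helper names (percentile, bootstrap_summary) are hypothetical, and two details are assumptions: the percentile uses linear interpolation between order statistics, and the standard deviation uses the n-1 divisor. Dataplot's own definitions may differ.

```python
import statistics

def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def bootstrap_summary(samples):
    """Mean, SD, and 2.5/97.5 percentiles of one coefficient's bootstrap samples."""
    return {
        "mean": statistics.mean(samples),
        "sd": statistics.stdev(samples),  # n-1 divisor (an assumption)
        "p2.5": percentile(samples, 2.5),
        "p97.5": percentile(samples, 97.5),
    }

# Placeholder values standing in for one column of bootstrap estimates.
demo = bootstrap_summary([float(v) for v in range(101)])
print(demo)
```

    In practice the input would be one column of bootstrap coefficient estimates, for example the values read back from "dpst1f.dat".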

Syntax 1:
    BOOTSTRAP FIT <y> <x1> ... <xk>               <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response (dependent) variable;
                  <x1> ... <xk> is a list of one or more independent variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
    BOOTSTRAP FIT Y X
    BOOTSTRAP FIT Y X1 X2 X3 X4
    BOOTSTRAP FIT Y X1 X2 X3 X4 SUBSET TAG > 1
Note:
    Use the following command to specify which bootstrap method is used:

      SET BOOTSTRAP FIT METHOD <RESIDUAL/DATA>

    The default is RESIDUAL. You can use EFRON as a synonym for RESIDUAL and WU as a synonym for DATA.

Note:
    Linear and quadratic calibration are special applications of fitting. The following commands can be used to perform a bootstrap analysis of linear and quadratic calibration, respectively:

      LET Y0 = <value>
      BOOTSTRAP LINEAR CALIBRATION PLOT
      BOOTSTRAP QUADRATIC CALIBRATION PLOT

    The SET BOOTSTRAP FIT METHOD command also applies to these commands.

Note:
    The number of bootstrap samples can be specified with the command:

      BOOTSTRAP SAMPLES <value>

    The default is 100 bootstrap samples. Some sources recommend as many as 2,000 bootstrap samples for accurate confidence intervals for the parameter estimates.
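
    One way to see why more bootstrap samples help with interval endpoints is to repeat the bootstrap several times and look at how much the 97.5 percentile of the slope moves from run to run. The Python sketch below does this with residual resampling for 100 and 2,000 bootstrap samples. It is an illustration under stated assumptions, not Dataplot's procedure: the data are synthetic, the helper names are hypothetical, and the percentile uses a simple nearest-rank rule.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    return ybar - a1 * xbar, a1

def upper_slope_percentile(x, y, n_boot, rng):
    """97.5 percentile (nearest rank) of bootstrap slopes, residual resampling."""
    a0, a1 = fit_line(x, y)
    pred = [a0 + a1 * xi for xi in x]
    res = [yi - pi for yi, pi in zip(y, pred)]
    slopes = sorted(
        fit_line(x, [pi + rng.choice(res) for pi in pred])[1]
        for _ in range(n_boot)
    )
    return slopes[round(0.975 * (n_boot - 1))]

# Synthetic data, invented for this illustration only.
data_rng = random.Random(1)
x = [float(i) for i in range(1, 21)]
y = [5.0 + 0.7 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

def endpoint_spread(n_boot, repeats=15):
    """Standard deviation of the upper endpoint across independent bootstrap runs."""
    ends = [upper_slope_percentile(x, y, n_boot, random.Random(seed))
            for seed in range(repeats)]
    return statistics.stdev(ends)

spread_100 = endpoint_spread(100)
spread_2000 = endpoint_spread(2000)
print(spread_100, spread_2000)
```

    The second value would be expected to be noticeably smaller than the first. Tail percentiles depend on only a few extreme resamples when the number of bootstrap samples is small, so they stabilize more slowly than the bootstrap mean or standard deviation.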

Note:
    Confidence intervals for the coefficients of the fit model can be obtained from the appropriate percentiles of the bootstrap samples (e.g., the 2.5% and 97.5% percentiles provided in the summary table). However, in some cases these percentiles may have less than the nominal coverage probability.

    Some refinements to generate more accurate confidence intervals have been proposed. The issue of bootstrap confidence intervals for multilinear fitting is discussed on pages 319-325 of Hamilton.

Default:
    Residual resampling is used with 100 bootstrap samples.
Synonyms:
    None
Reference:
    Efron and Gong (1983), "A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation," The American Statistician.

    Hamilton (1992), "Regression with Graphics: A Second Course in Applied Statistics," Duxbury Press.

Applications:
    Fitting
Implementation Date:
    2002/7
Program:
     
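    . Skip the 25 header lines of the data file and read the two variables
    skip 25
    read berger1.dat y x
    .
    . Run the bootstrap fit with data (Wu) resampling and 100 bootstrap samples
    set write decimals 5
    set bootstrap fit method data
    bootstrap samples 100
    bootstrap fit y x
    .
    . Delete any existing a0 and a1 so the names can be reused as variables,
    . then read the bootstrap samples back from the files written by the fit
    delete a0 a1
    skip 0
    set read format 2e15.7
    read dpst1f.dat a0 a1
    read dpst2f.dat a0sd a1sd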
    .
    multiplot corner coordinates 0 0 100 100
    multiplot scale factor 2
    multiplot 2 2
    .
    title a0 estimate
    let bmean = mean a0
    let b025 = 2.5 percentile a0
    let b975 = 97.5 percentile a0
    let bmean = int(bmean*1000)/1000
    let b025 = int(b025*1000)/1000
    let b975 = int(b975*1000)/1000
    x2label mean = ^bmean, b025 = ^b025, b975 = ^b975
    histogram a0
    title a1 estimate
    let bmean = mean a1
    let b025 = 2.5 percentile a1
    let b975 = 97.5 percentile a1
    let bmean = int(bmean*1000)/1000
    let b025 = int(b025*1000)/1000
    let b975 = int(b975*1000)/1000
    x2label mean = ^bmean, b025 = ^b025, b975 = ^b975
    histogram a1
    title a0 standard deviation
    let bmean = mean a0sd
    let bmean = int(bmean*1000)/1000
    x2label mean = ^bmean
    histogram a0sd
    title a1 standard deviation
    let bmean = mean a1sd
    let bmean = int(bmean*1000)/1000
    x2label mean = ^bmean
    histogram a1sd
    .
    end of multiplot
        
    The BOOTSTRAP FIT command generates the following output:
                Bootstrap Linear/Multilinear Fit
     
    Number of Observations:                             107
    Number of Bootstrap Sample                          100
    Bootstrap Method: Data (Wu)
     
     
                Summary Table
     
    --------------------------------------------------------------------------------------------------
       Para- Estimates From Original Fit                  Estimates From Bootstrap Fit
       meter           Coef             SD           Mean             SD            2.5           97.5
    --------------------------------------------------------------------------------------------------
          A0        4.99367        1.12565        4.87176        1.13198        2.77080        6.68491
          A1        0.73111        0.02455        0.73568        0.02478        0.68731        0.80165
        
    Histograms can be generated for the bootstrap samples of each of the parameter estimates and the standard deviations of the parameter estimates.

    [Figure: plot generated by the sample program]

Date created: 08/12/2002
Last updated: 12/11/2023

Please email comments on this WWW page to alan.heckert@nist.gov.