
FIT

Name:
    ... FIT
Type:
    Analysis Command
Purpose:
    Estimate the parameters for a linear, polynomial, or non-linear least squares fit. This is one of Dataplot's most powerful and heavily used commands.
Description:
    The FIT command can be used for both linear and non-linear fits. Both weighted and unweighted fits are supported.

    Non-linear fits are performed using an iterative modified Levenberg-Marquardt algorithm (Dataplot implements the algorithm given in the Osborne paper listed in the References section below). This algorithm can fit linear and multi-linear models as well as non-linear models.

    In addition, the FIT command can perform linear and polynomial fits using a non-iterative algorithm. Because the non-iterative algorithm supports a much broader range of output, exact linear fits are documented separately.

    Non-linear fits are specified by entering a function. For example,

      FIT Y = A0 + A1*X1
      FIT Y = A0 + A1*EXP(A2*(YEAR-1950))
      FIT Y = (A0 + A1*X)/(1 + B1*X)

    The function can either be given on the FIT command or be defined with a LET FUNCTION command.
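
    As a brief sketch of the second form (the function name F1 and the model shown are illustrative only), the function is defined once and then referenced by name on the FIT command:

      LET FUNCTION F1 = A + B*EXP(-C*X)
      FIT Y = F1

    Defining the function separately can be convenient when the same model is fit to several subsets of the data.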

    For non-linear fits, the FIT command generates the following output.

    1. The parameter estimates and associated standard deviations are printed for each iteration.

    2. After convergence, a table containing the parameter estimates, the parameter standard deviations, and the parameter t-values is printed. The t-value is used to determine if a given parameter is statistically significant.

      These values are also written to the file dpst1f.dat. To read these values into Dataplot variables, enter the command

        SKIP 1
        READ DPST1F.DAT COEF COEFSD TVAL

    3. The correlation matrix for the parameter estimates is written to the file dpst2f.dat. To read this correlation matrix, enter the command

        SKIP 1
        READ MATRIX DPST2F.DAT CORR

    4. The variance-covariance matrix for the parameter estimates is written to the file dpst3f.dat. To read this covariance matrix, enter the command

        SKIP 0
        READ MATRIX DPST3F.DAT COV

    5. The residual standard deviation and its corresponding degrees of freedom are stored in the parameters RESSD and RESDF, respectively. RESDF is the number of observations minus the number of parameters estimated in the fit (for a linear model, the number of independent variables plus one for the constant term). The formula for RESSD is:

        \( \mbox{RESSD} = \sqrt{\frac{\sum_{i=1}^{n}{(Y_{i} - \hat{Y}_{i})^2}} {\mbox{RESDF}} } \)

    6. If there is replication in the independent variables, the replication standard deviation and corresponding degrees of freedom are printed. In addition, a lack of fit F test is performed. These are stored in the parameters REPSD, REPDF, and LOFCDF, respectively. The formulas are:

        \( \mbox{REPDF} = \sum_{i=1}^{nrep}{(n_i - 1)} \)

      with \( nrep \) and \( n_i \) denoting the number of distinct replication sets and the number of observations in the i-th replication set, respectively.

        \( \mbox{REPSD} = \sqrt{\frac{\sum_{i=1}^{nrep}{\sum_{j=1}^{n_i}{(Y_{ij} - \bar{Y}_{i})^2}}} {\mbox{REPDF}}} \)

      with \( Y_{ij} \) denoting the j-th observation in the i-th replication set and \( \bar{Y}_{i} \) denoting the mean of the i-th replication set.

    7. Dataplot saves the predicted values from a fit in the variable PRED and the residual values in the variable RES. These variables can be used in subsequent LET and PLOT commands to generate diagnostic plots of residuals and predicted values.
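
    The lack of fit F ratio mentioned in item 6 is not written out above. As a sketch, assuming Dataplot uses the standard lack of fit statistic (an assumption, although it does reproduce the Program 1 output below), the ratio can be computed from the stored parameters as

        \( F = \frac{(\mbox{RESDF} \cdot \mbox{RESSD}^2 - \mbox{REPDF} \cdot \mbox{REPSD}^2) / (\mbox{RESDF} - \mbox{REPDF})} {\mbox{REPSD}^2} \)

    with RESDF - REPDF and REPDF degrees of freedom for the numerator and denominator, respectively. For Program 1, RESDF = 211, RESSD = 3.36167, REPDF = 192, and REPSD = 3.28176, so

        \( F = \frac{(211 \cdot 3.36167^2 - 192 \cdot 3.28176^2)/19}{3.28176^2} \approx \frac{316.644/19}{10.770} \approx 1.547 \)

    which agrees with the reported lack of fit F ratio of 1.54740 and the reported degrees of freedom of 19 and 192.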

    It is recommended that a FIT be followed by a residual analysis to assess the model adequacy. Specifically, the typical assumptions for the residuals are that they are independent with a common distribution having fixed location and variation. It is usually assumed that the common distribution is a normal distribution. The 4-PLOT command generates four plots that are useful in testing these assumptions. The NIST/SEMATECH e-Handbook contains a more detailed discussion of this issue.

    In addition, if there is a single independent variable in the model, it can be useful to plot the data with the fitted values overlaid.
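
    A minimal sketch of these diagnostic steps, using only commands that appear in the sample programs below and assuming a response variable Y, a single independent variable X, and a fit that has already been run:

      . Draw the data as symbols and the fitted values as a solid line
      LINE BLANK SOLID
      CHARACTER X BLANK
      PLOT Y PRED VS X
      . Check the residual assumptions
      4-PLOT RES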

    For non-linear fits, up to 15 independent variables can be included in the model.

Syntax:
    FIT <y1> = <f> <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the response (= dependent) variable;
    <f> is:
    1. a general Fortran-like expression; or
    2. any function name that the user has already created via the LET FUNCTION command;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is appropriate for all models--linear, polynomial, multi-linear (up to 15 independent variables), and non-linear (up to 15 independent variables). It uses an iterative modified Levenberg-Marquardt algorithm. Linear fits are handled as a special case (the fits are still done iteratively).

Examples:
    FIT Y = A+B*EXP(-C*X)
    FIT Y = A*(EXP(-B*TIME/10) - EXP(-TIME/10))
    FIT Y = B0 + B1*X**B2
    FIT Y = K/(1+K*A*X**B)
    FIT Y = A - B*X - ATAN(C/(X-D))/3.14159
    FIT Y = A0*BESS0(A1*X)*BESS1(A1*X)
    FIT Y = (A0 + A1*X)/(1 + B1*X + B2*X**2)
    FIT Y = (A+B*X+C*X**D)/(SIN(EXP(-ALPHA*X2+BETA*X3)))

    FIT Y = F1

Note:
    The non-linear algorithm is iterative with two commands for controlling the iterations.

    By default, a maximum of 50 iterations are allowed before Dataplot assumes the fit is not converging. You can change this maximum with the FIT ITERATIONS command.

    Dataplot checks for convergence by computing the ratio of successive values of the residual standard deviation. You can specify the criterion for convergence with the FIT STANDARD DEVIATION command.
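
    For example, a slowly converging fit might be given a larger iteration limit and a tighter criterion. This is a sketch only: the values 200 and 0.000001 are arbitrary, and the argument forms are assumed from the command names rather than taken from this page, so check the separate documentation for FIT ITERATIONS and FIT STANDARD DEVIATION before relying on them.

      FIT ITERATIONS 200
      FIT STANDARD DEVIATION 0.000001
      FIT Y = A+B*EXP(-C*X)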

Note:
    Starting values are not required. The Levenberg-Marquardt algorithm can provide good fits for a wide variety of applications without decent starting values.

    However, decent starting values can often speed up non-linear fits. In addition, some fits may require good starting values in order to converge to accurate values.

    To specify starting values, simply assign values to the coefficients before doing the fit. For example:

      LET ALPHA = 0.15
      LET A = 0.004
      LET B = 0.01
      FIT Y = EXP(-ALPHA*X)/(A+B*X)

    In some cases, good starting values might be known from previous work or from theoretical considerations. However, if better starting values are needed and reasonable guesses are not available, the PRE-FIT command can be helpful.

    Sinusoidal models are one case where good starting values are needed. The NIST/SEMATECH e-Handbook contains an example of fitting this kind of model.

    If you have a parameter in the model that you want to set to a fixed value, then enter the literal value or use the substitution character "^". For example

      FIT Y = A0 + A1*X**1.5

      LET C = 1.5
      . Dataplot will try to fit C
      FIT Y = A0 + A1*X**C
      . Dataplot will leave C fixed at 1.5
      FIT Y = A0 + A1*X**^C

Note:
    Weighted fits are typically used in the following two situations.

    1. Weighting is one approach for dealing with non-constant variation in the residuals. It is not uncommon for the variance of the residuals to increase for the largest (or smallest) values of the independent variable. In this case, weights can be used to give less weight to the less precise measurements. The NIST/SEMATECH e-Handbook contains a discussion of weighted fits and an example of using weights to address non-constant variation.

    2. Weights can also be used to implement certain types of robust fitting. In this case, weights are used to downweight observations based on the size of the associated residual. Outlier observations can sometimes distort a fit (i.e., in trying to fit the outlier point(s), the bulk of the data is poorly fit). Weighting based on the residuals can often provide a good fit to the bulk of the data without eliminating the outlier observations from the analysis.

      Enter HELP WEIGHTS and HELP BIWEIGHT for examples of this use of weighted fits in Dataplot.

    To specify weights for a least squares fit, enter the command

      WEIGHTS <var>

    where <var> is a variable containing the weights.

    Note that the RES variable contains the absolute value of the residuals after the fit. For residual plots and analysis, it may be preferable to work with the weighted residuals. You can create these with the command

      LET RESW = W*RES

    where W contains the weight variable.
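
    Putting these steps together, a weighted fit might look like the following sketch. The weight variable W and its definition are hypothetical choices for illustration: inverse squared X presumes that X is non-zero and that the residual standard deviation grows roughly in proportion to X.

      LET W = 1/(X**2)
      WEIGHTS W
      FIT Y = A0 + A1*X
      LET RESW = W*RES
      4-PLOT RESW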

Note:
    Data transformations are often used to improve the quality of the fit. For example, some types of non-linear fits can be restated as linear fits with an appropriate transformation. Also, transformations are often applied to address non-homogeneous variation in the fit. The NIST/SEMATECH e-Handbook contains a discussion of this issue.

    Data transformations can be generated easily if needed via the LET command. The BOX-COX LINEARITY PLOT can be a useful command for determining an appropriate transformation.
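
    As an illustration, the exponential model Y = A*EXP(B*X) can be restated as a linear model by taking logarithms, since LOG(Y) = LOG(A) + B*X. The sketch below assumes all Y values are positive; the names LOGY, A0, and A1 are arbitrary.

      LET LOGY = LOG(Y)
      FIT LOGY = A0 + A1*X
      . A1 estimates B; recover A on the original scale from A0
      LET A = EXP(A0)

    Note that the transformed fit minimizes squared error on the log scale, so its estimates generally differ somewhat from those of a direct non-linear fit of the original model.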

    Some analysts prefer to standardize the independent variables and the dependent variable by subtracting the mean and dividing by the standard deviation. This is done to provide numerical stability (note that Dataplot scales the data internally before performing the regression calculations) and also so that the data and regression coefficients are on a common scale. The original and standardized variables are related as follows

      \( x_{ik}^{'} = \frac{x_{ik} - \bar{x}_{k}}{s_{x_k}} \)

      \( y_{i}^{'} = \frac{y_{i} - \bar{y}}{s_{y}} \)

    with \( \bar{x}_{k} \) and \( s_{x_k} \) denoting the mean and standard deviation of the k-th independent variable and \( \bar{y} \) and \( s_y \) denoting the mean and standard deviation of the dependent variable.

    The parameters of the original model are recovered from those of the standardized model by

      \( \beta_{k} = \frac{s_{y}}{s_{x_k}} \beta_{k}^{'} \)

      \( \beta_{0} = \bar{y} - \beta_{1} \bar{x}_1 - \ldots - \beta_{p} \bar{x}_p \)

    A variation on this is the correlation transformation (also called the standardized regression model). Specifically

      \( y_{i}^{'} = \frac{1}{\sqrt{n-1}} \frac{y_{i} - \bar{y}}{s_{y}} \)

      \( x_{ik}^{'} = \frac{1}{\sqrt{n-1}} \frac{x_{ik} - \bar{x}_{k}} {s_{x_k}} \)

    With this transformation, the \( X'X \) matrix reduces to a correlation matrix of the independent variables. If there are \( p \) independent variables, these transformations can be generated with the commands

       
      LET N = SIZE Y
      LET FACT = 1/SQRT(N-1)
      LOOP FOR K = 1 1 P
          LET Z^K = STANDARDIZE X^K
          LET Z^K = FACT*Z^K
      END OF LOOP
      LET YT = STANDARDIZE Y
      LET YT = FACT*YT
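
    To see why the correlation transformation yields a correlation matrix, consider the (j,k) element of \( X'X \) for the transformed variables. This short derivation uses only the definitions above; \( r_{jk} \) is introduced here to denote the sample correlation between the j-th and k-th independent variables.

      \( \sum_{i=1}^{n} x_{ij}^{'} x_{ik}^{'} = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(x_{ij} - \bar{x}_{j})(x_{ik} - \bar{x}_{k})}{s_{x_j} s_{x_k}} = r_{jk} \)

    In particular, each diagonal element equals 1.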
Note:
    Although the FIT command is the Dataplot workhorse command for fitting, Dataplot supports the following additional fitting capabilities:

    1. ORTHOGONAL DISTANCE FIT - This command is used to fit errors-in-variables models for both linear and non-linear models. It can also fit implicit models.

    2. BOOTSTRAP FIT - This command is used to fit linear or multilinear models using the bootstrap.

    3. EXACT RATIONAL FIT - This command is used to determine good starting values for fitting rational function models (the full model is still fit using the FIT command).

      Rational function models are the ratio of two polynomial functions. The NIST/SEMATECH e-Handbook contains a detailed discussion of these models.

    4. CALIBRATION - This command is used to fit linear or quadratic calibration models.

    5. YATES ANALYSIS - This command is used to fit full and fractional 2-level designs.

    6. SPLINE FIT - This command is used for spline fits.

    7. LOWESS SMOOTH - This command is used to fit locally-weighted least squares models.

    8. ARMA - This command is used for fitting autoregressive/moving average time series models.

    9. PRINCIPAL COMPONENTS - This LET subcommand can be used to reduce the number of independent variables in a multi-linear fit.

    10. SMOOTH - This command is used for various types of smoothing.

    11. INTERPOLATION - This LET subcommand performs cubic spline interpolation.

    12. HERMITE INTERPOLATION - This LET subcommand performs Hermite interpolation.

    These commands are documented separately.

Note:
    If you want to suppress the output to files dpst1f.dat, dpst2f.dat, and dpst3f.dat, enter the command

      SET FIT AUXILLARY FILES OFF
Note:
    By default, the values written to dpst1f.dat, dpst2f.dat, and dpst3f.dat are written using a Fortran E15.7 format (that is, exponential format with 7 significant digits). You can specify the number of significant digits with the command

      SET AUXILLARY FILES DECIMAL POINTS <value>

    where the default is 7.

Default:
    None
Synonyms:
    None
Related Commands:
    FIT ITERATIONS = Sets the maximum number of iterations for the FIT command.
    FIT STANDARD DEVIATION = Sets the minimum standard deviation for the convergence criterion in the FIT command.
    PRED = A variable where predicted values are stored.
    RES = A variable where residuals are stored.
    RESSD = A parameter where the residual standard deviation is stored.
    RESDF = A parameter where the residual degrees of freedom is stored.
    REPSD = A parameter where the replication standard deviation is stored.
    REPDF = A parameter where the replication degrees of freedom is stored.
    LOFCDF = A parameter where the lack of fit cdf is stored.
    WEIGHTS = Sets the weights for the fit command.
    BIWEIGHT = Perform a biweight transformation.
    EXACT RATIONAL FIT = Perform an exact rational fit.
    CALIBRATION = Perform a linear or quadratic calibration fit.
    LOWESS = Perform a locally weighted least squares smoothing.
    BOOTSTRAP FIT = Perform a linear or multi-linear fit based on the bootstrap.
    ORTHOGONAL DISTANCE FIT = Perform an orthogonal distance fit (useful for errors-in-variables models).
    PRE-FIT = Perform a least squares pre-fit.
    SPLINE FIT = Perform a spline fit.
    SMOOTH = Perform a smoothing.
    ANOVA = Perform a fixed effects analysis of variance.
    MEDIAN POLISH = Perform a median polish.
    PLOT = Generate a data/function plot.
    4-PLOT = Generate a 4-plot.
References:
    Osborne (1972), "Some Aspects of Nonlinear Least Squares Calculation", in Numerical Methods for Nonlinear Optimization, Ed. Lootsma, Academic Press.

    Osborne (1976), "Nonlinear Least Squares -- the Levenberg Algorithm Revisited", ANZIAM Journal, Vol. 19, No. 3, pp. 343-357.

Applications:
    Least Squares Fitting
Implementation Date:
    Pre-1987
    1987/09: Support for weighted fits
    1988/03: Save LOFCDF parameter
    1991/09: Expand number of allowed independent variables from 5 to 15
    1992/03: Write coefficient, coefficient sd, and t-value to dpst1f.dat
    1997/07: Print summary information if maximum iterations reached
    2001/04: Print parameter covariance matrix to dpst3f.dat
    2014/06: Option to suppress output to auxillary files
    2019/04: Option to suppress output to auxillary files
Program 1:
     
    . Step 1:   Read the data
    .
    SKIP 25
    READ CHWIRUT1.DAT Y X
    SKIP 0
    .
    . Step 2:   Perform the fit
    .
    SET WRITE DECIMALS 5
    LET ALPHA = 0.15
    LET A = 0.004
    LET B = 0.01
    FIT Y = EXP(-ALPHA*X)/(A+B*X)
    .
    . Step 3:   Generate diagnostic graphs
    .
    TITLE OFFSET 2
    TITLE CASE ASIS
    LABEL CASE ASIS
    TITLE Predicted Values Overlaid on Raw Data (CHWIRUT1.DAT)
    X1LABEL Metal Distance
    Y1LABEL Ultrasonic Response
    .
    LINE BLANK SOLID
    CHARACTER X BLANK
    .
    PLOT Y PRED VS X
    .
    LABEL
    TITLE
    SET 4-PLOT MULTIPLOT ON
    MULTIPLOT CORNER COORDINATES 0 0 100 100
    TIC MARK LABEL SIZE 4
    CHARACTER SIZE 4
    .
    4-PLOT RES
    .
    JUSTIFICATION CENTER
    MOVE 50 97
    TEXT 4-Plot of Residuals (CHWIRUT1.DAT)
        
    The following output is generated.
                 Least Squares Non-Linear Fit
      
     Sample Size:                                        214
     Model: Y =EXP(-ALPHA*X)/(A+B*X)
     Replication Case:
     Replication Standard Deviation:                 3.28176
     Replication Degrees of Freedom:                     192
     Number of Distinct Subsets:                          22
      
      
     ----------------------------------------------------------------------------------------
                                     Residual *
      Iteration    Convergence       Standard *      Parameter
         Number        Measure      Deviation *      Estimates
     ----------------------------------------------------------------------------------------
              1  0.1000000E-01  0.1077871E+02 *   0.1500000E+00  0.4000000E-02  0.1000000E-01
              2  0.5000000E-02  0.3721930E+01 *   0.1807460E+00  0.5554412E-02  0.1071653E-01
              3  0.2500000E-02  0.3362018E+01 *   0.1905488E+00  0.6119125E-02  0.1051960E-01
              4  0.1250000E-02  0.3361673E+01 *   0.1904515E+00  0.6133742E-02  0.1052492E-01
      
      
     --------------------------------------------------------------------
                                                    Approximate
             Final Parameter Estimates       Standard Deviation   t-Value
     --------------------------------------------------------------------
       1  ALPHA                     0.19041             0.02207    8.6266
       2  A                         0.00613             0.00035   17.5593
       3  B                         0.01053             0.00080   13.1131
      
      
     Residual Standard Deviation:                    3.36167
     Residual Degrees of Freedom:                        211
     Replication Standard Deviation:                 3.28176
     Replication Degrees of Freedom:                     192
     Lack of Fit F Ratio:                            1.54740
     Lack of Fit F CDF (%):                         92.64608
     Lack of Fit Degrees of Freedom 1:                    19
     Lack of Fit Degrees of Freedom 2:                   192
        
    [Plot: Predicted Values Overlaid on Raw Data (CHWIRUT1.DAT)]

    [Plot: 4-Plot of Residuals (CHWIRUT1.DAT)]

Program 2:
     
    . Step 1:   Read the data
    .
    READ ROSZMAN1.DAT X T
    LET Q = X - SQRT(-109737.3/T)
    .
    . Step 2:   Perform the fit
    .
    SET WRITE DECIMALS 5
    LET A = 0.2
    LET B = -0.00005
    LET C = 200
    LET D = -123
    .
    CAPTURE SCREEN ON
    CAPTURE FIT2.OUT
    FIT Q = A - B*T - ATAN(C/(T-D))/3.14159
    END OF CAPTURE
    .
    . Step 3:   Generate diagnostic graphs
    .
    TITLE OFFSET 2
    TITLE CASE ASIS
    LABEL CASE ASIS
    TITLE Predicted Values Overlaid on Raw Data (ROSZMAN1.DAT)
    X1LABEL Excited State Energy
    Y1LABEL Quantum Effects for Sulfur I Atom
    .
    LINE BLANK SOLID
    CHARACTER X BLANK
    .
    PLOT Q PRED VS T
    .
    LABEL
    TITLE
    SET 4-PLOT MULTIPLOT ON
    MULTIPLOT CORNER COORDINATES 0 0 100 100
    TIC MARK LABEL SIZE 4
    CHARACTER SIZE 4
    .
    4-PLOT RES
    .
    JUSTIFICATION CENTER
    MOVE 50 97
    TEXT 4-Plot of Residuals (ROSZMAN1.DAT)
        
    The following output is generated.
                 Least Squares Non-Linear Fit
      
     Sample Size:                                          25
     Model: Q =A - B*T - ATAN(C/(T-D))/3.14159
     No Replication Case:
      
      
     -------------------------------------------------------------------------------------------------------
                                     Residual *
      Iteration    Convergence       Standard *      Parameter
         Number        Measure      Deviation *      Estimates
     -------------------------------------------------------------------------------------------------------
              1  0.1000000E-01  0.2922875E+00 *   0.2000000E+00 -0.5000000E-04  0.2000000E+03 -0.1230000E+03
              2  0.5000000E-02  0.6473232E-01 *   0.2938492E+00 -0.1997106E-04  0.7968499E+03  0.4821014E+03
              3  0.2500000E-02  0.4032352E-01 *   0.1419884E+00 -0.4890395E-06  0.1872259E+04  0.1699836E+03
              4  0.1265625E-01  0.3540118E-01 *   0.2300597E+00 -0.7388945E-05  0.7415686E+03 -0.2177707E+03
              5  0.6328125E-02  0.7476144E-02 *   0.2339476E+00 -0.1048361E-04  0.1026235E+04 -0.1199718E+03
              6  0.3164063E-02  0.5190951E-02 *   0.2057654E+00 -0.6787648E-05  0.1194028E+04 -0.1579526E+03
              7  0.1582031E-02  0.4854277E-02 *   0.2019346E+00 -0.6193031E-05  0.1204886E+04 -0.1813351E+03
              8  0.7910156E-03  0.4854238E-02 *   0.2019425E+00 -0.6191448E-05  0.1204564E+04 -0.1813955E+03
      
      
     --------------------------------------------------------------------
                                                    Approximate
             Final Parameter Estimates       Standard Deviation   t-Value
     --------------------------------------------------------------------
       1  A                         0.20194             0.01927   10.4816
       2  B                        -0.00001             0.00000   -1.9250
       3  C                      1204.55836            74.63673   16.1389
       4  D                      -181.39214            49.88409   -3.6363
      
      
     Residual Standard Deviation:                    0.00485
     Residual Degrees of Freedom:                         21
        
    [Plot: Predicted Values Overlaid on Raw Data (ROSZMAN1.DAT)]

    [Plot: 4-Plot of Residuals (ROSZMAN1.DAT)]

Date created: 09/02/2021
Last updated: 12/04/2023

Please email comments on this WWW page to alan.heckert@nist.gov.