
# FIT

Name:
... FIT
Type:
Analysis Command
Purpose:
Estimate the parameters for a linear, polynomial, or non-linear least squares fit. This is one of Dataplot's most powerful and heavily used commands.
Description:
The FIT command can be used for both linear and non-linear fits. Both weighted and unweighted fits are supported.

Non-linear fits are performed using an iterative modified Levenberg-Marquardt algorithm (Dataplot implements the algorithm given in the Osborne paper listed in the References section below). This algorithm can fit linear and multi-linear models as well as non-linear models.

In addition, the FIT command can perform linear and polynomial fits using a non-iterative algorithm. Since the non-iterative algorithm supports a much broader range of output, it is documented separately. Enter

for the documentation for exact linear fits.

Non-linear fits are specified by entering a function. For example,

FIT Y = A0 + A1*X1
FIT Y = A0 + A1*EXP(A2*(YEAR-1950))
FIT Y = (A0 + A1*X)/(1 + B1*X)

The function can either be given on the FIT command or be defined with a LET FUNCTION command.

For non-linear fits, the FIT command generates the following output.

1. The parameter estimates and associated standard deviations are printed for each iteration.

2. After convergence, a table containing the parameter estimates, the parameter standard deviations, and the parameter t-values is printed. The t-value is used to determine if a given parameter is statistically significant.

These values are also written to the file dpst1f.dat. To read these values into Dataplot variables, enter the command

SKIP 1

3. The correlation matrix for the parameter estimates is written to the file dpst2f.dat. To read this correlation matrix, enter the command

SKIP 1

4. The variance-covariance matrix for the parameter estimates is written to the file dpst3f.dat. To read this covariance matrix, enter the command

SKIP 0

5. The residual standard deviation and its corresponding degrees of freedom are stored in the parameters RESSD and RESDF, respectively. RESDF is the number of observations minus the number of fitted parameters (for a linear fit, the number of independent variables plus the constant term). The formula for RESSD is:

$$\mbox{RESSD} = \sqrt{\frac{\sum_{i=1}^{n}{(Y_i - \hat{Y}_i)^2}} {\mbox{RESDF}} }$$

6. If there is replication in the independent variables, the replication standard deviation and corresponding degrees of freedom are printed. In addition, a lack of fit F test is performed. The replication standard deviation, the replication degrees of freedom, and the lack of fit F cdf value are stored in the parameters REPSD, REPDF, and LOFCDF, respectively. The formulas are:

$$\mbox{REPDF} = \sum_{i=1}^{nrep}{(n_i - 1)}$$

with $$nrep$$ and $$n_i$$ denoting the number of replication sets (distinct settings of the independent variables) and the number of observations in the i-th replication set, respectively.

$$\mbox{REPSD} = \sqrt{\frac{\sum_{i=1}^{nrep}{\sum_{j=1}^{n_i}{(Y_{ij} - \bar{Y}_{i})^2}}} {\mbox{REPDF}}}$$

with $$Y_{ij}$$ denoting the j-th observation in the i-th replication set and $$\bar{Y}_{i}$$ denoting the mean of the i-th replication set.

7. Dataplot saves the predicted values from a fit in the variable PRED and the residual values in the variable RES. These variables can be used in subsequent LET and PLOT commands to generate diagnostic plots of residuals and predicted values.
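
As a cross-check on the RESSD, REPDF, and REPSD formulas above, here is a minimal Python sketch (it is not Dataplot code). The data, the fitted values, and the parameter count are made up for illustration, and grouping replicates by identical x values is an assumption about how replication is identified.

```python
import math
from collections import defaultdict

# Hypothetical data: three distinct x settings, each measured twice.
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
y = [1.1, 0.9, 2.2, 1.8, 3.3, 2.7]
# Hypothetical fitted values from a 2-parameter model (Y = A0 + A1*X, A0=0, A1=1).
pred = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
n_params = 2

# Residual standard deviation: sqrt(sum of squared residuals / RESDF).
resdf = len(y) - n_params
ressd = math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / resdf)

# Replication: group the responses by identical x value.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)

# REPDF = sum over groups of (n_i - 1); REPSD pools deviations from group means.
repdf = sum(len(g) - 1 for g in groups.values())
rep_ss = sum((yi - sum(g) / len(g)) ** 2 for g in groups.values() for yi in g)
repsd = math.sqrt(rep_ss / repdf)

print(resdf, round(ressd, 4))
print(repdf, round(repsd, 4))
```

With six observations, two parameters, and three replication sets, the sketch gives RESDF = 4 and REPDF = 3, matching the counting rules stated in items 5 and 6.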

It is recommended that a FIT be followed by a residual analysis to assess the model adequacy. Specifically, the typical assumptions for the residuals are that they are independent with a common distribution having fixed location and variation. It is usually assumed that the common distribution is a normal distribution. The 4-PLOT command generates 4 plots that are useful in testing these assumptions. The NIST/SEMATECH e-Handbook contains a more detailed discussion of this issue.

In addition, if there is a single independent variable in the model, it can be useful to plot the data with the fitted values overlaid.

For non-linear fits, up to 15 independent variables can be included in the model.

Syntax:
FIT <y1> = <f> <SUBSET/EXCEPT/FOR qualification>
where <y1> is the response (= dependent) variable;
<f> is:
1. a general Fortran-like expression; or
2. any function name that the user has already created via the LET FUNCTION command;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is appropriate for all models--linear, polynomial, multi-linear (up to 15 independent variables), and non-linear (up to 15 independent variables). It uses an iterative modified Levenberg-Marquardt algorithm. Linear fits are handled as a special case (the fits are still done iteratively).

Examples:
FIT Y = A+B*EXP(-C*X)
FIT Y = A*(EXP(-B*TIME/10) - EXP(-TIME/10))
FIT Y = B0 + B1*X**B2
FIT Y = K/(1+K*A*X**B)
FIT Y = A - B*X - ATAN(C/(X-D))/3.14159
FIT Y = A0*BESS0(A1*X)*BESS1(A1*X)
FIT Y = (A0 + A1*X)/(1 + B1*X + B2*X**2)
FIT Y = (A+B*X+C*X**D)/(SIN(EXP(-ALPHA*X2+BETA*X3)))

FIT Y = F1

Note:
The following document contains a number of examples of the Dataplot FIT command

Note:
The non-linear algorithm is iterative with two commands for controlling the iterations.

By default, a maximum of 50 iterations are allowed before Dataplot assumes the fit is not converging. You can change this maximum with the FIT ITERATIONS command.

Dataplot checks for convergence by computing the ratio of successive values of the residual standard deviation. You can specify the criterion for convergence with the FIT STANDARD DEVIATION command.
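
The idea of monitoring the ratio of successive residual standard deviations can be sketched in Python as follows. The residual standard deviations are the four iteration values printed in the Program 1 output later in this document. The stopping rule and the tolerance `tol` are illustrative assumptions, not Dataplot's actual criterion or default.

```python
# Residual standard deviations at iterations 1-4 (from the Program 1 output).
ressd_history = [10.77871, 3.721930, 3.362018, 3.361673]

def converged_at(history, tol):
    """Return the first iteration (1-based) whose residual SD changed by
    less than a relative amount `tol` from the previous one, else None.
    `tol` is a hypothetical tolerance, not a Dataplot default."""
    for k in range(1, len(history)):
        ratio = history[k] / history[k - 1]
        if abs(1.0 - ratio) < tol:
            return k + 1
    return None

ratios = [b / a for a, b in zip(ressd_history, ressd_history[1:])]
print([round(r, 5) for r in ratios])
print(converged_at(ressd_history, tol=0.001))
```

The ratio moves toward 1 as the fit settles, so a tighter tolerance generally means more iterations before the fit is declared converged.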

Note:
Starting values are not required. The Levenberg-Marquardt algorithm can provide good fits for a wide variety of applications without decent starting values.

However, decent starting values can often speed up non-linear fits. In addition, some fits may require good starting values in order to converge to accurate values.

To specify starting values, simply assign values to the coefficients before doing the fit. For example:

LET ALPHA = 0.15
LET A = 0.004
LET B = 0.01
FIT Y = EXP(-ALPHA*X)/(A+B*X)

In some cases, good starting values might be known from previous work or from theoretical considerations. However, if better starting values are needed and reasonable guesses are not available, the PRE-FIT command can be helpful.

Sinusoidal models are one case where good starting values are needed. See the following example from the NIST/SEMATECH e-Handbook for an example of fitting this kind of model

If you have a parameter in the model that you want to set to a fixed value, then enter the literal value or use the substitution character "^". For example

FIT Y = A0 + A1*X**1.5

LET C = 1.5
FIT Y = A0 + A1*X**C - Dataplot will try to fit C
FIT Y = A0 + A1*X**^C - Dataplot will leave C fixed at 1.5

Note:
Weighted fits are typically used in the following two situations.

1. Weighting is one approach for dealing with non-constant variation in the residuals. It is not uncommon for the variance of the residuals to increase for the largest (or smallest) values of the independent variable. In this case, weights can be used to give less weight to the less precise measurements. The NIST/SEMATECH e-Handbook contains a discussion of weighted fits and an example of using weights to address non-constant variation.

2. Weights can also be used to implement certain types of robust fitting. In this case, weights are used to downweight observations based on the size of the associated residual. Outlier observations can sometimes distort a fit (i.e., in trying to fit the outlier point(s), the bulk of the data is poorly fit). Weighting based on the residuals can often provide a good fit to the bulk of the data without eliminating the outlier observations from the analysis.

Enter HELP WEIGHTS and HELP BIWEIGHT for examples of this use of weighted fits in Dataplot.
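
To illustrate residual-based downweighting, the following Python sketch computes generic Tukey biweight weights. It is not a description of Dataplot's BIWEIGHT command: the function name `biweight_weights`, the use of the median absolute residual as the scale, and the cutoff constant `c = 6` are all assumptions made for this example.

```python
import statistics

def biweight_weights(residuals, c=6.0):
    """Tukey biweight: w = (1 - u**2)**2 for |u| < 1, else 0,
    where u = r / (c * s) and s is the median absolute residual.
    The scale estimate and c = 6 are assumptions for this sketch."""
    s = statistics.median(abs(r) for r in residuals)
    if s == 0:
        # Degenerate case: no spread to scale by, so keep full weight.
        return [1.0] * len(residuals)
    weights = []
    for r in residuals:
        u = r / (c * s)
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return weights

res = [0.0, 0.5, -0.5, 1.0, -1.0, 12.0]   # last residual is an outlier
w = biweight_weights(res)
print([round(wi, 3) for wi in w])
```

Small residuals keep a weight near 1, while the outlying residual receives a weight of 0, so a refit with these weights is driven by the bulk of the data.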

To specify weights for a least squares fit, enter the command

WEIGHTS <var>

where <var> is a variable containing the weights.

Note that the RES variable contains the absolute value of the residuals after the fit. For residual plots and analysis, it may be preferable to work with the weighted residuals. You can create this with the command

LET RESW = W*RES

where W contains the weight variable.

Note:
Data transformations are often used to improve the quality of the fit. For example, some types of non-linear fits can be restated as linear fits with an appropriate transformation. Also, transformations are often applied to address non-homogeneous variation in the fit. The NIST/SEMATECH e-Handbook contains a discussion of this issue.

Data transformations can be generated easily if needed via the LET command. The BOX-COX LINEARITY PLOT can be a useful command for determining an appropriate transformation.

Some analysts prefer to standardize the independent variables and the dependent variable by subtracting the mean and dividing by the standard deviation. This is done to provide numerical stability (note that Dataplot scales the data internally before performing the regression calculations) and also so that the data and regression coefficients are on a common scale. The original regression and standardized model are related as follows

$$x_{i}^{'} = \frac{x_{i} - \bar{x}}{s_{x}}$$

$$y_{i}^{'} = \frac{y_{i} - \bar{y}}{s_{y}}$$

with $$\bar{x}$$ and $$s_x$$ denoting the mean and standard deviation of the independent variable and $$\bar{y}$$ and $$s_y$$ denoting the mean and standard deviation of the dependent variable.

The parameters are related by

$$\beta_{k} = \frac{s_{y}}{s_{x_k}} \beta_{k}^{'}$$

$$\beta_{0} = \bar{y} - \beta_{1} \bar{x}_1 - \ldots - \beta_{p} \bar{x}_p$$
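
The relations above can be checked numerically for the single-variable case. The following Python sketch uses made-up data and a hypothetical helper `slope_intercept`; it fits the standardized variables, maps the slope back with the factor s_y/s_x, recovers the intercept from the means, and compares the result with a direct fit on the original data.

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def slope_intercept(u, v):
    """Ordinary least squares line v = b0 + b1*u; returns (b1, b0)."""
    ubar, vbar = statistics.mean(u), statistics.mean(v)
    sxy = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    sxx = sum((ui - ubar) ** 2 for ui in u)
    b1 = sxy / sxx
    return b1, vbar - b1 * ubar

xbar, ybar = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.stdev(x), statistics.stdev(y)

# Fit on standardized variables, then map the slope back to original units.
xs = [(xi - xbar) / sx for xi in x]
ys = [(yi - ybar) / sy for yi in y]
b1_std, _ = slope_intercept(xs, ys)
b1 = (sy / sx) * b1_std          # slope in original units
b0 = ybar - b1 * xbar            # intercept in original units

b1_direct, b0_direct = slope_intercept(x, y)
print(round(b1, 6), round(b0, 6))
print(round(b1_direct, 6), round(b0_direct, 6))
```

The two printed lines agree, which is the sense in which the standardized and original models are equivalent.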

A variation on this is the correlation transformation (also called the standardized regression model). Specifically

$$y_{i}^{'} = \frac{1}{\sqrt{n-1}} \frac{y_{i} - \bar{y}}{s_{y}}$$

$$x_{ik}^{'} = \frac{1}{\sqrt{n-1}} \frac{x_{ik} - \bar{x}_{k}} {s_{x_k}}$$

With this transformation, the $$X'X$$ matrix reduces to a correlation matrix of the independent variables. If there are $$p$$ independent variables, these transformations can be generated with the commands


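. FACT is the 1/SQRT(N-1) scale factor of the correlation transformation
LET N = SIZE Y
LET FACT = 1/SQRT(N-1)
. Transform each independent variable X1 ... XP into Z1 ... ZP
LOOP FOR K = 1 1 P
LET Z^K = STANDARDIZE X^K
LET Z^K = FACT*Z^K
END OF LOOP
. Transform the dependent variable
LET YT = STANDARDIZE Y
LET YT = FACT*YT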
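
The claim that X'X reduces to a correlation matrix under this transformation can be illustrated with a small Python sketch. The two data columns and the helper `correlation_transform` are made up for the example; the sample standard deviation (divisor n-1) is assumed.

```python
import math
import statistics

def correlation_transform(v):
    """Return (v_i - mean) / (sqrt(n - 1) * s) for each element of v."""
    n = len(v)
    vbar, s = statistics.mean(v), statistics.stdev(v)
    return [(vi - vbar) / (math.sqrt(n - 1) * s) for vi in v]

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
z1, z2 = correlation_transform(x1), correlation_transform(x2)

# Entries of X'X for the transformed columns.
xtx_11 = sum(a * a for a in z1)
xtx_22 = sum(b * b for b in z2)
xtx_12 = sum(a * b for a, b in zip(z1, z2))
print(round(xtx_11, 6), round(xtx_22, 6), round(xtx_12, 6))
```

The diagonal entries come out as 1 and the off-diagonal entry equals the sample correlation between the two columns.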

Note:
Although the FIT command is the Dataplot workhorse command for fitting, Dataplot supports the following additional fit capabilities:

1. ORTHOGONAL DISTANCE FIT - This command is used to fit errors-in-variables models for both linear and non-linear models. It can also fit implicit models.

2. BOOTSTRAP FIT - This command is used to fit linear or multilinear models using the bootstrap.

3. EXACT RATIONAL FIT - This command is used to determine good starting values for fitting rational function models (the full model is still fit using the FIT command).

Rational function models are the ratio of two polynomial functions. The NIST/SEMATECH e-Handbook contains a detailed discussion of these models.

4. CALIBRATION - This command is used to fit linear or quadratic calibration models.

5. YATES ANALYSIS - This command is used to fit full and fractional 2-level designs.

6. SPLINE FIT - This command is used for spline fits.

7. LOWESS SMOOTH - This command is used to fit locally-weighted least squares models.

8. ARMA - This command is used for fitting autoregressive/moving average time series models.

9. PRINCIPAL COMPONENTS - This LET subcommand can be used to reduce the number of independent variables in a multi-linear fit.

10. SMOOTH - This command is used for various types of smoothing.

11. INTERPOLATION - This LET subcommand performs cubic spline interpolation.

12. HERMITE INTERPOLATION - This LET subcommand performs Hermite interpolation.

These commands are documented separately.

Note:
If you want to suppress the output to files dpst1f.dat, dpst2f.dat, and dpst3f.dat, enter the command

SET FIT AUXILLARY FILES OFF
Note:
By default, the values written to dpst1f.dat, dpst2f.dat and dpst3f.dat are written using a Fortran E15.7 format (that is, exponential format with 7 significant digits). You can specify the number of significant digits with the command

SET AUXILLARY FILES DECIMAL POINTS <value>

where the default is 7.

Default:
None
Synonyms:
None
Related Commands:
FIT ITERATIONS = Sets the maximum number of iterations for the FIT command.
FIT STANDARD DEVIATION = Sets the minimum standard deviation for the convergence criterion in the FIT command.
PRED = A variable where predicted values are stored.
RES = A variable where residuals are stored.
RESSD = A parameter where the residual standard deviation is stored.
RESDF = A parameter where the residual degrees of freedom is stored.
REPSD = A parameter where the replication standard deviation is stored.
REPDF = A parameter where the replication degrees of freedom is stored.
LOFCDF = A parameter where the lack of fit cdf is stored.
WEIGHTS = Sets the weights for the fit command.
BIWEIGHT = Perform a biweight transformation.
EXACT RATIONAL FIT = Perform an exact rational fit.
CALIBRATION = Perform a linear or quadratic calibration fit.
LOWESS = Perform a locally weighted least squares smoothing.
BOOTSTRAP FIT = Perform a linear or multi-linear fit based on the bootstrap.
ORTHOGONAL DISTANCE FIT = Perform an orthogonal distance fit (useful for errors-in-variables models).
PRE-FIT = Perform a least squares pre-fit.
SPLINE FIT = Perform a spline fit.
SMOOTH = Perform a smoothing.
ANOVA = Perform a fixed effects analysis of variance.
MEDIAN POLISH = Perform a median polish.
PLOT = Generate a data/function plot.
4-PLOT = Generate a 4-plot.
References:
Osborne (1972), "Some Aspects of Nonlinear Least Squares Calculation", in Numerical Methods for Nonlinear Optimization, Ed. Lootsma, Academic Press.

Osborne (1976), "Nonlinear Least Squares -- the Levenberg Algorithm Revisited", ANZIAM Journal, Vol. 19, No. 3, pp. 343-357.

Applications:
Least Squares Fitting
Implementation Date:
Pre-1987
1987/09: Support for weighted fits
1988/03: Save LOFCDF parameter
1991/09: Expand number of allowed independent variables from 5 to 15
1992/03: Write coefficient, coefficient sd, and t-value to dpst1f.dat
1997/07: Print summary information if maximum iterations reached
2001/04: Print parameter covariance matrix to dpst3f.dat
2014/06: Option to suppress output to auxiliary files
2019/04: Option to suppress output to auxiliary files
Program 1:

. Step 1:   Read the data
.
SKIP 25
SKIP 0
.
. Step 2:   Perform the fit
.
SET WRITE DECIMALS 5
LET ALPHA = 0.15
LET A = 0.004
LET B = 0.01
FIT Y = EXP(-ALPHA*X)/(A+B*X)
.
. Step 3:   Generate diagnostic graphs
.
TITLE OFFSET 2
TITLE CASE ASIS
LABEL CASE ASIS
TITLE Predicted Values Overlaid on Raw Data (CHWIRUT1.DAT)
X1LABEL Metal Distance
Y1LABEL Ultrasonic Response
.
LINE BLANK SOLID
CHARACTER X BLANK
.
PLOT Y PRED VS X
.
LABEL
TITLE
SET 4-PLOT MULTIPLOT ON
MULTIPLOT CORNER COORDINATES 0 0 100 100
TIC MARK LABEL SIZE 4
CHARACTER SIZE 4
.
4-PLOT RES
.
JUSTIFICATION CENTER
MOVE 50 97
TEXT 4-Plot of Residuals (CHWIRUT1.DAT)

The following output is generated.
             Least Squares Non-Linear Fit

Sample Size:                                        214
Model: Y =EXP(-ALPHA*X)/(A+B*X)
Replication Case:
Replication Standard Deviation:                 3.28176
Replication Degrees of Freedom:                     192
Number of Distinct Subsets:                          22

----------------------------------------------------------------------------------------
                               Residual *
Iteration    Convergence       Standard *      Parameter
   Number        Measure      Deviation *      Estimates
----------------------------------------------------------------------------------------
        1  0.1000000E-01  0.1077871E+02 *   0.1500000E+00  0.4000000E-02  0.1000000E-01
        2  0.5000000E-02  0.3721930E+01 *   0.1807460E+00  0.5554412E-02  0.1071653E-01
        3  0.2500000E-02  0.3362018E+01 *   0.1905488E+00  0.6119125E-02  0.1051960E-01
        4  0.1250000E-02  0.3361673E+01 *   0.1904515E+00  0.6133742E-02  0.1052492E-01

--------------------------------------------------------------------
                                   Approximate
Final Parameter Estimates       Standard Deviation   t-Value
--------------------------------------------------------------------
1  ALPHA                     0.19041             0.02207    8.6266
2  A                         0.00613             0.00035   17.5593
3  B                         0.01053             0.00080   13.1131

Residual Standard Deviation:                    3.36167
Residual Degrees of Freedom:                        211
Replication Standard Deviation:                 3.28176
Replication Degrees of Freedom:                     192
Lack of Fit F Ratio:                            1.54740
Lack of Fit F CDF (%):                         92.64608
Lack of Fit Degrees of Freedom 1:                    19
Lack of Fit Degrees of Freedom 2:                   192
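
The lack of fit values in this output can be cross-checked from the printed residual and replication statistics. The Python sketch below assumes the usual decomposition of the residual sum of squares into a pure-error (replication) part and a lack-of-fit part; this appears consistent with the printed values, although the exact internal computation is not stated here.

```python
# Values printed in the output above.
ressd, resdf = 3.36167, 211
repsd, repdf = 3.28176, 192

# Split the residual sum of squares into pure error and lack of fit.
res_ss = ressd ** 2 * resdf
rep_ss = repsd ** 2 * repdf
lof_df = resdf - repdf
lof_ms = (res_ss - rep_ss) / lof_df

# F ratio: lack-of-fit mean square over the replication (pure error) variance.
f_ratio = lof_ms / repsd ** 2
print(lof_df, round(f_ratio, 4))
```

The resulting degrees of freedom and F ratio can be compared with the "Lack of Fit Degrees of Freedom 1" and "Lack of Fit F Ratio" lines; small differences are expected because the inputs are rounded to five decimals.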


Program 2:

. Step 1:   Read the data
.
LET Q = X - SQRT(-109737.3/T)
.
. Step 2:   Perform the fit
.
SET WRITE DECIMALS 5
LET A = 0.2
LET B = -0.00005
LET C = 200
LET D = -123
.
CAPTURE SCREEN ON
CAPTURE FIT2.OUT
FIT Q = A - B*T - ATAN(C/(T-D))/3.14159
END OF CAPTURE
.
. Step 3:   Generate diagnostic graphs
.
TITLE OFFSET 2
TITLE CASE ASIS
LABEL CASE ASIS
TITLE Predicted Values Overlaid on Raw Data (ROSZMAN1.DAT)
X1LABEL Excited State Energy
Y1LABEL Quantum Effects for Sulfur I Atom
.
LINE BLANK SOLID
CHARACTER X BLANK
.
PLOT Q PRED VS T
.
LABEL
TITLE
SET 4-PLOT MULTIPLOT ON
MULTIPLOT CORNER COORDINATES 0 0 100 100
TIC MARK LABEL SIZE 4
CHARACTER SIZE 4
.
4-PLOT RES
.
JUSTIFICATION CENTER
MOVE 50 97
TEXT 4-Plot of Residuals (ROSZMAN1.DAT)

The following output is generated.
             Least Squares Non-Linear Fit

Sample Size:                                          25
Model: Q =A - B*T - ATAN(C/(T-D))/3.14159
No Replication Case:

-------------------------------------------------------------------------------------------------------
                               Residual *
Iteration    Convergence       Standard *      Parameter
   Number        Measure      Deviation *      Estimates
-------------------------------------------------------------------------------------------------------
        1  0.1000000E-01  0.2922875E+00 *   0.2000000E+00 -0.5000000E-04  0.2000000E+03 -0.1230000E+03
        2  0.5000000E-02  0.6473232E-01 *   0.2938492E+00 -0.1997106E-04  0.7968499E+03  0.4821014E+03
        3  0.2500000E-02  0.4032352E-01 *   0.1419884E+00 -0.4890395E-06  0.1872259E+04  0.1699836E+03
        4  0.1265625E-01  0.3540118E-01 *   0.2300597E+00 -0.7388945E-05  0.7415686E+03 -0.2177707E+03
        5  0.6328125E-02  0.7476144E-02 *   0.2339476E+00 -0.1048361E-04  0.1026235E+04 -0.1199718E+03
        6  0.3164063E-02  0.5190951E-02 *   0.2057654E+00 -0.6787648E-05  0.1194028E+04 -0.1579526E+03
        7  0.1582031E-02  0.4854277E-02 *   0.2019346E+00 -0.6193031E-05  0.1204886E+04 -0.1813351E+03
        8  0.7910156E-03  0.4854238E-02 *   0.2019425E+00 -0.6191448E-05  0.1204564E+04 -0.1813955E+03

--------------------------------------------------------------------
                                   Approximate
Final Parameter Estimates       Standard Deviation   t-Value
--------------------------------------------------------------------
1  A                         0.20194             0.01927   10.4816
2  B                        -0.00001             0.00000   -1.9250
3  C                      1204.55836            74.63673   16.1389
4  D                      -181.39214            49.88409   -3.6363

Residual Standard Deviation:                    0.00485
Residual Degrees of Freedom:                         21
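
The t-values in the final parameter table appear to be the ratio of each estimate to its approximate standard deviation. The Python sketch below checks this for parameters C and D, using the printed values. Parameters A and B are omitted because their printed estimates and standard deviations carry too few significant digits to reproduce the ratio.

```python
# (estimate, approximate standard deviation, printed t-value) from the table above.
table = {
    "C": (1204.55836, 74.63673, 16.1389),
    "D": (-181.39214, 49.88409, -3.6363),
}

for name, (estimate, sd, printed_t) in table.items():
    t = estimate / sd
    # Print the recomputed ratio next to the value from the Dataplot output.
    print(name, round(t, 4), printed_t)
```

A parameter with a t-value small in absolute value (such as B in this fit, at -1.9250) is a candidate for closer scrutiny before it is kept in the model.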


Date created: 09/02/2021
Last updated: 12/04/2023