FIT

Name:

... FIT

Type:

Analysis Command

Purpose:

Estimate the parameters for a linear, polynomial, or non-linear least squares fit. This is one of Dataplot's most powerful and heavily used commands.

Description:

Non-linear fits are performed using an iterative modified Levenberg-Marquardt algorithm (Dataplot implements the algorithm given in the Osborne paper listed in the References section below). This algorithm can fit linear and multi-linear models as well as non-linear models.
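As a rough illustration of the kind of iteration described above, the following Python sketch fits a two-parameter exponential model with a basic Levenberg-Marquardt loop. It is a textbook variant written for this page, not the Osborne algorithm that Dataplot implements. The model, the data, the function names (`sum_sq`, `lm_fit`), and the damping schedule are all assumptions chosen for illustration.

```python
import math

def sum_sq(xs, ys, a, b):
    """Residual sum of squares for the assumed model y = a*exp(-b*x)."""
    return sum((y - a * math.exp(-b * x)) ** 2 for x, y in zip(xs, ys))

def lm_fit(xs, ys, a, b, lam=1e-2, max_iter=50, tol=1e-12):
    """Basic Levenberg-Marquardt iteration for y = a*exp(-b*x)."""
    sse = sum_sq(xs, ys, a, b)
    for _ in range(max_iter):
        # Accumulate J'J (s11, s12, s22) and J'r (g1, g2) for the two parameters.
        s11 = s12 = s22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(-b * x)
            r = y - a * e
            ja, jb = e, -a * x * e   # partial derivatives of the model wrt a and b
            s11 += ja * ja
            s12 += ja * jb
            s22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        improved = False
        while lam < 1e12:
            # Damped normal equations: inflate the diagonal by (1 + lam).
            m11, m22 = s11 * (1 + lam), s22 * (1 + lam)
            det = m11 * m22 - s12 * s12
            da = (g1 * m22 - s12 * g2) / det
            db = (m11 * g2 - s12 * g1) / det
            trial = sum_sq(xs, ys, a + da, b + db)
            if trial < sse:
                improved = True
                break
            lam *= 10                # step rejected: increase damping and retry
        if not improved:
            break                    # no downhill step found; stop
        converged = (sse - trial) <= tol * sse
        a, b, sse = a + da, b + db, trial
        lam /= 10                    # step accepted: relax damping
        if converged:
            break
    return a, b, sse

# Synthetic, noise-free data generated from a = 2, b = 0.5.
xs = [float(i) for i in range(10)]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]
a_hat, b_hat, sse_final = lm_fit(xs, ys, a=1.5, b=0.3)
print(a_hat, b_hat, sse_final)
```

Small damping makes each step close to a Gauss-Newton step, while large damping shortens the step and makes it more cautious. That is why the same loop can handle linear as well as non-linear models.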

In addition, the FIT command can perform linear and polynomial fits using a non-iterative algorithm. Since the non-iterative algorithm supports a much broader range of output, it is documented separately. Enter

HELP LINEAR FIT

for the documentation for exact linear fits.

Non-linear fits are specified by entering a function. For example,

The function can either be given on the FIT command or be defined with a LET FUNCTION command.

For non-linear fits, the FIT command generates the following output.

The parameter estimates and associated standard deviations are printed for each iteration.
After convergence, a table containing the parameter estimates, the parameter standard deviations, and the parameter t-values is printed. The t-value is used to determine if a given parameter is statistically significant.
These values are also written to the file dpst1f.dat. To read these values into Dataplot variables, enter the command
The correlation matrix for the parameter estimates is written to the file dpst2f.dat. To read this correlation matrix, enter the command
The variance-covariance matrix for the parameter estimates is written to the file dpst3f.dat. To read this covariance matrix, enter the command
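The residual standard deviation and its corresponding degrees of freedom are stored in the parameters RESSD and RESDF, respectively. RESDF is the number of observations minus the number of parameters estimated in the fit (including the constant term). The formula for RESSD is:
\( \mbox{RESSD} = \sqrt{\frac{\sum_{i=1}^{N}{(Y_{i} - \hat{Y}_{i})^2}}{\mbox{RESDF}}} \)
with \( N \) denoting the number of observations and \( \hat{Y}_{i} \) denoting the predicted value for the i-th observation.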
If there is replication in the independent variables, the replication standard deviation and corresponding degrees of freedom are printed. In addition, a lack of fit F test is performed. These are stored in the parameters REPSD, REPDF, and LOFCDF, respectively. The formulas are:
\( \mbox{REPDF} = \sum_{i=1}^{nrep}{(n_{i} - 1)} \)
with \( nrep \) and \( n_i \) denoting the number of replications and the number of observations in the i-th replication, respectively.
\( \mbox{REPSD} = \sqrt{\frac{\sum_{i=1}^{nrep}{\sum_{j=1}^{n_i}{(Y_{ij} - \bar{Y}_{i})^2}}}{\mbox{REPDF}}} \)
with \( \bar{Y}_{i} \) denoting the mean of the i-th replication.
Dataplot saves the predicted values from a fit in the variable PRED and the residual values in the variable RES. These variables can be used in subsequent LET and PLOT commands to generate diagnostic plots of residuals and predicted values.
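The lack of fit quantities can be cross-checked by hand. The Python sketch below assumes the standard lack of fit F ratio, in which the residual sum of squares is split into a pure-error (replication) part and a lack of fit part. The function name `lack_of_fit_f` is hypothetical, and the input values are copied from the Program 1 output later on this page.

```python
def lack_of_fit_f(ressd, resdf, repsd, repdf):
    """Lack of fit F ratio from residual and replication standard deviations."""
    sse = ressd ** 2 * resdf       # residual sum of squares
    sspe = repsd ** 2 * repdf      # pure-error (replication) sum of squares
    lof_df = resdf - repdf         # lack of fit degrees of freedom
    f_ratio = ((sse - sspe) / lof_df) / (sspe / repdf)
    return f_ratio, lof_df, repdf

# RESSD, RESDF, REPSD, REPDF as printed for the CHWIRUT1.DAT fit.
f_ratio, df1, df2 = lack_of_fit_f(3.36167, 211, 3.28176, 192)
print(f_ratio, df1, df2)
```

With these inputs the ratio agrees with the printed Lack of Fit F Ratio to about four decimal places, and the two degrees of freedom match the printed values of 19 and 192. The small remaining difference comes from using rounded standard deviations.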

It is recommended that a FIT be followed by a residual analysis to assess the model adequacy. Specifically, the typical assumptions for the residuals are that they are independent with a common distribution having fixed location and variation. It is usually assumed that the common distribution is a normal distribution. The 4-PLOT command generates 4 plots that are useful in testing these assumptions. The NIST/SEMATECH e-Handbook contains a more detailed discussion of this issue at

https://www.itl.nist.gov/div898/handbook/eda/section2/eda2.htm

In addition, if there is a single independent variable in the model, it can be useful to plot the data with the fitted values overlaid.

For non-linear fits, up to 15 independent variables can be included in the model.

Syntax:

a general Fortran-like expression; or
any function name that the user has already created via the LET FUNCTION command;

This syntax is appropriate for all models--linear, polynomial, multi-linear (up to 15 independent variables), and non-linear (up to 15 independent variables). It uses an iterative modified Levenberg-Marquardt algorithm. Linear fits are handled as a special case (the fits are still done iteratively).

Examples:

FIT Y = F1

Note:

Techniques for the Fitting and Verification of Linear/NonLinear Models using Dataplot

Note:

By default, a maximum of 50 iterations are allowed before Dataplot assumes the fit is not converging. You can change this maximum with the FIT ITERATIONS command.

Dataplot checks for convergence by computing the ratio of successive values of the residual standard deviation. You can specify the criterion for convergence with the FIT STANDARD DEVIATION command.
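To make the idea of a ratio test concrete, the Python sketch below computes the ratios of successive residual standard deviations from the iteration table in the Program 1 output later on this page. The exact form of Dataplot's test and its default tolerance are not stated here, so this only shows how such ratios behave as a fit settles. The variable names are hypothetical.

```python
# Residual standard deviations by iteration, copied from the Program 1 output.
res_sd = [10.77871, 3.721930, 3.362018, 3.361673]

# Ratio of each value to the one before it; values near 1 mean little change.
ratios = [later / earlier for earlier, later in zip(res_sd, res_sd[1:])]

for iteration, ratio in enumerate(ratios, start=2):
    print(iteration, round(ratio, 6))
```

The ratios move toward 1 as the residual standard deviation stops changing, which is the behavior a convergence test of this kind looks for.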

Note:

However, decent starting values can often speed up non-linear fits. In addition, some fits may require good starting values in order to converge to accurate values.

To specify starting values, simply assign values to the coefficients before doing the fit. For example:

In some cases, good starting values might be known from previous work or from theoretical considerations. However, if better starting values are needed and reasonable guesses are not available, the PRE-FIT command can be helpful.

Sinusoidal models are one case where good starting values are needed. The NIST/SEMATECH e-Handbook gives an example of fitting this kind of model at

https://www.itl.nist.gov/div898/handbook/eda/section4/eda425.htm

If you have a parameter in the model that you want to set to a fixed value, then enter the literal value or use the substitution character "^". For example

LET C = 1.5
FIT Y = A0 + A1*X**C - Dataplot will try to fit C
FIT Y = A0 + A1*X**^C - Dataplot will leave C fixed at 1.5

Note:

Weighting is one approach for dealing with non-constant variation in the residuals. It is not uncommon for the variance of the residuals to increase for the largest (or smallest) values of the independent variable. In this case, weights can be used to give less weight to the less precise measurements. The NIST/SEMATECH e-Handbook contains a discussion of weighted fits and an example of using weights to address non-constant variation in the following pages
Weights can also be used to implement certain types of robust fitting. In this case, weights are used to downweight observations based on the size of the associated residual. Outlier observations can sometimes distort a fit (i.e., in trying to fit the outlier point(s), the bulk of the data is poorly fit). Weighting based on the residuals can often provide a good fit to the bulk of the data without eliminating the outlier observations from the analysis.
Enter HELP WEIGHTS and HELP BIWEIGHT for examples of this use of weighted fits in Dataplot.

To specify weights for a least squares fit, enter the command

WEIGHTS <var>

where <var> is a variable containing the weights.

Note that the RES variable contains the absolute value of the residuals after the fit. For residual plots and analysis, it may be preferable to work with the weighted residuals. You can create this with the command

LET RESW = W*RES

where W contains the weight variable.
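The effect of weights can be seen in a small straight-line example. The Python sketch below uses the standard closed-form weighted least squares solution for a line. It is not Dataplot's implementation, and the data, the weights, and the function name `weighted_line_fit` are assumptions for illustration. The last step mirrors the LET RESW = W*RES command above.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Four points lie exactly on y = 1 + 2x; the last point is an outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 30.0]

equal = weighted_line_fit(x, y, [1.0] * 5)           # outlier pulls the line
w = [1.0, 1.0, 1.0, 1.0, 0.0]
down = weighted_line_fit(x, y, w)                    # outlier given zero weight

# Residuals and weighted residuals for the downweighted fit.
res = [yi - (down[0] + down[1] * xi) for xi, yi in zip(x, y)]
resw = [wi * ri for wi, ri in zip(w, res)]
print(equal, down, resw)
```

With equal weights the single outlier changes the slope substantially. With the outlier weighted to zero the fit recovers the line through the other four points, and the outlier still appears in RES but not in the weighted residuals.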

Note:

https://www.itl.nist.gov/div898/handbook/pmd/section4/pmd452.htm

Data transformations can be generated easily if needed via the LET command. The BOX-COX LINEARITY PLOT can be a useful command for determining an appropriate transformation.

Some analysts prefer to standardize the independent variables and the dependent variable by subtracting the mean and dividing by the standard deviation. This is done to provide numerical stability (note that Dataplot scales the data internally before performing the regression calculations) and also so that the data and regression coefficients are on a common scale. The original regression and standardized model are related as follows

\( x_{ik}^{'} = \frac{x_{ik} - \bar{x}_{k}}{s_{x_k}} \)

\( y_{i}^{'} = \frac{y_{i} - \bar{y}}{s_{y}} \)

with \( \bar{x} \) and \( s_x \) denoting the mean and standard deviation of the independent variable and \( \bar{y} \) and \( s_y \) denoting the mean and standard deviation of the dependent variable.

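The parameters of the original model (unprimed) and the standardized model (primed) are related by

\( \beta_{k} = \frac{s_{y}}{s_{x_k}} \beta_{k}^{'} \hspace{0.3in} k = 1, \ldots, p \)

\( \beta_{0} = \bar{y} - \beta_{1} \bar{x}_1 - \ldots - \beta_{p} \bar{x}_p \)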
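The back-transformation from a standardized fit to the original scale can be checked numerically. The Python sketch below assumes a single independent variable and the usual relations, namely that the original slope is the standardized slope multiplied by \( s_y / s_x \) and that the original intercept is \( \bar{y} \) minus the original slope times \( \bar{x} \). The data and function names are made up for illustration.

```python
import statistics

def ols_line(x, y):
    """Ordinary least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def standardize(v):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = statistics.mean(v), statistics.stdev(v)
    return [(vi - m) / s for vi in v]

x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.1, 3.9, 8.2, 13.8, 22.5]

b0, b1 = ols_line(x, y)                                  # fit on original scale
b0_std, b1_std = ols_line(standardize(x), standardize(y))  # fit on standardized scale

# Recover the original coefficients from the standardized slope.
b1_back = statistics.stdev(y) / statistics.stdev(x) * b1_std
b0_back = statistics.mean(y) - b1_back * statistics.mean(x)
print(b0, b1, b0_back, b1_back, b0_std)
```

The recovered coefficients agree with the direct fit, and the intercept of the standardized fit is zero up to rounding because both variables have been centered.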

A variation on this is the correlation transformation (also called the standardized regression model). Specifically

\( x_{ik}^{'} = \frac{1}{\sqrt{n-1}} \frac{x_{ik} - \bar{x}_{k}} {s_{x_k}} \)

\( y_{i}^{'} = \frac{1}{\sqrt{n-1}} \frac{y_{i} - \bar{y}} {s_{y}} \)

With this transformation, the \( X'X \) matrix reduces to a correlation matrix of the independent variables. If there are \( p \) independent variables, these transformations can be generated with the commands

 
LET N = SIZE Y
LET AFACT = 1/SQRT(N-1)
LOOP FOR K = 1 1 P
    LET Z^K = STANDARDIZE X^K
    LET Z^K = AFACT*Z^K
END OF LOOP

LET YT = STANDARDIZE Y
LET YT = AFACT*YT
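The claim that the correlation transformation turns \( X'X \) into a correlation matrix can be verified on a small example. The Python sketch below uses two made-up independent variables and hypothetical helper names. It applies the same scaling as the Dataplot loop above (standardize, then multiply by \( 1/\sqrt{n-1} \)) and compares the cross-products with a directly computed Pearson correlation.

```python
import math
import statistics

def correlation_transform(v):
    """Standardize v, then scale by 1/sqrt(n-1)."""
    n = len(v)
    m, s = statistics.mean(v), statistics.stdev(v)
    fact = 1.0 / math.sqrt(n - 1)
    return [fact * (vi - m) / s for vi in v]

def dot(u, v):
    """Sum of element-wise products."""
    return sum(ui * vi for ui, vi in zip(u, v))

def pearson(u, v):
    """Pearson correlation computed directly from deviations about the means."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    du = [ui - mu for ui in u]
    dv = [vi - mv for vi in v]
    return dot(du, dv) / math.sqrt(dot(du, du) * dot(dv, dv))

x1 = [1.0, 2.0, 4.0, 7.0, 11.0, 12.0]
x2 = [3.0, 1.0, 6.0, 5.0, 9.0, 14.0]

z1, z2 = correlation_transform(x1), correlation_transform(x2)

# X'X for the transformed variables.
xtx = [[dot(z1, z1), dot(z1, z2)],
       [dot(z2, z1), dot(z2, z2)]]
r12 = pearson(x1, x2)
print(xtx, r12)
```

The diagonal entries equal 1 and the off-diagonal entries equal the correlation between the two variables, up to rounding.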

Note:

ORTHOGONAL DISTANCE FIT - This command is used to fit errors-in-variables models for both linear and non-linear models. It can also fit implicit models.
BOOTSTRAP FIT - This command is used to fit linear or multilinear models using the bootstrap.
EXACT RATIONAL FIT - This command is used to determine good starting values for fitting rational function models (the full model is still fit using the FIT command).
Rational function models are the ratio of two polynomial functions. The NIST/SEMATECH e-Handbook contains a detailed discussion of these models at
CALIBRATION - This command is used to fit linear or quadratic calibration models.
YATES ANALYSIS - This command is used to fit full and fractional 2-level designs.
SPLINE FIT - This command is used for spline fits.
LOWESS SMOOTH - This command is used to fit locally-weighted least squares models.
ARMA - This command is used for fitting autoregressive/moving average time series models.
PRINCIPAL COMPONENTS - This LET subcommand can be used to reduce the number of independent variables in a multi-linear fit.
SMOOTH - This command is used for various types of smoothing.
INTERPOLATION - This LET subcommand performs cubic spline interpolation.
HERMITE INTERPOLATION - This LET subcommand performs Hermite interpolation.

These commands are documented separately.

Note:

SET FIT AUXILLARY FILES OFF

Note:

SET AUXILLARY FILES DECIMAL POINTS <value>

where the default is 7.

Default:

None

Synonyms:

None

Related Commands:

FIT ITERATIONS	=	Sets the maximum number of iterations for the FIT command.
FIT STANDARD DEVIATION	=	Sets the minimum standard deviation for the convergence criterion in the FIT command.
PRED	=	A variable where predicted values are stored.
RES	=	A variable where residuals are stored.
RESSD	=	A parameter where the residual standard deviation is stored.
RESDF	=	A parameter where the residual degrees of freedom is stored.
REPSD	=	A parameter where the replication standard deviation is stored.
REPDF	=	A parameter where the replication degrees of freedom is stored.
LOFCDF	=	A parameter where the lack of fit cdf is stored.
WEIGHTS	=	Sets the weights for the fit command.
BIWEIGHT	=	Perform a biweight transformation.
EXACT RATIONAL FIT	=	Perform an exact rational fit.
CALIBRATION	=	Perform a linear or quadratic calibration fit.
LOWESS	=	Perform a locally weighted least squares smoothing.
BOOTSTRAP FIT	=	Perform a linear or multi-linear fit based on the bootstrap.
ORTHOGONAL DISTANCE FIT	=	Perform an orthogonal distance fit (useful for errors-in-variables models).
PRE-FIT	=	Perform a least squares pre-fit.
SPLINE FIT	=	Perform a spline fit.
SMOOTH	=	Perform a smoothing.
ANOVA	=	Perform a fixed effects analysis of variance.
MEDIAN POLISH	=	Perform a median polish.
PLOT	=	Generate a data/function plot.
4-PLOT	=	Generate a 4-plot.

References:

Academic Press

Osborne (1976), "Nonlinear Least Squares -- the Levenberg Algorithm Revisited", ANZIAM Journal, Vol. 19, No. 3, pp. 343-357.

Applications:

Least Squares Fitting

Implementation Date:

Program 1:

 
. Step 1:   Read the data
.
SKIP 25
READ CHWIRUT1.DAT Y X
SKIP 0
.
. Step 2:   Perform the fit
.
SET WRITE DECIMALS 5
LET ALPHA = 0.15
LET A = 0.004
LET B = 0.01
FIT Y = EXP(-ALPHA*X)/(A+B*X)
.
. Step 3:   Generate diagnostic graphs
.
TITLE OFFSET 2
TITLE CASE ASIS
LABEL CASE ASIS
TITLE Predicted Values Overlaid on Raw Data (CHWIRUT1.DAT)
X1LABEL Metal Distance
Y1LABEL Ultrasonic Response
.
LINE BLANK SOLID
CHARACTER X BLANK
.
PLOT Y PRED VS X
.
LABEL
TITLE
SET 4-PLOT MULTIPLOT ON
MULTIPLOT CORNER COORDINATES 0 0 100 100
TIC MARK LABEL SIZE 4
CHARACTER SIZE 4
.
4-PLOT RES
.
JUSTIFICATION CENTER
MOVE 50 97
TEXT 4-Plot of Residuals (CHWIRUT1.DAT)

             Least Squares Non-Linear Fit
  
 Sample Size:                                        214
 Model: Y =EXP(-ALPHA*X)/(A+B*X)
 Replication Case:
 Replication Standard Deviation:                 3.28176
 Replication Degrees of Freedom:                     192
 Number of Distinct Subsets:                          22
  
  
 ----------------------------------------------------------------------------------------
                                 Residual *
  Iteration    Convergence       Standard *      Parameter
     Number        Measure      Deviation *      Estimates
 ----------------------------------------------------------------------------------------
          1  0.1000000E-01  0.1077871E+02 *   0.1500000E+00  0.4000000E-02  0.1000000E-01
          2  0.5000000E-02  0.3721930E+01 *   0.1807460E+00  0.5554412E-02  0.1071653E-01
          3  0.2500000E-02  0.3362018E+01 *   0.1905488E+00  0.6119125E-02  0.1051960E-01
          4  0.1250000E-02  0.3361673E+01 *   0.1904515E+00  0.6133742E-02  0.1052492E-01
  
  
 --------------------------------------------------------------------
                                                Approximate
         Final Parameter Estimates       Standard Deviation   t-Value
 --------------------------------------------------------------------
   1  ALPHA                     0.19041             0.02207    8.6266
   2  A                         0.00613             0.00035   17.5593
   3  B                         0.01053             0.00080   13.1131
  
  
 Residual Standard Deviation:                    3.36167
 Residual Degrees of Freedom:                        211
 Replication Standard Deviation:                 3.28176
 Replication Degrees of Freedom:                     192
 Lack of Fit F Ratio:                            1.54740
 Lack of Fit F CDF (%):                         92.64608
 Lack of Fit Degrees of Freedom 1:                    19
 Lack of Fit Degrees of Freedom 2:                   192

plot generated by sample program

Program 2:

 
. Step 1:   Read the data
.
READ ROSZMAN1.DAT X T
LET Q = X - SQRT(-109737.3/T)
.
. Step 2:   Perform the fit
.
SET WRITE DECIMALS 5
LET A = 0.2
LET B = -0.00005
LET C = 200
LET D = -123
.
CAPTURE SCREEN ON
CAPTURE FIT2.OUT
FIT Q = A - B*T - ATAN(C/(T-D))/3.14159
END OF CAPTURE
.
. Step 3:   Generate diagnostic graphs
.
TITLE OFFSET 2
TITLE CASE ASIS
LABEL CASE ASIS
TITLE Predicted Values Overlaid on Raw Data (ROSZMAN1.DAT)
X1LABEL Excited State Energy
Y1LABEL Quantum Effects for Sulfur I Atom
.
LINE BLANK SOLID
CHARACTER X BLANK
.
PLOT Q PRED VS T
.
LABEL
TITLE
SET 4-PLOT MULTIPLOT ON
MULTIPLOT CORNER COORDINATES 0 0 100 100
TIC MARK LABEL SIZE 4
CHARACTER SIZE 4
.
4-PLOT RES
.
JUSTIFICATION CENTER
MOVE 50 97
TEXT 4-Plot of Residuals (ROSZMAN1.DAT)

             Least Squares Non-Linear Fit
  
 Sample Size:                                          25
 Model: Q =A - B*T - ATAN(C/(T-D))/3.14159
 No Replication Case:
  
  
 -------------------------------------------------------------------------------------------------------
                                 Residual *
  Iteration    Convergence       Standard *      Parameter
     Number        Measure      Deviation *      Estimates
 -------------------------------------------------------------------------------------------------------
          1  0.1000000E-01  0.2922875E+00 *   0.2000000E+00 -0.5000000E-04  0.2000000E+03 -0.1230000E+03
          2  0.5000000E-02  0.6473232E-01 *   0.2938492E+00 -0.1997106E-04  0.7968499E+03  0.4821014E+03
          3  0.2500000E-02  0.4032352E-01 *   0.1419884E+00 -0.4890395E-06  0.1872259E+04  0.1699836E+03
          4  0.1265625E-01  0.3540118E-01 *   0.2300597E+00 -0.7388945E-05  0.7415686E+03 -0.2177707E+03
          5  0.6328125E-02  0.7476144E-02 *   0.2339476E+00 -0.1048361E-04  0.1026235E+04 -0.1199718E+03
          6  0.3164063E-02  0.5190951E-02 *   0.2057654E+00 -0.6787648E-05  0.1194028E+04 -0.1579526E+03
          7  0.1582031E-02  0.4854277E-02 *   0.2019346E+00 -0.6193031E-05  0.1204886E+04 -0.1813351E+03
          8  0.7910156E-03  0.4854238E-02 *   0.2019425E+00 -0.6191448E-05  0.1204564E+04 -0.1813955E+03
  
  
 --------------------------------------------------------------------
                                                Approximate
         Final Parameter Estimates       Standard Deviation   t-Value
 --------------------------------------------------------------------
   1  A                         0.20194             0.01927   10.4816
   2  B                        -0.00001             0.00000   -1.9250
   3  C                      1204.55836            74.63673   16.1389
   4  D                      -181.39214            49.88409   -3.6363
  
  
 Residual Standard Deviation:                    0.00485
 Residual Degrees of Freedom:                         21

plot generated by sample program