 Dataplot Vol 1 Auxiliary Chapter

# ORTHOGONAL DISTANCE FIT

Name:
ORTHOGONAL DISTANCE FIT
Type:
Analysis Command
Purpose:
Estimate the parameters for an orthogonal distance fit. Note that the orthogonal distance fit is also commonly referred to as errors-in-variables fitting.
Description:
The ordinary least squares model is:

y = f(x;beta)

where y is a response variable, f is a linear or non-linear function, x is a list of one or more independent (or factor) variables, and beta is a list of parameters in the function to be estimated. The least squares fit generates estimates for beta and predicted and residual values for y. You can also specify weights for the response variable y. Weighting is typically applied to give more weight to observations that are known to be more precise.

In ordinary least squares fitting, the independent variables are assumed to be fixed (i.e., there is no measurement error). However, in many measurement processes, there can be significant error in the independent variables as well as the dependent variables. This is commonly referred to as the measurement error model or the errors in variables problem.

Orthogonal distance fitting provides one method for fitting these errors-in-variables models. Dataplot supports orthogonal distance fitting using the ODRPACK library (see the References section below).
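To make the distinction concrete, the following Python sketch (an illustration outside Dataplot, and not the algorithm ODRPACK uses) fits a straight line two ways: ordinary least squares, which measures only vertical distances, and a closed-form orthogonal fit, which measures perpendicular distances under the assumption that errors in x and y are weighted equally. The function names and the sample data are made up for the illustration.

```python
import math


def _centered_sums(x, y):
    """Return the means and the centered sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return mx, my, sxx, syy, sxy


def ols_line(x, y):
    """Ordinary least squares: x is treated as error free."""
    mx, my, sxx, _, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    return my - slope * mx, slope


def orthogonal_line(x, y):
    """Orthogonal fit of a line with equal weight on x and y errors."""
    mx, my, sxx, syy, sxy = _centered_sums(x, y)
    if sxy == 0:
        raise ValueError("closed form needs a non-zero cross product")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - slope * mx, slope


# Made-up data with scatter in both coordinates.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
print("ordinary   (intercept, slope):", ols_line(x, y))
print("orthogonal (intercept, slope):", orthogonal_line(x, y))
```

For data that lie exactly on a line the two fits agree. With scatter in x, the ordinary least squares slope is never larger in magnitude than the orthogonal slope.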

A mathematical description of orthogonal distance fitting is beyond the scope of this help file. We have placed a Postscript copy of the ODRPACK User's Guide on the Dataplot web site (see the References section below) for those who are interested in the mathematical details of orthogonal distance regression. This help file will concentrate on applying orthogonal distance fitting within Dataplot.

As mentioned above, ordinary least squares allows you to specify weights for the response variable and starting values for the parameters. It returns estimates for the model parameters (beta) and predicted and residual values for the response variable.

For orthogonal distance fitting, you can additionally specify the following:

1. You can specify which observations in the independent variables (also called the design matrix) are to be estimated and which are to remain fixed.

2. You can specify weights for the design matrix.

3. You can specify starting values for the estimated errors in the design matrix.

In addition to the estimated (i.e., predicted) response variable, orthogonal distance fitting returns an estimate for the design matrix. More specifically, it returns the residuals, DELTA, which are added to the original design matrix to obtain the estimated (i.e., predicted) design matrix.
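As a rough orientation only (the ODRPACK User's Guide remains the authoritative statement), the single-response problem with one independent variable can be summarized as a weighted sum of squares over both sets of residuals. The symbols $w_{\epsilon_i}$ and $w_{\delta_i}$ are our notation for the response weights and the design matrix weights; they do not appear elsewhere in this help file.

```latex
\min_{\beta,\,\delta}\; \sum_{i=1}^{n}
  \left[\, w_{\epsilon_i}\,\bigl(f(x_i + \delta_i;\beta) - y_i\bigr)^2
        \;+\; w_{\delta_i}\,\delta_i^{2} \,\right]
```

Holding every $\delta_i$ at zero (all design matrix values fixed) reduces this expression to weighted ordinary least squares.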

These topics are discussed in more detail in the various "Note:" sections below.

Although errors in variables models were the primary motivation for incorporating ODRPACK into Dataplot, ODRPACK provides the following 2 additional capabilities:

1. Implicit models can be fit.

2. Multiple response variables can be fit for both linear and nonlinear models.

These topics are discussed in "Note:" sections below.

Syntax 1:
ORTHOGONAL DISTANCE FIT <y> = <f>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response (= dependent) variable; <f> is:
1. any general FORTRAN-like expression; or
2. any function name that the user has already created via the LET FUNCTION command;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is appropriate for applying orthogonal distance fitting to linear, polynomial, multi-linear, and non-linear models.

Syntax 2:
ORTHOGONAL DISTANCE FIT <f>
<SUBSET/EXCEPT/FOR qualification>
where <f> is:
1. any general FORTRAN-like expression; or
2. any function name that the user has already created via the LET FUNCTION command;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is appropriate for applying orthogonal distance fitting to implicit models.

Syntax 3:
ORTHOGONAL DISTANCE FIT <y1>... <yk> = <f1> ... <fk>
<SUBSET/EXCEPT/FOR qualification>
where <y1>... <yk> is a list of 2 to 5 response (= dependent) variables;
<f1> ... <fk> is a list of 2 to 5 function names (the number of functions must equal the number of response variables);
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used when there are multiple response variables.

NOTE: This syntax is still being tested.

Examples:
ORTHOGONAL DISTANCE FIT Y = A0 + A1*X1
ORTHOGONAL DISTANCE FIT Y = A0 + A1*X1 + A2*X1**2
ORTHOGONAL DISTANCE FIT Y = A0 + A1*X1 + A2*X2
ORTHOGONAL DISTANCE FIT Y = A0 + A1*X1 SUBSET X1 > 1
ORTHOGONAL DISTANCE FIT Y = A+B*EXP(-C*X)
Note:
As with least squares nonlinear fitting, providing good starting values for the model parameters can be important. In Dataplot, the LET command is used to provide starting values. This is demonstrated in the example programs below.

Starting values are often determined from previous fits to similar data. If no such fits are available, you may need to do some preliminary analysis to determine starting values. In Dataplot, the PRE-FIT command can often be useful for this purpose.

Note:
If there is a single response variable, you can specify weights for the response variable using the command

WEIGHTS <varname>

where <varname> is the name of the variable containing the weights.

See the Note below regarding multi-response fits for a discussion of how weights are specified for multi-response fits.

Note:
You can specify which observations in the design matrix are to be treated as fixed (no measurement error) and which are to be estimated.

A common case is to specify which columns of the design matrix are to be fixed or estimated. For example, suppose there are three independent variables where the first and third are to be estimated while the second is fixed, then enter the commands

LET YERR = DATA 1 0 1
ORTHOGONAL DISTANCE ERROR YERR

That is, a single variable (it does not have to be called YERR) is specified and the number of rows must be equal to the number of columns in the design matrix. A zero indicates that the corresponding column is considered fixed and a non-zero value (here, that means the absolute value is greater than 0.5) means it will be estimated. Note that order is important in the above command. That is, Dataplot creates a list of independent variables when it parses the function name in the ORTHOGONAL DISTANCE FIT command. This parsing is left to right, so the order of the values in the YERR variable is relative to the variable names as they are first encountered (left to right) in the function.
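The ordering rule is easy to get wrong when the function does not mention the variables in their natural order. The Python sketch below is only an illustration of "first encountered, left to right"; it is not Dataplot's parser, and the helper name and the sample expression are hypothetical.

```python
import re


def first_appearance_order(expression, variable_names):
    """List the independent variables in the order they first appear.

    variable_names is the set of names (upper case) that are data
    variables; every other name is ignored (parameters, EXP, and so on).
    """
    order = []
    for name in re.findall(r"\b[A-Za-z]\w*", expression):
        name = name.upper()
        if name in variable_names and name not in order:
            order.append(name)
    return order


# X2 appears before X1 in this hypothetical model.
order = first_appearance_order("A0 + A1*X2 + A2*X1 + A3*X2**2", {"X1", "X2"})
yerr = [1, 0]  # values as they would be given in LET YERR = DATA 1 0
print(dict(zip(order, yerr)))  # {'X2': 1, 'X1': 0}
```

Under this reading, `LET YERR = DATA 1 0` would mark X2 as estimated and X1 as fixed for that model, because X2 is encountered first.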

You can also specify which values are fixed at the observation level. For example, suppose there are two independent variables with eight observations each. For the first variable, we want the first two observations to be fixed and for the second variable we want the first four observations to be fixed. We would enter the commands

LET YERR1 = DATA 0 0 1 1 1 1 1 1
LET YERR2 = DATA 0 0 0 0 1 1 1 1
ORTHOGONAL DISTANCE ERROR YERR1 YERR2

As for the one variable case, order is important in the command. YERR1 applies to the first variable name encountered in the function (left to right), YERR2 corresponds to the second variable name encountered, and so on.

The default is to assume all values in the design matrix are to be estimated. In this case, the ORTHOGONAL DISTANCE ERROR command does not need to be entered. If you entered a previous ORTHOGONAL DISTANCE ERROR command, you can reset the default by entering

ORTHOGONAL DISTANCE ERROR

The general idea is to "fix" values in the independent variable that are known to be precise and to estimate values that have significant measurement errors. In most cases, sufficient precision will be determined at the variable level. However, there may be cases where a measurement for a given independent variable is known to be precise within a given range, but it may be error prone outside of that range.

Note:
You can specify weights for the observations in the design matrix. You can specify weights by columns of the design matrix. For example, suppose there are two independent variables where the first independent variable will be assigned a weight of 3 and the second independent variable will be assigned a weight of 5. You can enter the commands

LET RHO = DATA 3 5
ORTHOGONAL DISTANCE DELTA WEIGHTS RHO

That is, a single variable (it does not have to be called RHO) is specified and the number of rows must be equal to the number of columns in the design matrix. Each value is the weight applied to the corresponding column. Note that order is important in the above command. That is, Dataplot creates a list of independent variables when it parses the function name in the ORTHOGONAL DISTANCE FIT command. This parsing is left to right, so the order of the values in the RHO variable is relative to the variable names as they are first encountered (left to right) in the function.

You can also specify weights at the observation level. For example, suppose there are two independent variables with eight observations each. The following shows an example of specifying individual weights.

LET RHO1 = DATA 1 1 2 2 2 2 1 1
LET RHO2 = DATA 1 1 1 1 2 2 2 2
ORTHOGONAL DISTANCE DELTA WEIGHTS RHO1 RHO2

As for the one variable case, order is important in the command. RHO1 applies to the first variable name encountered in the function (left to right), RHO2 corresponds to the second variable name encountered, and so on.

The default is the unweighted case. That is, all points in the design matrix will have a weight of 1. Note that if an observation or column has been designated as fixed, the weight is ignored. If you entered a previous ORTHOGONAL DISTANCE DELTA WEIGHTS command, you can reset the default by entering

ORTHOGONAL DISTANCE DELTA WEIGHTS

The general idea is to provide greater weight to independent variables or observations that are known to be more precise.
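This help file does not prescribe how to choose the weights. One common convention in weighted fitting, assumed here and not taken from the text, is to use the inverse of the measurement variance. The Python sketch below computes such weights from hypothetical standard deviations, one per column of the design matrix; the results could then be entered with LET RHO = DATA.

```python
def precision_weights(standard_deviations):
    """Return 1/sd**2 for each entry, so more precise values weigh more."""
    return [1.0 / sd ** 2 for sd in standard_deviations]


# Hypothetical measurement standard deviations for two independent variables.
sd_by_column = [0.5, 0.25]
rho = precision_weights(sd_by_column)
print(rho)  # [4.0, 16.0]
```

The second variable is measured twice as precisely, so it receives four times the weight.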

Note:
You can specify starting values for the deltas of the observations in the design matrix. Remember that the deltas are essentially the residuals for the design matrix (i.e., they are added to the values in the design matrix to obtain the predicted value of the design matrix). By default, the deltas are set to zero and in most cases this is adequate.

You can specify starting values by columns of the design matrix. For example, suppose there are two independent variables where the first independent variable will be assigned a starting value of 2 and the second independent variable will be assigned a starting value of 7. You can enter the commands

LET DEL = DATA 2 7
ORTHOGONAL DISTANCE DELTA DEL

That is, a single variable (it does not have to be called DEL) is specified and the number of rows must be equal to the number of columns in the design matrix. Each value is the starting delta applied to the corresponding column. Note that order is important in the above command. That is, Dataplot creates a list of independent variables when it parses the function name in the ORTHOGONAL DISTANCE FIT command. This parsing is left to right, so the order of the values in the DEL variable is relative to the variable names as they are first encountered (left to right) in the function.

You can also specify starting values at the observation level. For example, suppose there are two independent variables with eight observations each. The following shows an example of specifying individual starting values.

LET DEL1 = DATA 0.5 0.5 0.5 0.5 2.0 2.0 2.0 2.0
LET DEL2 = DATA 0 0 0 0 1 1 1 1
ORTHOGONAL DISTANCE DELTA DEL1 DEL2

As for the one variable case, order is important in the command. DEL1 applies to the first variable name encountered in the function (left to right), DEL2 corresponds to the second variable name encountered, and so on.

Note:
You can define the maximum number of iterations for the fit with the following command (the default is 50).

FIT ITERATIONS <value>

You can define a number of convergence criteria. By default, these are chosen automatically by ODRPACK and we recommend that these default values be used unless you have a good reason for changing them. See the ODRPACK User's Guide for more information on these parameters.

• SET ORTHOGONAL DISTANCE TRUST REGION RADIUS <value> - defines the trust region radius

• SET ORTHOGONAL DISTANCE STOP TOLERANCE <value> - defines the stop tolerance for the sum of squares convergence

• SET ORTHOGONAL DISTANCE PARAMETER TOLERANCE <value> - defines the stopping tolerance for parameter convergence
Note:
ODRPACK allows several levels of output. Dataplot allows some control over this with the following command:

SET ORTHOGONAL DISTANCE PRINT OPTION <SHORT/INTERMEDIATE/FULL>

The default is INTERMEDIATE.

In addition, Dataplot writes the following information to files after an orthogonal distance fit:

1. dpst1f.dat - contains the final parameter estimates and their standard deviations.

2. dpst2f.dat - contains the parameter variance-covariance matrix. Written using a 20(E15.7,1X) format.

3. dpst3f.dat - the predicted values for the design matrix (i.e., "x + delta"). Written using a 20(E15.7,1X) format.

4. dpst4f.dat - the deltas (i.e., the residuals for the design matrix). Written using a 20(E15.7,1X) format.

In addition, the internal parameters RESSD and RESDF will contain the residual standard deviation and the residual degrees of freedom for the fitted model.
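If you post-process these files outside Dataplot, fields written with a 20(E15.7,1X) format are separated by blanks and can be split on whitespace. The Python sketch below parses made-up lines in that style and rebuilds the predicted design matrix as x + delta. The sample values, and the assumption of one row per observation and one column per independent variable, are ours and are not confirmed by this help file.

```python
def parse_row(line):
    """Split one line of blank-separated E15.7 fields into floats."""
    return [float(field) for field in line.split()]


# Original design matrix: 2 observations, 2 independent variables (made up).
x_rows = [[1.0, 10.0], [2.0, 20.0]]

# Made-up lines in the style of dpst4f.dat (the deltas).
delta_lines = [
    "  0.2500000E+00 -0.5000000E+00",
    " -0.1250000E+00  0.2500000E+00",
]
delta_rows = [parse_row(line) for line in delta_lines]

# Predicted design matrix = original values + deltas (as in dpst3f.dat).
x_pred = [
    [xv + dv for xv, dv in zip(x_row, d_row)]
    for x_row, d_row in zip(x_rows, delta_rows)
]
print(x_pred)  # [[1.25, 9.5], [1.875, 20.25]]
```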

Note:
Dataplot currently imposes the following limits on the size of problem that can be handled with the ORTHOGONAL DISTANCE FIT command.

1. The maximum number of observations is one half the maximum row size. That is, the default version of Dataplot allows a maximum of 20,000 rows for your data. The ORTHOGONAL DISTANCE FIT then allows a maximum number of observations of 10,000.

You can redimension the Dataplot data space to have fewer rows and more columns. The limit on the number of observations for the ORTHOGONAL DISTANCE FIT command is relative to the allowable maximum, not the number of rows you may have set using the DIMENSION command.

2. The maximum number of independent variables in your function is 20. The maximum number of parameters in the model is 100. The maximum number of characters in the function is 1,000.
3. ODRPACK requires the following number of words of scratch space:

18 + 11*NP + NP**2 + M + M**2 + 4*N*NQ + 6*N*M + 2*N*NQ*M + 2*N*NQ*NP + NQ**2 + 5*NQ + NQ*(NP+M) + (N*1)*NQ

where NP is the number of parameters in the model, M is the number of independent variables, NQ is the number of response variables, and N is the number of observations.

Dataplot provides 23*MAXOBV words of scratch space. An error message is printed if Dataplot does not have enough storage.
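To estimate in advance whether a problem fits, the expression can be evaluated directly. The Python sketch below is a convenience calculation, not part of Dataplot. It treats NQ**2 and 5*NQ as separate summands, and it takes MAXOBV as an input because this help file does not state its value.

```python
def odrpack_scratch_words(n, m, np_, nq=1):
    """Words of scratch space per the expression in this Note.

    n: observations, m: independent variables,
    np_: model parameters, nq: response variables.
    """
    return (18 + 11 * np_ + np_ ** 2 + m + m ** 2
            + 4 * n * nq + 6 * n * m
            + 2 * n * nq * m + 2 * n * nq * np_
            + nq ** 2 + 5 * nq + nq * (np_ + m) + (n * 1) * nq)


def fits_in_dataplot(n, m, np_, nq, maxobv):
    """Compare the requirement with the 23*MAXOBV words Dataplot provides."""
    return odrpack_scratch_words(n, m, np_, nq) <= 23 * maxobv


# Hypothetical problem: 100 observations, 1 independent variable, 3 parameters.
print(odrpack_scratch_words(n=100, m=1, np_=3))  # 1972
```

The terms that multiply N dominate, so the requirement grows roughly linearly with the number of observations for a fixed model.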

Note:
Although Dataplot uses the double precision version of ODRPACK, Dataplot function evaluation returns single precision results. For this reason, Dataplot uses the single precision convergence criteria.

On most platforms, Dataplot can be compiled in a mode where single precision is treated as double precision (so Dataplot function evaluation returns double precision results). If you desire higher precision results from the orthogonal distance regression, you should install a double precision version of Dataplot on your system. Contact Alan Heckert for additional information.

For NIST users, we maintain a double precision version on the Sun that can be cross-mounted from the /itl/apps directory. If you have cross-mounted this directory, enter

dataplot.dp

to run the double precision version.

Note:
An explicit function is defined as:

y = f(xi;beta)

An implicit function is defined as:

f(xi,y) = 0

The ODRPACK software can fit implicit models and Dataplot supports this capability. See Syntax 2 above and the Program 2 example below.

Basically, the response variable is omitted. The other options described above work the same for the implicit model as for the explicit model.
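For an implicit model, a point fits well when the function value at that point is close to zero. The Python sketch below evaluates the implicit function from the Program 2 example (variables V and H, parameters B1 to B5) at the starting values given there. It only illustrates what f = 0 means; the two test points are made up and no fitting is done.

```python
import math


def implicit_f(v, h, b1, b2, b3, b4, b5):
    """Implicit model from Program 2: f(v, h; b) should equal zero."""
    return (b3 * (v - b1) ** 2
            + 2 * b4 * (v - b1) * (h - b2)
            + b5 * (h - b2) ** 2
            - 1.0)


# Starting values from Program 2.
start = dict(b1=-1.0, b2=-3.0, b3=0.09, b4=0.02, b5=0.08)

# A point constructed to lie on the curve: h = b2 and b3*(v - b1)**2 = 1.
v_on = start["b1"] + 1.0 / math.sqrt(start["b3"])
h_on = start["b2"]
print(implicit_f(v_on, h_on, **start))  # approximately 0

# A point off the curve gives a non-zero value.
print(implicit_f(0.0, 0.0, **start))    # approximately -0.07
```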

Note:
ODRPACK allows models that have multiple response variables. We are currently working to make this capability available within Dataplot. This is still in the development/testing phase.

Syntax 3 shows the basic syntax for the multi-response case. The primary point is that a function is specified for each response variable (you must give the name of a previously defined function, not a functional expression).

The other difference is that you can specify a weight variable for each response variable. Use the command

ORTHOGONAL DISTANCE Y WEIGHTS <var-list>

where <var-list> is a list of variables that define the weights for each of the response variables (i.e., if there are 3 response variables, there should be a list of 3 weight variables).

Note that for the single response variable case, you can specify the weights either with this command or with the WEIGHTS command. If both are given, the ORTHOGONAL DISTANCE Y WEIGHTS command takes precedence.

Note:
ODRPACK accepts the model parameters as an array of ordered values, not by parameter name. Note that Dataplot passes the parameters in the order that they are encountered in the function (from left to right). For example, for the model Y = A+B*EXP(-C*X), the parameters are passed in the order A, B, C. Keep this in mind when interpreting the ODRPACK output.
Note:
The ODRPACK library provides for considerable flexibility in performing the orthogonal distance fit. It was designed to support anything from the simplest fits to the most complex fits. Although the above notes documented a significant number of options, there are additional options that Dataplot does not support.

1. ODRPACK allows the user to specify both the function to be fit and analytic Jacobians (relative to the parameters and to the deltas).

In Dataplot, you are limited to functions that can be computed using the LET FUNCTION command. Although this provides a relatively rich set of functions, you may well have functions that cannot be stated using the Dataplot LET FUNCTION command.

Dataplot does not currently support analytic Jacobians. It specifies that ODRPACK should compute the partial derivatives using a central finite difference method. ODRPACK also allows the use of a forward finite difference method for computing the partial derivatives (the forward finite difference method is faster, but somewhat less accurate). Dataplot always uses the central difference method.

2. For the multi-response case, the weights for both the response variable and the independent variables allow a covariance structure to be implemented for the weight function. This is not currently supported in Dataplot. See the ODRPACK User's Guide for details.

3. ODRPACK allows the scaling to be specified for both the parameters and the deltas (i.e., the residuals of the independent variables). Dataplot utilizes the default ODRPACK scaling algorithms.

4. ODRPACK allows the user to specify the number of reliable decimal digits in the function evaluations. Dataplot utilizes the default ODRPACK value.

5. ODRPACK allows relative step sizes to be specified for both the parameters and the deltas when computing partial derivatives using the forward finite difference method. Dataplot utilizes the default ODRPACK values.

6. ODRPACK allows constraints to be coded for the model parameters. Basically, ODRPACK will try to take a step closer to the previous value if an out of range value is encountered. Dataplot does not support the specification of constraints in the initial implementation, but this is being considered for future development.
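Item 1 above notes that the central difference method is somewhat more accurate than the forward difference method, at the cost of extra function evaluations. The Python sketch below shows that trade-off on a simple function. It is a generic numerical illustration with a hypothetical step size, not ODRPACK's derivative code.

```python
import math


def forward_diff(f, x, h):
    """Forward difference: one extra function evaluation, error of order h."""
    return (f(x + h) - f(x)) / h


def central_diff(f, x, h):
    """Central difference: two extra evaluations, error of order h**2."""
    return (f(x + h) - f(x - h)) / (2 * h)


x, h = 1.0, 1e-3
exact = math.exp(x)  # the derivative of exp is exp itself
err_fwd = abs(forward_diff(math.exp, x, h) - exact)
err_ctr = abs(central_diff(math.exp, x, h) - exact)
print("forward difference error:", err_fwd)
print("central difference error:", err_ctr)
```

With this step size the central difference error is smaller by several orders of magnitude, which is the accuracy gain Dataplot accepts in exchange for the slower computation.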

If you need to run an orthogonal distance fit that is beyond Dataplot's capabilities, we recommend you download the ODRPACK Fortran source and use ODRPACK directly.

Note:
Dataplot uses version 2.01 of ODRPACK. This is the current (and apparently final) version of ODRPACK.
Default:
None
Synonyms:
The following are synonyms for ORTHOGONAL DISTANCE FIT:

ORTHOGONAL DISTANCE REGRESSION
ERRORS IN VARIABLES FIT
ERRORS IN VARIABLES REGRESSION
Related Commands:
FIT ITERATIONS = Sets the maximum number of iterations for the fit and orthogonal distance fit commands.
WEIGHTS = Sets the weights for the dependent variable for the fit and orthogonal distance fit commands.
PRED = A variable where predicted values are stored.
RES = A variable where residuals are stored.
RESSD = A parameter where the residual standard deviation is stored.
RESDF = A parameter where the residual degrees of freedom is stored.
FIT = Carries out linear/nonlinear least squares fit.
EXACT RATIONAL FIT = Carries out an exact rational fit.
PRE-FIT = Carries out a least squares pre-fit.
SPLINE FIT = Carries out a spline fit.
PLOT = Generates a data/function plot.
References:
"Algorithm 676: ODRPACK: Software for Weighted Orthogonal Distance Regression", Paul Boggs, Janet Donaldson, Richard Byrd, and Robert Schnabel, ACM Transactions on Mathematical Software, December, 1989, Volume 15, Number 4, pp. 348-364.

"User's Reference Guide for ODRPACK Version 2.01: Software for Weighted Orthogonal Distance Regression", Paul Boggs, Janet Donaldson, Richard Byrd, and Robert Schnabel, NIST-IR 89-4103 (Revised).

Note: for those interested in the mathematics of orthogonal distance regression, we have put a Postscript copy of the ODRPACK User's Guide on the Dataplot web site at:

http://www.itl.nist.gov/div898/software/dataplot/refman1/odrpack_guide.ps
Applications:
Fitting where there can be significant error in the independent variables
Implementation Date:
2001/5
Program 1:
. FILE: FULLODR1.DP
. Performs an orthogonal distance analysis of example from
. Wayne Fuller's book (see header of FULLODR1.DAT for reference).
. This is example 1 from version 2.01 of ODRPACK User's Guide.
.
skip 25
. (The READ command that loads Y and X is not shown in this listing.)
let n = size y
.
let b1 = 1500.0
let b2 = -50.0
let b3 = -0.1
let function f = b1 + b2*(exp(b3*x) - 1)**2
.
let yerr = 0 for i = 1 1 n
let yerr = 1 subset x = 1 to 99
.
orthogonal distance error yerr
.
orthogonal distance fit y = f
Program 2:
. FILE: FULLODR2.DP
. Performs an orthogonal distance analysis of example from
. Wayne Fuller's book (see header of FULLODR2.DAT for reference).
. This is example 2 from version 2.01 of ODRPACK User's Guide.
.
. Note that this is an example of fitting an implicit function.
.
skip 25
. (The READ command that loads V and H is not shown in this listing.)
let n = size v
.
let b1 = -1.0
let b2 = -3.0
let b3 = 0.09
let b4 = 0.02
let b5 = 0.08
let function f1 = b3*(v-b1)**2
let function f2 = 2*b4*(v-b1)*(h-b2)
let function f3 = b5*(h-b2)**2
let function f = f1 + f2 + f3 - 1.0
.
orthogonal distance fit f
Program 3:
. FILE: DRAPS521.DP
. Performs an orthogonal distance analysis of Draper/Smith
. data set (p. 521 of 2nd. ed.). From version 1.3 of ODRPACK
. User's Guide.
.
skip 25
. (The READ command that loads Y, X1, and X2 is not shown in this listing.)
.
let a1 = 0.01155
let a2 = 5000.0
let function f = exp(-a1*x1*exp(-a2*(1/x2 - 1/620)))
.
let rho = data 5.0 3.0
let yerr = data 1 0
.
orthogonal distance error yerr
orthogonal distance delta weights rho
.
orthogonal distance fit y = f
Program 4:
. FILE: DRAPS518.DP
. Performs an orthogonal distance analysis of Draper/Smith
. data set (p. 518 of 2nd. ed.). From ODRPACK ACM article.
.
skip 25
. (The READ command that loads Y and X is not shown in this listing.)
.
let b1 = 4
let b2 = 5
let b3 = 200
let function f = b1*10**(b2*x/(b3+x))
.
let yerr = data 1
.
orthogonal distance error yerr
.
orthogonal distance fit y = f

Date created: 6/5/2001
Last updated: 4/4/2003