| Weighting | When the assumption of constant standard deviation of the errors (i.e., homogeneous variances) is violated, another approach is to perform a weighted fit. In a weighted fit, we give less weight to the less precise measurements and more weight to the more precise measurements when estimating the unknown parameters in the model. | ||
| Fit for Estimating Weights |
For the pipeline data, we chose
approximate replicate groups
so that each group has four observations (the last
group only has three). This was done by first
sorting the data by the predictor variable and then
taking four points in succession to form each
replicate group.
Using the power function model with the data for estimating the weights, Dataplot generated the following output for the fit of ln(variances) against ln(means) for the replicate groups. The output has been edited slightly for display.

    LEAST SQUARES MULTILINEAR FIT
    SAMPLE SIZE N       =       27
    NUMBER OF VARIABLES =        1
    NO REPLICATION CASE

        PARAMETER ESTIMATES       (APPROX. ST. DEV.)    T VALUE
     1  A0             -3.18451   (0.8265)               -3.9
     2  A1   XTEMP      1.69001   (0.2344)                7.2

    RESIDUAL STANDARD DEVIATION =   0.8561206460
    RESIDUAL DEGREES OF FREEDOM =   25

The fit output and the plot of the replicate variances against the replicate means show that a linear fit is reasonable, with an estimated slope of 1.69. Note that this data set has a small number of replicates, so you may get a slightly different estimate for the slope. For example, S-PLUS generated a slope estimate of 1.52. This is caused by the sorting of the predictor variable (i.e., where we have actual replicates in the data, different sorting algorithms may put some observations in different replicate groups). In practice, any slope in the range 1.5 to 2.0 is probably reasonable as the exponent in the weight function and should produce comparable results for the weighted fit. We used an estimate of 1.5 for the exponent in the weighting function. |
||
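The grouping-and-fit procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the Dataplot implementation: the function names are hypothetical, the demo data are synthetic (not the pipeline data), and it assumes that the variances are taken over the response and the means over the predictor within each replicate group.

```python
import math
from statistics import mean, variance


def replicate_groups(x, y, size=4):
    """Sort the (x, y) pairs by the predictor, then take `size` points
    in succession to form each approximate replicate group."""
    pairs = sorted(zip(x, y), key=lambda p: p[0])
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]


def fit_line(u, v):
    """Ordinary least squares fit of v = a0 + a1*u; returns (a0, a1)."""
    u_bar, v_bar = mean(u), mean(v)
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    sxy = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    a1 = sxy / sxx
    return v_bar - a1 * u_bar, a1


def estimate_variance_power(x, y, size=4):
    """Fit ln(group variance of y) against ln(group mean of x).
    The slope is the candidate exponent for the weight function."""
    log_means, log_vars = [], []
    for group in replicate_groups(x, y, size):
        xs = [p[0] for p in group]
        ys = [p[1] for p in group]
        if len(ys) < 2:
            continue  # a single point has no sample variance
        x_mean, y_var = mean(xs), variance(ys)
        if x_mean <= 0 or y_var <= 0:
            continue  # the logarithm is undefined for this group
        log_means.append(math.log(x_mean))
        log_vars.append(math.log(y_var))
    return fit_line(log_means, log_vars)


# Synthetic data (NOT the pipeline data): four replicates at each x, with a
# spread that grows like x**0.75, so the variance grows like x**1.5.
offsets = [-1.5, -0.5, 0.5, 1.5]
x_demo = [float(xv) for xv in range(5, 85, 5) for _ in offsets]
y_demo = [xv + off * xv ** 0.75 for xv in range(5, 85, 5) for off in offsets]

a0, a1 = estimate_variance_power(x_demo, y_demo)
print(f"estimated exponent: {a1:.3f}")  # close to 1.5 by construction
```

Python's sort is stable, so tied predictor values stay in input order. A tool that breaks ties differently can place some observations in different groups, which is the kind of sorting effect the text gives as the reason for the differing S-PLUS estimate.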
| Residual Plot for Weight Function |
The residual plot from the fit to determine an appropriate weighting function reveals no obvious problems. |
||
| Numerical Output from Weighted Fit |
Dataplot generated the following output for the weighted fit of the
model that relates the field measurements to the lab measurements
(edited slightly for display).

    LEAST SQUARES MULTILINEAR FIT
    SAMPLE SIZE N       =      107
    NUMBER OF VARIABLES =        1
    REPLICATION CASE
    REPLICATION STANDARD DEVIATION =   0.6112687111D+01
    REPLICATION DEGREES OF FREEDOM =   29
    NUMBER OF DISTINCT SUBSETS     =   78

        PARAMETER ESTIMATES       (APPROX. ST. DEV.)    T VALUE
     1  A0              2.35234   (0.5431)                4.3
     2  A1   LAB        0.806363  (0.2265E-01)           36.

    RESIDUAL STANDARD DEVIATION    =   0.3645902574
    RESIDUAL DEGREES OF FREEDOM    =   105
    REPLICATION STANDARD DEVIATION =   6.1126871109
    REPLICATION DEGREES OF FREEDOM =   29

This output shows a slope of 0.81 and an intercept of 2.35, compared with a slope of 0.73 and an intercept of 4.99 in the original model. |
||
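A weighted straight-line fit of this kind can be sketched as follows. This is a hedged illustration rather than Dataplot's algorithm: the function names are hypothetical, the data are synthetic (not the pipeline data), and the weight form w = 1 / x**1.5 is an assumption based on the exponent of 1.5 chosen above. The residual standard deviation uses the conventional weighted formula with n - 2 degrees of freedom; the source does not state the exact formula Dataplot uses.

```python
import math


def power_weights(x, exponent=1.5):
    """Weights inversely proportional to x**exponent (assumed form)."""
    return [1.0 / xi ** exponent for xi in x]


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a0 + a1*x.

    Minimizes sum(w_i * (y_i - a0 - a1*x_i)**2) and returns
    (a0, a1, residual_sd), with residual_sd based on n - 2 degrees of freedom.
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    sse = sum(wi * (yi - a0 - a1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return a0, a1, math.sqrt(sse / (len(x) - 2))


# Synthetic illustration (NOT the pipeline data): scatter grows with x.
lab = [5.0, 10.0, 20.0, 30.0, 40.0, 60.0, 80.0]
field = [6.3, 9.5, 19.0, 24.5, 37.0, 46.0, 71.0]

a0, a1, s = weighted_line_fit(lab, field, power_weights(lab))
print(f"intercept = {a0:.3f}, slope = {a1:.3f}, residual sd = {s:.4f}")
```

Because the weights shrink as the predictor grows, the noisier large-x points pull less on the fitted line than they would in an unweighted fit, which is why the weighted intercept and slope can differ noticeably from the original estimates.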
| Plot of Predicted Values |
The plot of the predicted values with the data indicates a good fit. |
||
| Diagnostic Plots of Weighted Residuals |
We need to verify that the weighting did not result in the other regression assumptions being violated. A 6-plot, after weighting the residuals, indicates that the regression assumptions are satisfied. |
||
| Plot of Weighted Residuals vs Lab Defect Size |
To check the assumption of homogeneous variances for the errors in more detail, we generate a full-sized plot of the weighted residuals versus the predictor variable. This plot suggests that the errors now have homogeneous variances. |
||
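As a numerical companion to the plot of weighted residuals, the sketch below scales each residual by the square root of its weight and compares the spread in the lower and upper halves of the predictor range. The function names, the spread-ratio summary, and the weight form w = 1 / x**exponent are illustrative assumptions, and the data are synthetic (not the pipeline data).

```python
import math
from statistics import stdev


def weighted_residuals(x, y, a0, a1, exponent=1.5):
    """Residuals from y = a0 + a1*x, each multiplied by sqrt(w_i),
    where w_i = 1 / x_i**exponent (assumed weight form)."""
    return [(yi - a0 - a1 * xi) / math.sqrt(xi ** exponent)
            for xi, yi in zip(x, y)]


def spread_ratio(x, resid):
    """Standard deviation of the residuals in the upper half of x
    divided by that in the lower half; near 1 suggests similar spread."""
    ordered = [r for _, r in sorted(zip(x, resid), key=lambda p: p[0])]
    half = len(ordered) // 2
    return stdev(ordered[half:]) / stdev(ordered[:half])


# Synthetic illustration (NOT the pipeline data): raw errors of size x**0.75
# with alternating sign around the line y = 2 + 0.8*x.
x_demo = [float(v) for v in range(1, 21)]
y_demo = [2 + 0.8 * v + (-1) ** i * v ** 0.75 for i, v in enumerate(x_demo)]

raw = weighted_residuals(x_demo, y_demo, 2.0, 0.8, exponent=0.0)  # unweighted
wtd = weighted_residuals(x_demo, y_demo, 2.0, 0.8, exponent=1.5)
print(f"raw spread ratio:      {spread_ratio(x_demo, raw):.2f}")
print(f"weighted spread ratio: {spread_ratio(x_demo, wtd):.2f}")
```

In this constructed example the raw residuals fan out as the predictor increases, while the weighted residuals have essentially the same spread in both halves. A ratio is only a rough summary and does not replace inspecting the plot itself.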