4.6. Case Studies in Process Modeling
4.6.2. Alaska Pipeline
|Plot of Raw Data||
As with any regression problem, it is always a good idea
to plot the raw data first. The following is a scatter plot
of the raw data.
This scatter plot shows that a straight line fit is a good initial candidate model for these data.
|Plot by Batch||
These data were collected in six distinct batches. The
first step in the analysis is to determine if there is a
batch effect.
In this case, the scientist was not inherently interested in the batch. That is, batch is a nuisance factor and, if reasonable, we would like to analyze the data as if they came from a single batch. However, we first need to verify that this is, in fact, a reasonable assumption to make.
We first generate a conditional plot
where we condition on the batch.
This conditional plot shows a scatter plot for each of the six batches on a single page. Each of these plots shows a similar pattern.
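A conditional plot of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used to produce the plot above: the function names `group_by_batch` and `plot_by_batch`, the record layout `(field, lab, batch)`, the choice of field defect size for the horizontal axis, and the use of matplotlib are all assumptions made for the example.

```python
from collections import defaultdict


def group_by_batch(records):
    """Split (field, lab, batch) records into per-batch x and y lists."""
    groups = defaultdict(lambda: ([], []))
    for field, lab, batch in records:
        xs, ys = groups[batch]
        xs.append(field)
        ys.append(lab)
    # Return a plain dict ordered by batch label.
    return {batch: groups[batch] for batch in sorted(groups)}


def plot_by_batch(groups, ncols=3):
    """Draw one scatter panel per batch on a single page (needs matplotlib)."""
    import matplotlib.pyplot as plt  # assumed dependency, imported lazily

    nrows = -(-len(groups) // ncols)  # ceiling division
    # Shared axes keep the scales identical so the panels can be compared.
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True,
                             squeeze=False)
    panels = list(axes.flat)
    for ax, (batch, (xs, ys)) in zip(panels, groups.items()):
        ax.scatter(xs, ys, s=10)
        ax.set_title(f"Batch {batch}")
    for ax in panels[len(groups):]:
        ax.set_visible(False)  # hide any unused panels
    fig.supxlabel("Field defect size")
    fig.supylabel("Lab defect size")
    return fig
```

Sharing the axis scales across panels is the important design choice here: it is what lets us judge by eye whether each batch shows a similar pattern.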
|Linear Correlation and Related Plots||
We can follow up the conditional plot with a
linear correlation plot, a linear intercept plot, a linear
slope plot, and a
residual standard deviation plot.
These four plots show the correlation, the intercept and
slope from a linear fit, and the residual standard deviation
for linear fits applied to each batch. These plots show
how a linear fit performs across the six batches.
The linear correlation plot (upper left), which shows the correlation between field and lab defect sizes versus the batch, indicates that batch six has a somewhat stronger linear relationship between the measurements than the other batches do. This is also reflected in the significantly lower residual standard deviation for batch six shown in the residual standard deviation plot (lower right), which shows the residual standard deviation versus batch. The slopes all lie within a range of 0.6 to 0.9 in the linear slope plot (lower left) and the intercepts all lie between 2 and 8 in the linear intercept plot (upper right).
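The four statistics plotted against batch can be computed with an ordinary least-squares fit applied to each batch separately. The following is a minimal sketch in pure Python; the function names `linear_fit_stats` and `fit_stats_by_batch` and the record layout `(x, y, batch)` are hypothetical, and the sketch assumes that field defect size is the predictor `x` and lab defect size is the response `y`.

```python
import math


def linear_fit_stats(xs, ys):
    """Least-squares line y = intercept + slope * x, plus r and residual SD."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points for a residual SD")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals about the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return {
        "correlation": sxy / math.sqrt(sxx * syy),
        "intercept": intercept,
        "slope": slope,
        "residual_sd": math.sqrt(sse / (n - 2)),  # n - 2 degrees of freedom
    }


def fit_stats_by_batch(records):
    """Apply linear_fit_stats separately to each batch of (x, y, batch) rows."""
    batches = {}
    for x, y, batch in records:
        xs, ys = batches.setdefault(batch, ([], []))
        xs.append(x)
        ys.append(y)
    return {b: linear_fit_stats(xs, ys)
            for b, (xs, ys) in sorted(batches.items())}
```

Plotting each of the four returned values against the batch label gives the four summary plots. A batch whose values fall far from the others would be a warning sign of a batch effect.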
|Treat Batch as Homogeneous||
These summary plots, in conjunction with the conditional plot
above, show that treating the data as a single batch
is a reasonable assumption to make. None of the
batches behaves badly compared to the others and none
of the batches requires a significantly different
fit from the others.
The conditional plot and the plots of the fit statistics complement each other. The plots of the fit statistics allow quick and convenient comparisons of the overall fits. However, the conditional plot can reveal details that may be hidden in the summary plots. For example, we can more readily detect clusters of points, outliers, curvature in the data, and other similar features.
Based on these plots we will ignore the batch variable for the remaining analysis.