4.
Process Modeling
4.6. Case Studies in Process Modeling 4.6.2. Alaska Pipeline


Transformations 
In regression modeling, we often apply transformations
to achieve the following two goals:


Plot of Common Transformations to Obtain Homogeneous Variances 
The first step is to try transforming the
response variable to find a tranformation that
will equalize the variances. In practice,
the square root, ln, and
reciprocal transformations often work well for
this purpose. We will try these first.
In examining these plots, we are looking for the plot that shows the most constant variability across the horizontal range of the plot. This plot indicates that the ln transformation is a good candidate model for achieving the most homogeneous variances. 

Plot of Common Transformations to Linearize the Fit 
One problem with applying the above transformation is that
the plot indicates that a straightline fit will no longer
be an adequate model for the data. We address this problem
by attempting to find a transformation of the
predictor variable that will result in the most
linear fit. In practice, the square root, ln, and
reciprocal transformations often work well for
this purpose. We will try these first.
This plot shows that the ln transformation of the predictor variable is a good candidate model. 

BoxCox Linearity Plot 
The previous step can be approached more formally
by the use of the
BoxCox
linearity plot. The α
value on the x axis corresponding to
the maximum correlation value on the y axis indicates the
power transformation that yields the most linear
fit.
This plot indicates that a value of 0.1 achieves the most linear fit. In practice, for ease of interpretation, we often prefer to use a common transformation, such as the ln or square root, rather than the value that yields the mathematical maximum. However, the BoxCox linearity plot still indicates whether our choice is a reasonable one. That is, we might sacrifice a small amount of linearity in the fit to have a simpler model. In this case, a value of 0.0 would indicate a ln transformation. Although the optimal value from the plot is 0.1, the plot indicates that any value between 0.2 and 0.2 will yield fairly similar results. For that reason, we choose to stick with the common ln transformation. 

lnln Fit 
Based on the above plots, we choose to fit a lnln model.
Parameter Estimate Stan. Dev t Value B0 0.281384 0.08093 3.48 B1 0.885175 0.02302 38.46 Residual standard deviation = 0.168260 Residual degrees of freedom = 105 Lackoffit F statistic = 1.7032 Lackoffit critical value, F_{0.05,76,29} = 1.73Note that although the residual standard deviation is significantly lower than it was for the original fit, we cannot compare them directly since the fits were performed on different scales. 

Plot of Predicted Values 
The plot of the predicted values with the transformed data indicates a good fit. In addition, the variability of the data across the horizontal range of the plot seems relatively constant. 

6Plot of Fit 
Since we transformed the data, we need to check that all of the regression assumptions are now valid. The 6plot of the residuals indicates that all of the regression assumptions are now satisfied. 

Plot of Residuals 
In order to see more detail, we generate a fullsize plot of the residuals versus the predictor variable, as shown above. This plot suggests that the assumption of homogeneous variances is now met. 