1.3.5.18. Yates Algorithm

## Defining Models and Prediction Equations

### For Orthogonal Designs, Parameter Estimates Don't Change as Additional Terms Are Added

In most cases of least-squares fitting, the coefficient estimated for a term depends on which other terms are in the model. For example, the X1 coefficient might change depending on whether or not an X2 term is included. This is not the case when the design is orthogonal, as a $2^3$ full factorial design is: the coded columns of the design matrix are mutually orthogonal, so each coefficient is estimated independently of the others. For orthogonal designs, the estimates for previously included terms do not change as additional terms are added. This means the ranked list of parameter estimates gives the least-squares coefficient estimates for a sequence of progressively more complicated models.
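A small numerical check of this claim, sketched in pure Python under stated assumptions: the coded $2^3$ design is generated in standard order, the response is random (orthogonality is a property of the design, not of the data), and every model is fitted by solving the normal equations directly, so nothing in the fitting step assumes orthogonality. The helper names `design_2cubed` and `least_squares` are hypothetical.

```python
import itertools
import random

# Hypothetical helper: coded (-1/+1) runs of a 2^3 full factorial in standard
# order (X1 changes fastest, then X2, then X3).
def design_2cubed():
    return [(x1, x2, x3) for x3, x2, x1 in itertools.product((-1, 1), repeat=3)]

# Model terms as functions of one run r = (x1, x2, x3).
TERMS = {
    "Mean": lambda r: 1,
    "X1": lambda r: r[0],
    "X2": lambda r: r[1],
    "X1*X2": lambda r: r[0] * r[1],
    "X3": lambda r: r[2],
    "X1*X3": lambda r: r[0] * r[2],
    "X2*X3": lambda r: r[1] * r[2],
    "X1*X2*X3": lambda r: r[0] * r[1] * r[2],
}

def least_squares(columns, y):
    """Generic least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    p, n = len(columns), len(y)
    aug = [
        [sum(columns[i][k] * columns[j][k] for k in range(n)) for j in range(p)]
        + [sum(columns[i][k] * y[k] for k in range(n))]
        for i in range(p)
    ]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

runs = design_2cubed()
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in runs]  # any response vector will do

# Fit Mean + X1, then Mean + X1 + X2, and so on, adding one term at a time.
order = ["X1", "X2", "X1*X2", "X3", "X1*X3", "X2*X3", "X1*X2*X3"]
fits = []
for m in range(1, len(order) + 1):
    names = ["Mean"] + order[:m]
    cols = [[TERMS[name](r) for r in runs] for name in names]
    fits.append(dict(zip(names, least_squares(cols, y))))

# The X1 coefficient agrees (up to rounding) across all seven models.
print([round(f["X1"], 10) for f in fits])
```

Dropping a single run from `runs` breaks the orthogonality, and the X1 coefficient then generally shifts as terms are added, which is the "most cases" behavior described above.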
### Example Prediction Equation

We use the parameter estimates derived from a least-squares analysis for the eddy current data set to create an example prediction equation.
| Parameter | Estimate |
|-----------|---------:|
| Mean | 2.65875 |
| X1 | 1.55125 |
| X2 | -0.43375 |
| X1*X2 | 0.06375 |
| X3 | 0.10625 |
| X1*X3 | 0.12375 |
| X2*X3 | 0.14875 |
| X1*X2*X3 | 0.07125 |
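The table can be reproduced with a short Python sketch, under one assumption: the eight eddy current responses are not listed in this section, so the values below are the ones implied by the table itself (with eight runs and eight terms the full model is saturated, so each response equals the full prediction equation evaluated at its run). Because the coded columns of a $2^3$ design satisfy X'X = 8I, each least-squares coefficient reduces to X'y / 8.

```python
import itertools

# Assumption: eddy current responses in standard order, recovered from the
# table (the saturated model reproduces each observation exactly).
Y = [1.70, 4.57, 0.55, 3.39, 1.51, 4.59, 0.67, 4.29]

# Coded runs in standard order: X1 changes fastest, then X2, then X3.
RUNS = [(x1, x2, x3) for x3, x2, x1 in itertools.product((-1, 1), repeat=3)]

TERMS = [
    ("Mean", lambda x1, x2, x3: 1),
    ("X1", lambda x1, x2, x3: x1),
    ("X2", lambda x1, x2, x3: x2),
    ("X1*X2", lambda x1, x2, x3: x1 * x2),
    ("X3", lambda x1, x2, x3: x3),
    ("X1*X3", lambda x1, x2, x3: x1 * x3),
    ("X2*X3", lambda x1, x2, x3: x2 * x3),
    ("X1*X2*X3", lambda x1, x2, x3: x1 * x2 * x3),
]

def coefficient(term):
    # X'X = 8I for this design, so the least-squares estimate is X'y / n.
    column = [term(*run) for run in RUNS]
    return sum(c * y for c, y in zip(column, Y)) / len(Y)

estimates = {name: coefficient(term) for name, term in TERMS}
for name, _ in TERMS:
    print(f"{name:<10}{estimates[name]:>10.5f}")
```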


A prediction equation predicts a value of the response variable for given values of the factors. The equation we select can include all the terms shown above, or only a subset of them. For example, one possible prediction equation using only the two factors X1 and X2 is:

$$\hat{Y} = 2.65875 + 1.55125 \cdot X_1 - 0.43375 \cdot X_2$$
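A minimal sketch of evaluating prediction equations built from the table, for any chosen subset of terms. The function name `predict` is hypothetical, and inputs are assumed to be in coded units (-1 for the low level, +1 for the high level).

```python
# Least-squares estimates from the table (coded units).
COEF = {
    "Mean": 2.65875, "X1": 1.55125, "X2": -0.43375, "X1*X2": 0.06375,
    "X3": 0.10625, "X1*X3": 0.12375, "X2*X3": 0.14875, "X1*X2*X3": 0.07125,
}

def predict(terms, x1, x2, x3=0.0):
    """Hypothetical helper: evaluate the prediction equation that uses `terms`."""
    levels = {"X1": x1, "X2": x2, "X3": x3}
    y_hat = COEF["Mean"]  # the intercept is always included
    for term in terms:
        product = 1.0
        for factor in term.split("*"):  # e.g. "X1*X3" -> x1 * x3
            product *= levels[factor]
        y_hat += COEF[term] * product
    return y_hat

# The two-factor equation at the four corners of the X1-X2 square.
for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, round(predict(["X1", "X2"], x1, x2), 5))
```

For example, at X1 = +1 and X2 = -1 the two-factor equation gives 2.65875 + 1.55125 + 0.43375 = 4.64375.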

The least-squares parameter estimates in the prediction equation reflect the change in response for a one-unit change in the coded factor value. To obtain "full" effect estimates (as computed using the Yates algorithm) for the change in factor levels from -1 to +1, which is a change of two coded units, multiply the estimates (except the intercept) by two.
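The conversion is one multiplication per term. Under the fitted model, moving a factor from -1 to +1 (with the other coded factors held at 0, so their terms vanish) changes the prediction by 2b, where b is that factor's coefficient. The sketch below checks this for X1 and then doubles every non-intercept estimate; `half_to_full_effects` is a hypothetical name.

```python
# Least-squares estimates from the table (coded units).
COEF = {
    "Mean": 2.65875, "X1": 1.55125, "X2": -0.43375, "X1*X2": 0.06375,
    "X3": 0.10625, "X1*X3": 0.12375, "X2*X3": 0.14875, "X1*X2*X3": 0.07125,
}

def half_to_full_effects(coef):
    """Double every least-squares estimate except the intercept."""
    return {term: (b if term == "Mean" else 2.0 * b) for term, b in coef.items()}

effects = half_to_full_effects(COEF)

# Check for X1: with X2 = X3 = 0 only the Mean and X1 terms are nonzero.
low = COEF["Mean"] + COEF["X1"] * -1
high = COEF["Mean"] + COEF["X1"] * +1
assert abs((high - low) - effects["X1"]) < 1e-12

for term, e in effects.items():
    print(f"{term:<10}{e:>10.5f}")
```

For example, the X1 estimate of 1.55125 becomes a full effect of 3.1025.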

Remember that the Yates algorithm is just a convenient method for computing effects; any statistical software package with least-squares regression capabilities will produce the same effects, as well as many other useful analyses.
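To see the equivalence concretely, here is a sketch of the classic Yates procedure next to a least-squares calculation. Each Yates pass replaces the column with the sums of consecutive pairs followed by their differences (second minus first); after k passes on a $2^k$ design, the first entry divided by $2^k$ is the mean and the remaining entries divided by $2^{k-1}$ are the effects, in the same order as the table. Assumptions: responses are in standard order, and the eddy current values used (not listed in this section) are the ones implied by the saturated model in the table. Function names are hypothetical.

```python
import itertools
import math

# Assumption: eddy current responses in standard order, implied by the table.
Y = [1.70, 4.57, 0.55, 3.39, 1.51, 4.59, 0.67, 4.29]
NAMES = ["X1", "X2", "X1*X2", "X3", "X1*X3", "X2*X3", "X1*X2*X3"]

def yates(y):
    """Yates algorithm: return (mean, effects) for a 2^k design in standard order."""
    n = len(y)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("number of runs must be a power of two")
    col = list(y)
    for _ in range(k):
        pairs = list(zip(col[0::2], col[1::2]))
        col = [a + b for a, b in pairs] + [b - a for a, b in pairs]
    return col[0] / n, [c / (n / 2) for c in col[1:]]

def least_squares_coefficients(y):
    """For the orthogonal 2^3 design, each least-squares coefficient is X'y / n."""
    runs = [(x1, x2, x3) for x3, x2, x1 in itertools.product((-1, 1), repeat=3)]
    coef = {}
    for name in NAMES:
        idx = [int(f[1]) - 1 for f in name.split("*")]  # "X1*X3" -> [0, 2]
        column = [math.prod(run[i] for i in idx) for run in runs]
        coef[name] = sum(c * v for c, v in zip(column, y)) / len(y)
    return coef

mean, effects = yates(Y)
coef = least_squares_coefficients(Y)
for name, effect in zip(NAMES, effects):
    # Yates effect next to twice the least-squares coefficient.
    print(f"{name:<10}{effect:>10.5f}{2 * coef[name]:>10.5f}")
```

The two columns agree term by term, which is the factor-of-two relationship described in the previous paragraph.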

### Model Selection

We want to select the most appropriate model for our data while balancing the following two goals:
1. We want the model to include all important factors.
2. We want the model to be parsimonious. That is, the model should be as simple as possible.
Note that the residual standard deviation alone is insufficient for determining the most appropriate model: adding terms can never increase the residual sum of squares, so the residual standard deviation tends to decrease as more factors are added, whether or not those factors are important. The next section describes a number of approaches for determining which factors (and interactions) to include in the model.
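A sketch of the point above: fit the terms cumulatively in order of decreasing absolute estimate (one common ordering, used here for illustration) and track both the residual sum of squares (RSS) and the residual standard deviation sqrt(RSS / (n - p)), where p counts the fitted parameters including the mean. The eddy current responses are assumed to be the values implied by the saturated model in the table; they are not listed in this section.

```python
import itertools
import math

# Assumption: eddy current responses in standard order, implied by the table.
Y = [1.70, 4.57, 0.55, 3.39, 1.51, 4.59, 0.67, 4.29]
RUNS = [(x1, x2, x3) for x3, x2, x1 in itertools.product((-1, 1), repeat=3)]
NAMES = ["X1", "X2", "X1*X2", "X3", "X1*X3", "X2*X3", "X1*X2*X3"]

def column(name):
    """Coded +/-1 column for a term such as "X1*X3"."""
    idx = [int(f[1]) - 1 for f in name.split("*")]
    return [math.prod(run[i] for i in idx) for run in RUNS]

n = len(Y)
mean = sum(Y) / n
coef = {name: sum(c * y for c, y in zip(column(name), Y)) / n for name in NAMES}

# Orthogonality keeps earlier coefficients fixed, so each model is simply the
# previous one plus one more term.
ranked = sorted(NAMES, key=lambda name: abs(coef[name]), reverse=True)
fitted = [mean] * n
history = []
for p, name in enumerate([None] + ranked, start=1):
    if name is not None:
        fitted = [f + coef[name] * c for f, c in zip(fitted, column(name))]
    rss = sum((y - f) ** 2 for y, f in zip(Y, fitted))
    dof = n - p  # residual degrees of freedom
    sd = math.sqrt(rss / dof) if dof > 0 else float("nan")
    history.append((name or "Mean", rss, sd))
    print(f"{name or 'Mean':<10} RSS={rss:9.5f}  resid SD={sd:8.5f}")
```

For these data the residual standard deviation falls at every step until the model is saturated and no residual degrees of freedom remain. In general it can rise when an added term explains less than the current residual mean square, but the RSS itself never rises, so a fit-only criterion always favors the largest model.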