5.
Process Improvement
5.5.
Advanced topics
5.5.9.
An EDA approach to experimental design

Purpose

The effects plot answers the question:
What are the important factors (including interactions)?
Quantitatively, the estimated effect of a given factor or
interaction, and its rank relative to the other factors and
interactions, is obtained via the least squares estimation
criterion (that is, by forming effect estimates that minimize the
sum of the squared differences between the raw data and the
fitted values from such estimates). Based on this estimation
criterion, one could then construct a tabular list of the factors
and interactions ordered by effect magnitude.
The effects plot provides a graphical representation of these
ordered estimates, Pareto-style, from largest to smallest.
The effects plot, as presented here, yields both of the above: the
plot itself, and the ranked list table. Further, the plot also
presents auxiliary confounding information, which is
necessary in forming valid conclusions for fractional factorial
designs.

Output

The output of the effects plot is:
 Primary: A ranked list of important effects (and interactions).
For full factorial designs, interactions include the full
complement of interactions at all orders; for fractional
factorial designs, interactions include only some, and
occasionally none, of the actual interactions.
 Secondary: Grouping of factors (and interactions) into two
categories: important and unimportant.
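The remark that a fractional factorial design yields only some of the interactions can be illustrated with a minimal sketch. It assumes a hypothetical half-fraction of a three-factor design in which the third factor is set equal to the X1*X2 interaction; the generator and all names in the code are illustrative choices, not taken from the text. Terms whose "-"/"+" columns coincide cannot be separated by the data and so share a single effect estimate.

```python
from itertools import product

# Assumed 2^(3-1) half-fraction: run a full 2^2 design in X1 and X2,
# then set X3 = X1*X2 (a hypothetical choice of generator).
runs = [(x1, x2, x1 * x2) for x1, x2 in product((-1, 1), repeat=2)]


def column(term):
    """Return the -1/+1 column of a term given as factor indices,
    e.g. (0, 2) for the X1*X3 interaction."""
    values = []
    for run in runs:
        value = 1
        for i in term:
            value *= run[i]
        values.append(value)
    return tuple(values)


terms = {
    "X1": (0,), "X2": (1,), "X3": (2,),
    "X1*X2": (0, 1), "X1*X3": (0, 2), "X2*X3": (1, 2),
}

# Terms with identical columns are confounded with one another.
groups = {}
for name, term in terms.items():
    groups.setdefault(column(term), []).append(name)
confounded = list(groups.values())

for group in confounded:
    print(" = ".join(group))
```

In this assumed design every main effect is confounded with one 2-factor interaction, which is why the confounding structure must accompany the ranked list before conclusions are drawn.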

Definition

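The effects plot is formed by:
 Vertical axis: Ordered (largest to smallest) absolute values of
the estimated effects for the main factors and for the available
interactions.
 Horizontal axis: Factor and interaction identifiers.
The estimated effects that form the basis for the vertical axis are
optimal in the least squares sense. No other estimators exist that
will yield a smaller sum of squared deviations between the raw
data and the fitted values based on these estimates.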
For both the 2^{k} full factorial designs and the
2^{k-p} fractional factorial designs, the least squares
estimates of the factor i effect, the 2-factor interaction effect,
and the multi-factor interaction effect all have the same simple
form:
factor i effect =
\( \bar{Y}(+) - \bar{Y}(-) \)
2-factor interaction effect =
\( \bar{Y}(+) - \bar{Y}(-) \)
multi-factor interaction effect =
\( \bar{Y}(+) - \bar{Y}(-) \)
with
\( \bar{Y}(+) \) denoting the average of all response values for which
factor i (or the 2-factor or multi-factor interaction) takes on a
"+" value, and
\( \bar{Y}(-) \) denoting the average of all response values for which
factor i (or the 2-factor or multi-factor interaction) takes on
a "-" value.
The essence of the above simplification is that the 2-level full and
fractional factorial designs are all orthogonal in nature, and so all
off-diagonal terms in the least squares X'X matrix vanish.
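The difference-of-means form and the orthogonality argument can be checked numerically. The following is a minimal sketch, assuming a 2^{3} full factorial design and made-up response values (the data and all function names are illustrative only). Note that when a model is fitted to columns coded -1 and +1, each fitted coefficient is half of the corresponding difference of means, so the sketch compares twice the coefficients with the effects.

```python
import itertools

import numpy as np


def full_factorial(k):
    """Return the 2^k design matrix with entries -1/+1 (one row per run)."""
    return np.array(list(itertools.product([-1, 1], repeat=k)), dtype=float)


def effect(column, y):
    """Average response at '+' minus average response at '-'."""
    return y[column > 0].mean() - y[column < 0].mean()


# Hypothetical 2^3 example; the response values are illustrative only.
X = full_factorial(3)
x1, x2, x3 = X.T
y = np.array([60.0, 72.0, 54.0, 68.0, 52.0, 83.0, 45.0, 80.0])

# Interaction columns are elementwise products of the factor columns.
terms = {
    "X1": x1, "X2": x2, "X3": x3,
    "X1*X2": x1 * x2, "X1*X3": x1 * x3, "X2*X3": x2 * x3,
    "X1*X2*X3": x1 * x2 * x3,
}
effects = {name: effect(col, y) for name, col in terms.items()}

# Least squares check: model matrix = constant column plus all term columns.
M = np.column_stack([np.ones(len(y))] + list(terms.values()))
gram = M.T @ M  # X'X; off-diagonal entries should all be zero
coef, *_ = np.linalg.lstsq(M, y, rcond=None)

for name, c in zip(terms, coef[1:]):
    print(f"{name:<10} effect = {effects[name]:8.3f}   2 * coefficient = {2 * c:8.3f}")
```

Because X'X is diagonal here, each estimate depends only on its own column, which is why the estimates reduce to simple averages.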

Motivation

Because of the difference-of-means definition of the least squares
estimates, and because of the fact that all factors (and interactions)
are standardized by taking on values of -1 and +1 (simplified to
- and +), the resulting estimates are all on the same scale.
Therefore, comparing and ranking the estimates based on
magnitude makes eminently good sense.
Moreover, since the sign of each estimate is completely arbitrary and
will reverse depending on how the initial assignments were made
(e.g., we could assign "-" to treatment A and "+" to treatment
B or just as easily assign "+" to treatment A and "-" to
treatment B), forming a ranking based on magnitudes (as opposed to
signed effects) is preferred.
Given that, the ultimate and definitive ranking of factor and
interaction effects will be made based on the ranked (magnitude) list
of such least squares estimates. Such rankings are given graphically,
Pareto-style, within the plot; the rankings are given quantitatively
by the tableau in the upper right region of the plot. For the
case when we have fractional (versus full) factorial designs, the
upper right tableau also gives the confounding structure for whatever
design was used.
If a factor is important, the "+" average will be
considerably different from the "-" average, and so the absolute value
of the difference will be large. Conversely, unimportant
factors have small differences in the averages, and so the absolute
value will be small.
We choose to form a Pareto chart of such effects. In the Pareto
chart, the largest effects (most important factors) are
presented first (to the left), and the chart then progresses down to
the smallest effects (least important factors) on the right.
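The Pareto-style ordering can be sketched in a few lines. The example below assumes a set of signed effect estimates is already available; the values and the function name `rank_effects` are hypothetical and serve only to show the ordering by magnitude, with a crude text rendering standing in for the plot.

```python
def rank_effects(effects):
    """Sort (name, estimate) pairs by absolute value, largest first."""
    return sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical signed effect estimates, for illustration only.
example = {"X1": 23.0, "X2": -5.0, "X3": 1.5, "X1*X2": 0.5, "X1*X3": 10.0}
ranked = rank_effects(example)

# Text rendering: one bar per term, with length proportional to
# |effect| relative to the largest |effect|.
width = 40
largest = abs(ranked[0][1])
lines = []
for name, estimate in ranked:
    bar = "#" * round(width * abs(estimate) / largest)
    lines.append(f"{name:<8} {abs(estimate):8.2f} {bar}")
print("\n".join(lines))
```

The sign of each estimate is discarded only for ordering and display; the signed values remain in `ranked` for later use.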

Plot for defective springs data

Applying the effects plot to the defective springs data yields the
following plot.

How to interpret

From the effects plot, we look for the following:
 The ranked list of factors (including interactions)
is given by the left-to-right order of the spikes.
These spikes should be of decreasing height as we move
from left to right. Note the factor identifier associated
with each of these spikes.
 Identify the important factors. Forming the ranked list of
factors is important, but is only half of the analysis.
The second part of the analysis is to take the ranking and
"draw the (horizontal) line" in the list and on the graph so
that factors above the line are deemed "important" while factors
below the line are deemed "unimportant".
Since factor effects are frequently a continuum ranging from
the very large through the moderate and down to the very
small, the separation of all such factors into two groups
(important and unimportant) may seem arbitrary and severe.
However, in practice, from both a research funding and a
modeling point of view, such a bifurcation is both common and
necessary.
From an engineering research-funding point of view, one must
frequently focus on a subset of factors for future research,
attention, and money, and thereby necessarily set aside other
factors from any further consideration. From a model-building
point of view, a final model either has a term in it or it does
not; there is no middle ground. Parsimonious models require
in-or-out decisions. Once we have identified the important
factors, these are the factors that will comprise our
(parsimonious) good model, and those that are declared
unimportant will not be in the model.
Given that, where does such a bifurcation line go?
There are four ways, each discussed in turn, to draw such a
line:
 Statistical significance;
 Engineering significance;
 Numerical significance; and
 Pattern significance.
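Whichever of these criteria is used, the mechanical step of drawing the line is the same. The following minimal sketch assumes that a ranked list of effect estimates exists and that a cutoff on the effect magnitude has already been chosen by one of the four criteria; the estimates, the cutoff value, and the function name are hypothetical.

```python
def split_by_cutoff(ranked, cutoff):
    """Split (name, estimate) pairs into important and unimportant terms.

    A term is called important when |estimate| >= cutoff. The cutoff
    itself is an input here; choosing it is the job of the criteria
    listed above.
    """
    important = [name for name, est in ranked if abs(est) >= cutoff]
    unimportant = [name for name, est in ranked if abs(est) < cutoff]
    return important, unimportant


# Hypothetical ranked estimates (largest magnitude first) and cutoff.
ranked_example = [("X1", 23.0), ("X1*X3", 10.0), ("X2", -5.0),
                  ("X3", 1.5), ("X1*X2", 0.5)]
important, unimportant = split_by_cutoff(ranked_example, cutoff=4.0)
print("Important:  ", important)
print("Unimportant:", unimportant)
```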
The ranked list and segregation of factors derived from the effects
plot are to be compared with the ranked list of factors obtained in
previous steps. Invariably, there will be a considerable degree of
consistency exhibited across all of the techniques.

Conclusions for the defective springs data

The application of the effects plot to the defective springs data
set results in the following conclusions:
 Ranked list of factors (including interactions):
 X_{1} (most important)
 X_{1}*X_{3} (next most important)
 X_{2}
 other factors are of lesser importance
 Separation of factors into important/unimportant categories:
 Important: X_{1}, X_{1}*X_{3}, and X_{2}
 Unimportant: the remainder
