5.5. Advanced topics
5.5.9. An EDA approach to experimental design
The half-normal probability plot answers the question:
The half-normal probability plot is a graphical tool that uses these ordered estimated effects to help assess which factors are important and which are unimportant.
A half-normal distribution is the distribution of |X|, where X has a normal distribution with mean zero.
The outputs from the half-normal probability plot are
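The link between the two distributions can be checked numerically. The following is a minimal sketch, assuming X is standard normal (mean 0, standard deviation 1), which is an assumption made here for illustration. It uses the identity P(|X| <= q) = 2*Phi(q) - 1, where Phi is the standard normal cumulative distribution function. The function names are illustrative only.

```python
from statistics import NormalDist

# Assumption for this sketch: X is standard normal, N(0, 1).
_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def half_normal_cdf(q: float) -> float:
    """P(|X| <= q) for X ~ N(0, 1), using 2*Phi(q) - 1 for q >= 0."""
    if q < 0.0:
        return 0.0
    return 2.0 * _STD_NORMAL.cdf(q) - 1.0


def half_normal_quantile(p: float) -> float:
    """Value q with P(|X| <= q) = p, obtained by solving 2*Phi(q) - 1 = p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return _STD_NORMAL.inv_cdf(0.5 + 0.5 * p)


# The median of |X| equals the 75th percentile of X.
median_abs_x = half_normal_quantile(0.5)
```

The quantile function is what a half-normal probability plot needs for its theoretical axis.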
A half-normal probability plot is formed by
To provide a rationale for the half-normal probability plot,
we first discuss the motivation for the normal probability
plot (which also finds frequent use in these 2-level designs).
The basis for the normal probability plot is the mathematical form for each (and all) of the estimated effects. As discussed for the |effects| plot, the estimated effects are the optimal least squares estimates. Because of the orthogonality of the 2^k full factorial and the 2^(k-p) fractional factorial designs, all least squares estimators for main effects and interactions simplify to the form:
estimated effect = (average response at the "+" setting) - (average response at the "-" setting)
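As a minimal sketch of this difference-of-averages form, the code below builds a 2^3 full factorial in coded -1/+1 units and estimates a few effects. The response values are made up for illustration (y = 10 + 3*X1 - X2, with X3 inert), and the helper names are hypothetical.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs of a 2-level design, coded as -1 / +1."""
    return [tuple(run) for run in product((-1, 1), repeat=k)]


def sign_column(design, factors):
    """Column of signs for a main effect (one index) or an
    interaction (several indices): the product of the factor columns."""
    column = []
    for run in design:
        sign = 1
        for j in factors:
            sign *= run[j]
        column.append(sign)
    return column


def effect_estimate(signs, y):
    """Average response at '+' minus average response at '-'."""
    plus = [yi for s, yi in zip(signs, y) if s > 0]
    minus = [yi for s, yi in zip(signs, y) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


design = full_factorial(3)

# Made-up responses, for illustration only: X1 and X2 matter, X3 does not.
y = [10 + 3 * run[0] - run[1] for run in design]

effects = {
    "X1": effect_estimate(sign_column(design, [0]), y),
    "X2": effect_estimate(sign_column(design, [1]), y),
    "X3": effect_estimate(sign_column(design, [2]), y),
    "X1*X2": effect_estimate(sign_column(design, [0, 1]), y),
}
```

Because the design is balanced and orthogonal, each estimate equals twice the corresponding coefficient in the coded model that generated the data.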
Under rather general conditions, the Central Limit Theorem implies that the difference-of-sums form of the estimated effects tends to follow a normal distribution for a large enough sample size n.
The question arises as to what normal distribution; that is, a normal distribution with what mean and what standard deviation? Since all estimators have an identical form (a difference of averages), the standard deviations, though unknown, will in fact be the same under the assumption of constant σ. This is good in that it simplifies the normality analysis.
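The common standard deviation can be written out explicitly. The derivation below is a sketch under assumptions added here: independent errors with constant standard deviation σ, and a balanced design in which each of the two averages is taken over n/2 of the n runs. The symbols \hat{E}, \bar{Y}_{+} and \bar{Y}_{-} are notation introduced for this illustration.

```latex
\operatorname{Var}(\hat{E})
  = \operatorname{Var}(\bar{Y}_{+}) + \operatorname{Var}(\bar{Y}_{-})
  = \frac{\sigma^{2}}{n/2} + \frac{\sigma^{2}}{n/2}
  = \frac{4\sigma^{2}}{n},
\qquad
\operatorname{SD}(\hat{E}) = \frac{2\sigma}{\sqrt{n}}
```

The result depends only on σ and n, not on which factor or interaction is being estimated, which is why every estimated effect shares the same standard deviation.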
As for the means, however, there will be differences from one effect to the next, and these differences depend on whether a factor is unimportant or important. Unimportant factors are those that have near-zero effects and important factors are those whose effects are considerably removed from zero. Thus, unimportant effects tend to have a normal distribution centered near zero while important effects tend to have a normal distribution centered at their respective true large (but unknown) effect values.
In the simplest experimental case, if the experiment were such that no factors were important (that is, all effects were near zero), the (n-1) estimated effects would behave like random drawings from a normal distribution centered at zero. We can test for such normality (and hence test for a null-effect experiment) by using the normal probability plot. Normal probability plots are easy to interpret. In simplest terms:
On the other hand, if the truth behind the experiment is that exactly one factor is important (that is, significantly non-zero) and all remaining factors are unimportant (that is, near-zero), then the normal probability plot of all (n-1) effects is near-linear for the (n-2) unimportant effects, and the single important effect stands well off the line.
Similarly, if the experiment were such that some subset of factors were important and all remaining factors were unimportant, then the normal probability plot of all (n-1) effects would be near-linear for all unimportant factors with the remaining important factors all well off the line.
In real life, with the number of important factors unknown, this suggests that one could form a normal probability plot of the (n-1) estimated effects and draw a line through those (unimportant) effects in the vicinity of zero. The remaining effects, those that fall off the line, are then identified as important.
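The coordinates of such a normal probability plot can be sketched as follows. The plotting positions (i - 0.5)/m are one common choice and are an assumption here; other references use slightly different positions. The effect values are made up for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(effects):
    """Pair each ordered effect with a standard normal quantile.

    Returns a list of (theoretical quantile, ordered effect) tuples.
    Plotting positions (i - 0.5) / m are an assumed convention.
    """
    m = len(effects)
    ordered = sorted(effects)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i - 0.5) / m), effect)
        for i, effect in enumerate(ordered, start=1)
    ]


# Made-up estimated effects: six near zero, one clearly large.
example_effects = [0.3, -0.5, 6.0, 0.1, -0.2, 0.4, -0.1]
normal_points = normal_plot_points(example_effects)
```

Plotting the second coordinate against the first gives the normal probability plot. In this made-up example the six small effects would fall close to a line through the origin, while the effect of 6.0 would sit far above it.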
The above rationale and methodology work well in practice, with the net effect that the normal probability plot of the effects is an important, commonly used and successfully employed tool for identifying important factors in 2-level full and fractional factorial experiments. Following the lead of Cuthbert Daniel (1976), we augment the methodology and arrive at a further improvement. Specifically, the sign of each estimate is completely arbitrary and will reverse depending on how the initial assignments were made (e.g., we could assign "-" to treatment A and "+" to treatment B or just as easily assign "+" to treatment A and "-" to treatment B).
This arbitrariness is addressed by dealing with the effect magnitudes rather than the signed effects. If the signed effects follow a normal distribution, the absolute values of the effects follow a half-normal distribution.
In this new context, one tests for important versus unimportant factors by generating a half-normal probability plot of the absolute value of the effects. As before, linearity implies half-normality, which in turn implies all factors are unimportant. More typically, however, the half-normal probability plot will be only partially linear. Unimportant (that is, near-zero) effects manifest themselves as being near zero and on a line while important (that is, large) effects manifest themselves by being off the line and well-displaced from zero.
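A minimal sketch of the half-normal plot coordinates follows, which also shows that reversing the arbitrary signs leaves the plot unchanged. The plotting positions (i - 0.5)/m and the effect values are assumptions made for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist


def half_normal_plot_points(effects):
    """Pair each ordered |effect| with a half-normal quantile.

    Returns a list of (theoretical quantile, ordered |effect|) tuples.
    The half-normal quantile at probability p is the standard normal
    quantile at 0.5 + 0.5 * p. Plotting positions (i - 0.5) / m are an
    assumed convention.
    """
    m = len(effects)
    ordered = sorted(abs(e) for e in effects)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf(0.5 + 0.5 * (i - 0.5) / m), magnitude)
        for i, magnitude in enumerate(ordered, start=1)
    ]


# Made-up estimated effects: six near zero, one clearly large.
signed_effects = [0.3, -0.5, 6.0, 0.1, -0.2, 0.4, -0.1]

# Swapping the "+" and "-" labels reverses every sign.
relabeled_effects = [-e for e in signed_effects]

half_normal_points = half_normal_plot_points(signed_effects)
relabeled_points = half_normal_plot_points(relabeled_effects)
```

Both calls return identical coordinates, so the interpretation of the plot does not depend on how the "+" and "-" labels were assigned.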
Plot for defective springs data
The half-normal probability plot of the effects for the defective
springs data set is as follows.
How to interpret
From the half-normal probability plot, we look for the following:
Conclusions for the defective springs data
The application of the half-normal probability plot to the
defective springs data set results in the following conclusions: