5. Process Improvement

## An EDA approach to experimental design

Introduction: This section presents an exploratory data analysis (EDA) approach to analyzing the data from a designed experiment. This material is meant to complement, not replace, the more model-based approach to analyzing designed experiments given in section 4 of this chapter.

Choosing an appropriate design is discussed in detail in section 3 of this chapter.

Starting point
Problem category: The problem category we will address is the screening problem. Two characteristics of screening problems are:
1. There are many factors to consider.
2. Each of these factors may be either continuous or discrete.

Desired output: The desired output from the analysis of a screening problem is:
• A ranked list (by order of importance) of factors.
• The best settings for each of the factors.
• A good model.
• Insight.

Problem essentials: The essentials of the screening problem are:
• There are k factors with n observations.
• The generic model is:

Y = f(X1, X2, ..., Xk) + ε

Design type: In particular, the EDA approach is applied to 2^k full factorial and 2^(k-p) fractional factorial designs.

An EDA approach is particularly applicable to screening designs because we are in the preliminary stages of understanding our process.
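To make the 2^k notation concrete, the following minimal sketch enumerates the runs of a two-level full factorial design in coded units (-1 for the low level, +1 for the high level). The function name `full_factorial` is hypothetical and chosen for illustration only. The rows come out in standard (Yates) order, with the first factor alternating fastest, which is the same run order used in the defective springs table later in this section.

```python
from itertools import product


def full_factorial(k):
    """Return the 2^k runs of a two-level full factorial design in coded units.

    Each run is a tuple of k levels, each -1 (low) or +1 (high).
    Rows are in standard (Yates) order: the first factor alternates fastest.
    """
    # itertools.product varies the LAST position fastest, so each tuple is
    # reversed to make the FIRST factor the fastest-changing one.
    return [tuple(reversed(levels)) for levels in product((-1, +1), repeat=k)]


design = full_factorial(3)  # k = 3 factors gives 2^3 = 8 runs
for run in design:
    print(run)
```

With k = 3 this produces 8 runs, and every factor column contains four -1 entries and four +1 entries. A 2^(k-p) fractional factorial design runs only a chosen subset of 2^(k-p) of these rows; selecting that subset is a design question and is not attempted in this sketch.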

EDA philosophy: EDA is not a single technique. It is an approach to analyzing data.
• EDA is data-driven. That is, we do not assume an initial model. Rather, we attempt to let the data speak for themselves.

• EDA is question-based. That is, we select a technique to answer one or more questions.

• EDA utilizes multiple techniques rather than depending on a single technique. Different plots have different bases, focuses, and sensitivities, and therefore may bring out different aspects of the data. When multiple techniques lead to the same conclusions, this redundancy increases our confidence that the conclusions are valid. When they lead to conflicting conclusions, the conflict may give us a clue as to the nature of our data.

• EDA tools are often graphical. The primary objective is to provide insight into the data, which graphical techniques often provide more readily than quantitative techniques.

10-step process: The following is a 10-step EDA process for analyzing the data from 2^k full factorial and 2^(k-p) fractional factorial designs. Each step uses a plot, presented in the following format:
• Purpose of the plot
• Output of the plot
• Definition of the plot
• Motivation for the plot
• An example of the plot using the defective springs data
• A discussion of how to interpret the plot
• Conclusions we can draw from the plot for the defective springs data
Data set
Defective springs data: The plots presented in this section are demonstrated with a data set from Box and Bisgaard (1987).

These data are from a 2^3 full factorial design and contain the following variables:

1. Response variable Y = percentage of springs without cracks
2. Factor 1 = oven temperature (2 levels: 1450 °F and 1600 °F)
3. Factor 2 = carbon concentration (2 levels: 0.5% and 0.7%)
4. Factor 3 = quench temperature (2 levels: 70 °F and 120 °F)

```
Y           X1            X2              X3
Percent     Oven          Carbon          Quench
Acceptable  Temperature   Concentration   Temperature
-----------------------------------------------------
67          -1            -1              -1
79          +1            -1              -1
61          -1            +1              -1
75          +1            +1              -1
59          -1            -1              +1
90          +1            -1              +1
52          -1            +1              +1
87          +1            +1              +1
```
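
As a first numerical look at these data, the sketch below estimates each factor's main effect as the mean response at the +1 level minus the mean response at the -1 level, then ranks the factors by absolute effect. This is one common definition of a main effect for two-level designs, offered here as an illustration rather than as the procedure the following steps prescribe. The names `runs`, `factor_names`, and `main_effect` are hypothetical.

```python
# Defective springs data (Box and Bisgaard, 1987) in coded units.
# Each row is (Y, X1, X2, X3), copied from the table above.
runs = [
    (67, -1, -1, -1),
    (79, +1, -1, -1),
    (61, -1, +1, -1),
    (75, +1, +1, -1),
    (59, -1, -1, +1),
    (90, +1, -1, +1),
    (52, -1, +1, +1),
    (87, +1, +1, +1),
]

factor_names = [
    "X1 oven temperature",
    "X2 carbon concentration",
    "X3 quench temperature",
]


def main_effect(rows, j):
    """Mean of Y at the +1 level of column j minus mean of Y at the -1 level."""
    high = [row[0] for row in rows if row[j] == +1]
    low = [row[0] for row in rows if row[j] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


# Column 0 holds Y, so the factors occupy columns 1 to 3.
effects = {name: main_effect(runs, j) for j, name in enumerate(factor_names, start=1)}

# Rank the factors by the magnitude of their estimated main effect.
ranked = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)
for name, effect in ranked:
    print(f"{name}: {effect:+.2f}")
```

For these eight runs the estimated main effects are +23.00 for X1, -5.00 for X2, and +1.50 for X3, so oven temperature ranks first by this measure. The sketch looks only at main effects; interactions between factors are not estimated here, so the ranking is partial.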