Product and Process Comparisons
7.4. Comparisons based on data from more than two processes
|Contingency Table approach||
When items are classified according to two or more criteria, it is
often of interest to decide whether these criteria act independently
of one another.
For example, suppose we wish to classify defects found in wafers produced in a manufacturing plant, first according to the type of defect and, second, according to the production shift during which the wafers were produced. If the proportions of the various types of defects are constant from shift to shift, then classification by defects is independent of the classification by production shift. On the other hand, if the proportions of the various defects vary from shift to shift, then the classification by defects depends upon or is contingent upon the shift classification and the classifications are dependent.
In the process of investigating whether one method of classification is contingent upon another, it is customary to display the data by using a cross classification in an array consisting of r rows and c columns called a contingency table. A contingency table consists of r x c cells representing the r x c possible outcomes in the classification process. Let us construct an industrial case:
|Industrial example||A total of 309 wafer defects were recorded and the defects were classified as being one of four types, A, B, C, or D. At the same time each wafer was identified according to the production shift in which it was manufactured, 1, 2, or 3.|
|Contingency table classifying defects in wafers according to type and production shift||
These counts are presented in the following table.
(Note: the numbers in parentheses are the expected cell frequencies).
Let $p_A$ be the probability that a defect will be of type A. Likewise, define $p_B$, $p_C$, and $p_D$ as the probabilities of observing the other three types of defects. These probabilities, which are called the column probabilities, will satisfy the requirement

$$ p_A + p_B + p_C + p_D = 1. $$

By the same token, let $p_i$ (i = 1, 2, or 3) be the row probability that a defect will have occurred during shift i, where

$$ p_1 + p_2 + p_3 = 1. $$
|Multiplicative Law of Probability||Then if the two classifications are independent of each other, a cell probability will equal the product of its respective row and column probabilities in accordance with the Multiplicative Law of Probability.|
|Example of obtaining column and row probabilities||
For example, the probability that a particular defect will occur in shift 1 and is of type A is $p_1 \, p_A$. While the numerical values of the cell probabilities are unspecified, the null hypothesis states that each cell probability will equal the product of its respective row and column probabilities. This condition implies independence of the two classifications. The alternative hypothesis is that this equality does not hold for at least one cell.
In other words, we state the null hypothesis as H0: the two classifications are independent, while the alternative hypothesis is Ha: the classifications are dependent.
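The product rule that defines the null hypothesis can be sketched in a few lines of Python. The marginal probabilities below are hypothetical values chosen only for illustration (they are not estimates from the wafer data), and the names `row_probs`, `col_probs`, and `cell_probs` are likewise assumptions of this sketch.

```python
# Hypothetical marginal probabilities, chosen only to illustrate the rule.
row_probs = {1: 0.30, 2: 0.30, 3: 0.40}                    # shifts
col_probs = {"A": 0.25, "B": 0.25, "C": 0.40, "D": 0.10}   # defect types

# Under independence, each cell probability is the product of its
# row probability and its column probability.
cell_probs = {
    (shift, defect): p_row * p_col
    for shift, p_row in row_probs.items()
    for defect, p_col in col_probs.items()
}

# Probability of a type A defect occurring in shift 1 under independence.
p_1A = cell_probs[(1, "A")]
print("P(shift 1, type A) =", round(p_1A, 4))

# The 3 x 4 = 12 cell probabilities must again sum to 1.
total = sum(cell_probs.values())
print("sum of cell probabilities =", round(total, 4))
```

If the classifications were dependent, at least one cell probability would differ from this product, which is exactly what the alternative hypothesis asserts.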
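To obtain the observed column probability, divide the column total by the grand total, $n$. Denoting the total of column $j$ as $c_j$, we get

$$ \hat{p}_j = \frac{c_j}{n}. $$

Similarly, the row probabilities are estimated by dividing each row total, $r_i$, by the grand total $n$:

$$ \hat{p}_i = \frac{r_i}{n}. $$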
|Expected cell frequencies||
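Denote the observed frequency of the cell in row $i$ and column $j$ of the contingency table by $n_{ij}$, and let $r_i$ and $c_j$ be the totals of row $i$ and column $j$. Then, estimating the row and column probabilities by $r_i/n$ and $c_j/n$, we have

$$ \hat{E}(n_{ij}) = n \left(\frac{r_i}{n}\right) \left(\frac{c_j}{n}\right) = \frac{r_i \, c_j}{n}. $$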
|Estimated expected cell frequency when H0 is true.||
In other words, when the row and column classifications are independent, the estimated expected value of the observed cell frequency $n_{ij}$ in an r x c contingency table is equal to the product of its respective row and column totals divided by the total frequency.
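From here we use the expected and observed frequencies shown in the table to calculate the value of the test statistic

$$ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(n_{ij} - \hat{E}(n_{ij})\right)^2}{\hat{E}(n_{ij})} = 19.18, $$

where $\hat{E}(n_{ij})$ is the estimated expected frequency of the cell in row $i$ and column $j$.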
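The calculation of the expected frequencies and of the test statistic can be sketched as follows. The table of counts is not reproduced in this text, so the counts in `observed` are assumed values for illustration only: they were chosen to total 309 defects and to give a statistic of about 19.18, and they should not be read as the recorded data. The statistic is the usual Pearson form, the sum over all cells of (observed - expected)^2 / expected.

```python
# ASSUMED counts for illustration only (rows: shifts 1-3, columns: types A-D).
# The original table is not available in this text; these values merely
# total 309 and give a statistic close to the 19.18 quoted in the text.
observed = [
    [15, 21, 45, 13],
    [26, 31, 34, 5],
    [33, 17, 49, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Estimated expected frequency under independence:
# (row total * column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

print("grand total:", n)
for exp_row in expected:
    print([round(e, 2) for e in exp_row])
print("chi-square statistic:", round(chi_square, 2))
```

Note that the expected frequencies in each row and column add up to the observed row and column totals, which is a quick check that they were computed correctly.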
|df = (r-1)(c-1)||
The next step is to find the appropriate number of degrees of
freedom associated with the test statistic. Leaving out the details
of the derivation, we state the result:
The number of degrees of freedom associated with a contingency table consisting of r rows and c columns is (r-1)(c-1). So for our example we have (3-1)(4-1) = 6 degrees of freedom.
|Testing the null hypothesis||In order to test the null hypothesis, we compare the test statistic with the critical value $\chi^2_{1-\alpha,\,\nu}$, where $\nu = (r-1)(c-1)$ is the number of degrees of freedom, at a selected value of α. Let us use α = 0.05. Then the critical value is $\chi^2_{0.95,\,6}$ = 12.5916 (see the chi square table in Chapter 1). Since the test statistic of 19.18 exceeds the critical value, we reject the null hypothesis and conclude that there is significant evidence that the proportions of the different defect types vary from shift to shift. In this case, the p-value of the test statistic is 0.00387.|
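The decision step can be checked numerically with a short sketch that uses only the Python standard library. It relies on the closed-form upper-tail probability of the chi-square distribution, which holds only for an even number of degrees of freedom (as in this example, with 6), and finds the critical value by bisection. The function names `chi2_sf_even_df` and `chi2_critical_even_df` are hypothetical helpers introduced for this illustration; in practice a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # term is now (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def chi2_critical_even_df(alpha, df, upper=1000.0):
    """Find x with P(X > x) = alpha by bisection (even df only)."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid              # tail is still too heavy, move right
        else:
            hi = mid
    return (lo + hi) / 2.0


rows, cols = 3, 4
df = (rows - 1) * (cols - 1)      # degrees of freedom for an r x c table
statistic = 19.18                 # test statistic quoted in the text
alpha = 0.05

critical = chi2_critical_even_df(alpha, df)
p_value = chi2_sf_even_df(statistic, df)
reject_h0 = statistic > critical

print("df:", df)
print("critical value:", round(critical, 4))
print("p-value:", round(p_value, 5))
print("reject H0:", reject_h0)
```

Comparing the statistic with the critical value and comparing the p-value with α are equivalent ways of reaching the same decision.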