7. Product and Process Comparisons
7.4. Comparisons based on data from more than two processes


Contingency Table approach 
When items are classified according to two or more criteria, it is
often of interest to decide whether these criteria act independently
of one another.
For example, suppose we wish to classify defects found in wafers produced in a manufacturing plant, first according to the type of defect and, second, according to the production shift during which the wafers were produced. If the proportions of the various types of defects are constant from shift to shift, then classification by defects is independent of the classification by production shift. On the other hand, if the proportions of the various defects vary from shift to shift, then the classification by defects depends upon or is contingent upon the shift classification and the classifications are dependent.

In the process of investigating whether one method of classification is contingent upon another, it is customary to display the data by using a cross classification in an array consisting of r rows and c columns called a contingency table. A contingency table consists of r x c cells representing the r x c possible outcomes in the classification process. Let us construct an industrial case:

Industrial example
A total of 309 wafer defects were recorded and the defects were classified as being one of four types, A, B, C, or D. At the same time each wafer was identified according to the production shift in which it was manufactured, 1, 2, or 3.
Contingency table classifying defects in wafers according to type and production shift 
These counts are presented in the following table.
(Note: the numbers in parentheses are the expected cell frequencies). 
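The cross classification described above can be sketched in Python. This is only an illustration: the `records` list is hypothetical (it is not the wafer data from the example), and the names `records`, `cell_counts`, and `table` are introduced here for the sketch.

```python
from collections import Counter

# Hypothetical records: one (shift, defect_type) pair per recorded defect.
# These are NOT the counts from the industrial example.
records = [
    (1, "A"), (1, "C"), (2, "B"), (2, "B"),
    (3, "A"), (3, "D"), (1, "C"), (2, "C"),
]

shifts = sorted({shift for shift, _ in records})        # row labels
defect_types = sorted({kind for _, kind in records})    # column labels

# Count how many records fall into each (shift, type) cell.
cell_counts = Counter(records)

# Build the r x c table as a list of rows; cells with no records count as 0.
table = [[cell_counts[(s, t)] for t in defect_types] for s in shifts]

print("shift", *defect_types)
for s, row in zip(shifts, table):
    print(s, *row)
```

Each row of `table` corresponds to one shift and each column to one defect type, so the table has r = 3 rows and c = 4 columns, as in the example.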

Column probabilities 
Let p_{A} be the probability that a defect will be of
type A. Likewise, define p_{B}, p_{C}, and
p_{D} as the probabilities of observing the other
three types of defects. These probabilities, which are called the
column probabilities, will satisfy the requirement

p_{A} + p_{B} + p_{C} + p_{D} = 1


Row probabilities 
By the same token, let p_{i} (i = 1, 2, or 3) be the row probability that a defect will have
occurred during shift i, where

p_{1} + p_{2} + p_{3} = 1


Multiplicative Law of Probability
Then if the two classifications are independent of each other, a cell probability will equal the product of its respective row and column probabilities in accordance with the Multiplicative Law of Probability.
Example of obtaining column and row probabilities 
For example, the probability that a particular defect will occur in
shift 1 and is of type A is
(p_{1}) (p_{A}). While the numerical
values of the cell probabilities are unspecified, the null
hypothesis states that each cell probability will equal the product
of its respective row and column probabilities. This condition
implies independence of the two classifications. The alternative
hypothesis is that this equality does not hold for at least one
cell.
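In other words, we state the null hypothesis as H_{0}: the two classifications are independent, while the alternative hypothesis is H_{a}: the classifications are dependent.

To obtain the observed column probability, divide the column total by the grand total, n. Denoting the total of column j as c_{j}, we get

\hat{p}_{j} = c_{j} / n

Similarly, the row probabilities are estimated by dividing the total of row i, denoted r_{i}, by the grand total n:

\hat{p}_{i} = r_{i} / n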

Expected cell frequencies 
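Denote the observed frequency of the cell in row i and column
j of the contingency table by n_{ij}. Then, writing r_{i} for the total
of row i and c_{j} for the total of column j, we have

\hat{E}(n_{ij}) = n \hat{p}_{i} \hat{p}_{j} = n (r_{i} / n) (c_{j} / n)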


Estimated expected cell frequency when H_{0} is true. 
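In other words, when the row and column classifications are
independent, the estimated expected value of the observed cell
frequency n_{ij} in an r x c
contingency table is equal to the product of its respective row and
column totals divided by the total frequency:

\hat{E}(n_{ij}) = r_{i} c_{j} / n

where r_{i} is the total of row i, c_{j} is the total of column j, and n is the grand total.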
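The estimate described above (row total times column total, divided by the grand total) can be sketched in Python as follows. The counts in `observed` are an assumption: the table itself is not reproduced in this text, so the values were chosen to be consistent with the quoted total of 309 defects and with the test statistic quoted later. The variable names are likewise introduced only for this sketch.

```python
# Assumed counts (rows: shifts 1-3; columns: defect types A-D).
# The source table is not shown in this text; these values are an
# assumption consistent with the quoted grand total of 309.
observed = [
    [15, 21, 45, 13],
    [26, 31, 34, 5],
    [33, 17, 49, 20],
]

row_totals = [sum(row) for row in observed]          # r_i
col_totals = [sum(col) for col in zip(*observed)]    # c_j
n = sum(row_totals)                                  # grand total

# Estimated row and column probabilities.
row_probs = [r / n for r in row_totals]
col_probs = [c / n for c in col_totals]

# Under independence, E(n_ij) is estimated by n * p_i * p_j = r_i * c_j / n.
expected = [[n * p_i * p_j for p_j in col_probs] for p_i in row_probs]

for row in expected:
    print("  ".join(f"{e:6.2f}" for e in row))
```

Note that the expected frequencies in each row and column add up to the same totals as the observed counts, which is a quick sanity check on the calculation.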


Test statistic 
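From here we use the expected and observed frequencies shown in the
table to calculate the value of the test statistic

\chi^{2} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - \hat{E}(n_{ij}))^{2}}{\hat{E}(n_{ij})} = 19.18

where \hat{E}(n_{ij}) is the estimated expected frequency of the cell in row i and column j.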
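A minimal sketch of this calculation in Python is given below. The function name `chi_square_statistic` is hypothetical, and the counts in `observed` are an assumption (the table is not reproduced in this text); they were chosen to total 309 and to give a statistic close to the 19.18 quoted for this example.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells,
    with expected = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            stat += (n_ij - e_ij) ** 2 / e_ij
    return stat


# Assumed counts (rows: shifts 1-3; columns: defect types A-D).
observed = [
    [15, 21, 45, 13],
    [26, 31, 34, 5],
    [33, 17, 49, 20],
]

stat = chi_square_statistic(observed)
print(f"chi-square statistic = {stat:.2f}")
```

Cells whose observed count is far from the expected count contribute most to the statistic, so printing the individual terms is a simple way to see which shift and defect type combinations drive the result.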


df = (r-1)(c-1)
The next step is to find the appropriate number of degrees of
freedom associated with the test statistic. Leaving out the details
of the derivation, we state the result:
The number of degrees of freedom associated with a contingency table consisting of r rows and c columns is (r-1)(c-1). So for our example we have (3-1)(4-1) = 6 d.f.

Testing the null hypothesis
In order to test the null hypothesis, we compare the test statistic with the critical value of χ^{2}_{1-α, (r-1)(c-1)} at a selected value of α. Let us use α = 0.05. Then the critical value is χ^{2}_{0.95, 6} = 12.5916 (see the chi square table in Chapter 1). Since the test statistic of 19.18 exceeds the critical value, we reject the null hypothesis and conclude that there is significant evidence that the proportions of the different defect types vary from shift to shift. In this case, the p-value of the test statistic is 0.00387.
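The decision step can be checked with a short Python sketch that uses only the values quoted in the text (test statistic 19.18, critical value 12.5916, 3 rows, 4 columns). The helper `chi2_sf_even_df` is a hypothetical name introduced here; it relies on the closed-form upper-tail probability of the chi-square distribution, which holds only when the degrees of freedom are even, as they are in this example.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with
    even df, using exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


r, c = 3, 4                       # rows (shifts) and columns (defect types)
df = (r - 1) * (c - 1)            # degrees of freedom
statistic = 19.18                 # test statistic quoted in the text
critical_value = 12.5916          # chi-square critical value quoted in the text
alpha = 0.05

p_value = chi2_sf_even_df(statistic, df)
reject = statistic > critical_value

print(f"df = {df}, p-value = {p_value:.5f}, reject H0: {reject}")
```

Comparing the statistic with the critical value and comparing the p-value with α lead to the same decision. For odd degrees of freedom this closed form does not apply, and a statistical library would be the usual choice.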