CHI-SQUARE INDEPENDENCE TEST

Name:

CHI-SQUARE INDEPENDENCE TEST (LET)

Type:

Analysis Command

Purpose:

Perform a chi-square test of independence for a two-way contingency table.

Description:

A common question with regard to a two-way contingency table is whether the row and column variables are independent. By independence, we mean that the row and column variables are unassociated (i.e., knowing the value of the row variable will not help us predict the value of the column variable, and likewise knowing the value of the column variable will not help us predict the value of the row variable).

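A more technical definition of independence is that

\( P(\mbox{row } i, \mbox{column } j) = P(\mbox{row } i) P(\mbox{column } j) \) for all i and j

Several tests of this hypothesis are available. One such test is the chi-square test for independence.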

H₀:

The row and column variables of the two-way table are independent

H_a:

The row and column variables of the two-way table are not independent

Test Statistic:

\( T = \sum_{i=1}^{r}{\sum_{j=1}^{c}{\frac{(O_{ij} - E_{ij})^2} {E_{ij}}}} \)

where

r	=	the number of rows in the contingency table
c	=	the number of columns in the contingency table
O_ij	=	the observed frequency of the ith row and jth column
E_ij	=	the expected frequency of the ith row and jth column
	=	\( \frac{R_i C_j}{N} \)
R_i	=	the sum of the observed frequencies for row i
C_j	=	the sum of the observed frequencies for column j
N	=	the total sample size
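
As a minimal sketch of the definitions above, the following Python (not Dataplot) computes the expected frequencies E_ij = R_i C_j / N and the Pearson statistic, taking each cell's term as the squared difference (O_ij - E_ij)^2 divided by E_ij. The table is the one used in the Program section of this page; the function names are illustrative and are not part of Dataplot.

```python
def expected_frequencies(table):
    """Return E_ij = R_i * C_j / N for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Return T, the sum over all cells of (O_ij - E_ij)**2 / E_ij."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Table from the Program section of this page.
observed = [
    [5, 29, 14, 16],
    [15, 54, 14, 10],
    [20, 84, 17, 94],
    [68, 119, 26, 7],
]

t_stat = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"T = {t_stat:.4f}, degrees of freedom = {dof}")
```

The printed value should agree with the uncorrected statistic and the 9 degrees of freedom reported in the sample output.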

Significance Level:

\( \alpha \)

Critical Region:

T > CHSPPF(\( 1 - \alpha \),(r-1)*(c-1))

where CHSPPF is the percent point function of the chi-square distribution and (r-1)*(c-1) is the degrees of freedom

Conclusion:

Reject the independence hypothesis if the value of the test statistic is greater than the chi-square critical value.
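
The decision rule can be sketched in Python (not Dataplot) as follows. The critical value is taken to be the chi-square quantile that leaves probability alpha in the upper tail, which is consistent with the critical values printed in the sample output. The helper names chi2_sf and chi2_ppf are hypothetical stand-ins for a chi-square tail probability and percent point function; they are implemented here with the standard library only, using the incomplete gamma recurrence for integer degrees of freedom and bisection.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Starting values: Q(1/2, h) = erfc(sqrt(h)) for odd df, Q(1, h) = exp(-h) for even df.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_ppf(p, df):
    """Percent point function (inverse CDF) found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_sf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_sf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha = 0.05
df = (4 - 1) * (4 - 1)   # (r-1)*(c-1) for the 4x4 example table
t_stat = 138.2898        # uncorrected statistic taken from the sample output
critical = chi2_ppf(1.0 - alpha, df)
reject = t_stat > critical
print(f"critical value = {critical:.2f}, reject independence: {reject}")
```

In practice a statistical library's chi-square quantile function would replace the two helpers; they are spelled out here only to keep the sketch self-contained.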

This test statistic can also be formulated as

\( \sum_{i=1}^{r}{\sum_{j=1}^{c}{d_{ij}^2}} \)

where

\( d_{ij} = \frac{O_{ij} - E_{ij}} {\sqrt{E_{ij}}} \)

The d_ij are referred to as the standardized residuals and they show the contribution to the chi-square test statistic of each cell.
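
A short Python sketch (not Dataplot) of the residual formulation, taking d_ij as (O_ij - E_ij) divided by the square root of E_ij, so that the squared residuals sum to the test statistic. The table is the one from the Program section; the function name is illustrative.

```python
import math


def standardized_residuals(table):
    """Return d_ij = (O_ij - E_ij) / sqrt(E_ij) for each cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
            res_row.append((o - e) / math.sqrt(e))
        residuals.append(res_row)
    return residuals


observed = [
    [5, 29, 14, 16],
    [15, 54, 14, 10],
    [20, 84, 17, 94],
    [68, 119, 26, 7],
]

d = standardized_residuals(observed)
sum_of_squares = sum(x * x for row in d for x in row)

# Print the residual matrix; large |d_ij| marks cells that drive the statistic.
for row in d:
    print(" ".join(f"{x:8.3f}" for x in row))
print(f"sum of squared residuals = {sum_of_squares:.4f}")
```

Cells with a large positive residual hold more observations than independence predicts, and cells with a large negative residual hold fewer.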

Syntax 1:

This syntax is used for the case where you have raw data (i.e., the data has not yet been cross tabulated into a two-way table).

Syntax 2:

This syntax is used for the case where the data have already been cross-tabulated into a two-way contingency table.

Syntax 3:

This syntax is used for the special case where you have a 2x2 table. In this case, you can enter the 4 values directly, although you do need to be careful that the parameters are entered in the order expected above.

Examples:

Note:

Cochran suggests that if the minimum expected frequency is less than 1 or if more than 20% of the expected frequencies are less than 5, the chi-square approximation may be poor. However, Conover suggests that this is probably too conservative, particularly if r and c are not too small. He suggests that the minimum expected frequency should be at least 0.5 and that at least half the expected frequencies should be greater than 1.

In any event, if there are too many low expected frequencies, you can do one of the following:

If rows or columns with small expected frequencies can be intelligently combined, then this may result in expected frequencies that are sufficiently large.
Use Fisher's exact test.
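
Before relying on the chi-square approximation, the expected frequencies can be screened. The Python sketch below (not Dataplot) applies Cochran's guideline, reading the 20% criterion as "more than 20% of the expected frequencies below 5", which is an assumption about the intended threshold. The function name and the small sparse table are hypothetical and serve only as illustration.

```python
def cochran_check(table):
    """Return (min expected, fraction of expected < 5, guideline satisfied)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    min_expected = min(expected)
    frac_below_5 = sum(e < 5 for e in expected) / len(expected)
    # Assumed reading of the guideline: no expected frequency below 1
    # and at most 20% of the expected frequencies below 5.
    adequate = min_expected >= 1 and frac_below_5 <= 0.20
    return min_expected, frac_below_5, adequate


# Table from the Program section of this page.
observed = [
    [5, 29, 14, 16],
    [15, 54, 14, 10],
    [20, 84, 17, 94],
    [68, 119, 26, 7],
]
# Hypothetical sparse 2x2 table, for contrast.
sparse = [
    [1, 2],
    [3, 40],
]

for name, table in (("observed", observed), ("sparse", sparse)):
    min_e, frac, ok = cochran_check(table)
    print(f"{name}: min E = {min_e:.3f}, fraction below 5 = {frac:.2f}, adequate = {ok}")
```

When the check fails, the remedies listed above (combining categories or using Fisher's exact test) apply.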

Note:

A two-way contingency table can arise under one of three sampling schemes:

Only N is fixed. The row and column totals are not fixed (i.e., they are random).
Either the row totals or the column totals are fixed beforehand.
Both the row totals and the column totals are fixed beforehand.

Note that in all three cases, the test statistic and the chi-square approximation are the same. What differs is the exact distribution of the test statistic. When either the row or column totals (or both) are fixed, the possible number of contingency tables is reduced.

As long as the expected frequencies are sufficiently large, the chi-square approximation should be adequate for practical purposes.

Note:

Some authors recommend using a continuity correction (the Yates correction) for this test. In this case, 0.5 is subtracted from the absolute difference |O_ij - E_ij| in each cell before it is squared. Dataplot performs this test both with and without the continuity correction.

Note:

E_ij

O_ij

To read this information into Dataplot, enter

Note:

The ODDS RATIO INDEPENDENCE TEST is an alternative test for independence based on the LOG(odds ratio).

Default:

None

Synonyms:

None

Related Commands:

ODDS RATIO INDEPENDENCE TEST	= Perform a log(odds ratio) test for independence.
FISHER EXACT TEST	= Perform Fisher's exact test.
ASSOCIATION PLOT	= Generate an association plot.
SIEVE PLOT	= Generate a sieve plot.
ROSE PLOT	= Generate a Rose plot.
BINARY TABULATION PLOT	= Generate a binary tabulation plot.
ROC CURVE	= Generate a ROC curve.
ODDS RATIO	= Compute the bias corrected odds ratio.
LOG ODDS RATIO	= Compute the bias corrected log(odds ratio).

Reference:

Friendly (2000), "Visualizing Categorical Data", SAS Institute Inc., p. 90.

Cochran (1952), "The Chi-Square Test of Goodness of Fit", Annals of Mathematical Statistics, 23, pp. 315-345.

Applications:

Categorical Data Analysis

Implementation Date:

2007/3

Program:

 
. Example from page 61 of Friendly
read matrix m
 5  29 14 16
15  54 14 10
20  84 17 94
68 119 26 7
end of data
.
chi-square independence test m

           CHI-SQUARE TEST FOR INDEPENDENCE (RXC TABLE)
  
 NULL HYPOTHESIS: THE TWO VARIABLES ARE INDEPENDENT
 ALTERNATIVE HYPOTHESIS: THE TWO VARIABLES ARE NOT INDEPENDENT
  
 SAMPLE 1:
 NUMBER OF OBSERVATIONS                    =      592
 NUMBER OF LEVELS (ROWS)                   =        4
  
 SAMPLE 2:
 NUMBER OF OBSERVATIONS                    =      592
 NUMBER OF LEVELS (COLUMNS)                =        4
  
 WITHOUT YATES CONTINUITY CORRECTION:
 CHI-SQUARE TEST STATISTIC                =    138.2898
 DEGREES OF FREEDOM                       =        9
 CDF VALUE OF TEST STATISTIC              =    1.000000
  
 WITH YATES CONTINUITY CORRECTION:
 CHI-SQUARE TEST STATISTIC                =    132.0374
 DEGREES OF FREEDOM                       =        9
 CDF VALUE OF TEST STATISTIC              =    1.000000
  
  
 WITHOUT YATES CONTINUITY CORRECTION
                                       NULL HYPOTHESIS   NULL
 NULL          CONFIDENCE    CRITICAL  ACCEPTANCE        HYPOTHESIS
 HYPOTHESIS    LEVEL         VALUE     INTERVAL          CONCLUSION
 ===================================================================
 INDEPENDENT      50.0%        8.34     (0,0.500)        REJECT
 INDEPENDENT      80.0%       12.24     (0,0.800)        REJECT
 INDEPENDENT      90.0%       14.68     (0,0.900)        REJECT
 INDEPENDENT      95.0%       16.92     (0,0.950)        REJECT
 INDEPENDENT      97.5%       19.02     (0,0.975)        REJECT
 INDEPENDENT      99.0%       21.67     (0,0.990)        REJECT
  
 WITH YATES CONTINUITY CORRECTION
                                       NULL HYPOTHESIS   NULL
 NULL          CONFIDENCE    CRITICAL  ACCEPTANCE        HYPOTHESIS
 HYPOTHESIS    LEVEL         VALUE     INTERVAL          CONCLUSION
 ===================================================================
 INDEPENDENT      50.0%        8.34     (0,0.500)        REJECT
 INDEPENDENT      80.0%       12.24     (0,0.800)        REJECT
 INDEPENDENT      90.0%       14.68     (0,0.900)        REJECT
 INDEPENDENT      95.0%       16.92     (0,0.950)        REJECT
 INDEPENDENT      97.5%       19.02     (0,0.975)        REJECT
 INDEPENDENT      99.0%       21.67     (0,0.990)        REJECT
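
The two statistics in this output can be cross-checked outside Dataplot. The Python sketch below assumes that the continuity-corrected statistic subtracts 0.5 from |O_ij - E_ij| in every cell before squaring, with no truncation at zero. That form is inferred from the printed values, not taken from the Dataplot source, and the function name is illustrative.

```python
def chi_square_statistics(table, correction=0.5):
    """Return (uncorrected, continuity-corrected) Pearson statistics."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    plain = 0.0
    corrected = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
            plain += (o - e) ** 2 / e
            # Assumed form of the correction: (|O - E| - 0.5)**2 / E.
            corrected += (abs(o - e) - correction) ** 2 / e
    return plain, corrected


observed = [
    [5, 29, 14, 16],
    [15, 54, 14, 10],
    [20, 84, 17, 94],
    [68, 119, 26, 7],
]

plain, corrected = chi_square_statistics(observed)
print(f"without continuity correction: {plain:.4f}")
print(f"with continuity correction:    {corrected:.4f}")
```

Both values far exceed every critical value in the tables above, so independence is rejected at all of the listed confidence levels, with or without the correction.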