CRAMER CONTINGENCY COEFFICIENT

Name:

CRAMER CONTINGENCY COEFFICIENT (LET)

Type:

Let Subcommand

Purpose:

Compute Cramer's contingency coefficient for a two-way contingency table.

Description:

A common question with regard to a two-way contingency table is whether we have independence. By independence, we mean that the row and column variables are unassociated (i.e., knowing the value of the row variable will not help us predict the value of the column variable and, likewise, knowing the value of the column variable will not help us predict the value of the row variable).

A more technical definition of independence is that

\( P(\mbox{row } i, \mbox{column } j) = P(\mbox{row } i) P(\mbox{column } j) \) for all i and j.

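The standard test statistic for determining independence is the chi-square test statistic:

\( T = \sum_{i=1}^{r}{\sum_{j=1}^{c}{\frac{(O_{ij} - E_{ij})^2} {E_{ij}}}} \)

where r and c are the numbers of rows and columns, \( O_{ij} \) is the observed count in cell (i, j), and \( E_{ij} \) is the count expected under independence (the row i total times the column j total, divided by the total count N).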
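As an illustration outside Dataplot, the chi-square statistic can be sketched in a few lines of Python. This is only a minimal sketch, not Dataplot's implementation; the function name chi_square_statistic and the 2x2 table are hypothetical, and the sketch assumes every row and column total is positive.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic T for a two-way table of counts.

    Assumes every row total and column total is positive, so that all
    expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    t = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: (row total * column total) / N
            expected = row_totals[i] * col_totals[j] / n
            t += (observed - expected) ** 2 / expected
    return t


# Hypothetical 2x2 table, chosen only for illustration.
example = [[10, 20], [30, 40]]
t_example = chi_square_statistic(example)

# A table whose rows are proportional to each other is exactly independent.
t_independent = chi_square_statistic([[10, 20], [20, 40]])

print(t_example, t_independent)
```

For the table with proportional rows, every observed count equals its expected count, so the statistic is zero.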

One criticism of this statistic is that it does not give a meaningful description of the degree of dependence (or strength of association). That is, it is useful for determining whether there is dependence. However, since the strength of that association depends on the degrees of freedom as well as on the value of the test statistic, it is not easy to interpret the strength of association from the test statistic alone.

Cramer's contingency coefficient is one way to provide an easier-to-interpret measure of the strength of association. Specifically, it is:

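\( \mbox{Cramer's Coefficient} = \sqrt{\frac{T}{N(q - 1)}} \)

where T is the chi-square test statistic, N is the total sample size (the sum of all cell counts), and q = min(r, c) is the smaller of the number of rows and the number of columns.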

This statistic is based on the fact that the maximum value of T is:

\( T_{max} = N(q - 1) \)

So this statistic scales the chi-square statistic to a value between 0 (no association) and 1 (maximum association). It has the desirable property of scale invariance. That is, if the sample size increases, the value of Cramer's contingency coefficient does not change as long as the cell counts keep the same proportions relative to each other.
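The scaling and the scale invariance can be checked with a short Python sketch. This is an illustration only, not Dataplot's code: the function name cramer_coefficient is hypothetical, and the counts are the 4x4 table from page 61 of Friendly that is used in the example program on this page. Multiplying every count by 10 increases the sample size but leaves the coefficient unchanged.

```python
import math


def cramer_coefficient(table):
    """Cramer's contingency coefficient sqrt(T / (N * (q - 1))).

    Assumes a table with at least two rows and two columns and with
    positive row and column totals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Chi-square statistic T.
    t = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            t += (observed - expected) ** 2 / expected

    # q is the smaller of the number of rows and the number of columns.
    q = min(len(row_totals), len(col_totals))
    return math.sqrt(t / (n * (q - 1)))


# Counts from the example program on this page (Friendly, p. 61).
table = [
    [5, 29, 14, 16],
    [15, 54, 14, 10],
    [20, 84, 17, 94],
    [68, 119, 26, 7],
]

# Same proportions, ten times the sample size.
scaled = [[10 * count for count in row] for row in table]

v = cramer_coefficient(table)
v_scaled = cramer_coefficient(scaled)
print(round(v, 3), round(v_scaled, 3))
```

Both calls give approximately 0.279, whereas the chi-square statistic itself is ten times larger for the scaled table.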

The data for the contingency table can be specified in either of the following two ways:

raw data
In this case, you will have two variables. The first will contain r distinct values and the second will contain c distinct values. Dataplot will automatically perform the cross-tabulation to obtain the counts for each cell. Although the distinct values will typically be integers, this is not strictly required.
table data
If you only have the resulting contingency table (i.e., the counts for each cell), then you can use the READ MATRIX (or CREATE MATRIX) command to create a matrix with the data. This is demonstrated in the example program below.
In this case, your data should contain non-negative integers since they represent the counts for each cell.
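To make the relationship between the two input forms concrete, the following Python sketch turns raw data into a table of counts. It is not how Dataplot performs the cross-tabulation internally; the function name cross_tabulate and the variables y1 and y2 are hypothetical and the data are made up for illustration.

```python
from collections import Counter


def cross_tabulate(x, y):
    """Return an r x c table of counts from two equal-length variables.

    Rows follow the sorted distinct values of x and columns follow the
    sorted distinct values of y.
    """
    if len(x) != len(y):
        raise ValueError("the two variables must have the same number of elements")

    row_levels = sorted(set(x))
    col_levels = sorted(set(y))
    counts = Counter(zip(x, y))  # missing (row, column) pairs count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


# Hypothetical raw data: y1 has 3 distinct values, y2 has 2.
y1 = [1, 1, 2, 2, 2, 3, 3, 1]
y2 = [1, 2, 1, 1, 2, 2, 2, 1]

table = cross_tabulate(y1, y2)
for row in table:
    print(row)
```

The result is a 3 x 2 table of non-negative integer counts, which is the same kind of input expected in the table data case.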

Syntax 1:

Use this syntax for raw data.

Syntax 2:

Use this syntax if your data is a contingency table.

Examples:

Note:

For the raw data case, the two variables should have the same number of elements.

Note:

HELP STATISTICS

Note that these commands are only available if you have raw data.

Default:

None

Synonyms:

None

Related Commands:

PEARSON CONTINGENCY COEFFICIENT	= Compute Pearson's contingency coefficient.
CHI-SQUARE INDEPENDENCE TEST	= Perform a chi-square test for independence.
ODDS RATIO INDEPENDENCE TEST	= Perform a log(odds ratio) test for independence.
FISHER EXACT TEST	= Perform Fisher's exact test.
ASSOCIATION PLOT	= Generate an association plot.
SIEVE PLOT	= Generate a sieve plot.
ROSE PLOT	= Generate a Rose plot.
BINARY TABULATION PLOT	= Generate a binary tabulation plot.
ROC CURVE	= Generate a ROC curve.
ODDS RATIO	= Compute the bias corrected odds ratio.
LOG ODDS RATIO	= Compute the bias corrected log(odds ratio).

Reference:

Conover, "Practical Nonparametric Statistics", Wiley.

Friendly (2000), "Visualizing Categorical Data", SAS Institute Inc., p. 61.

Applications:

Categorical Data Analysis

Implementation Date:

2007/5

Program:

 
. Example from page 61 of Friendly
read matrix m
 5  29 14 16
15  54 14 10
20  84 17 94
68 119 26 7
end of data
.
let a = matrix cramer contingency coefficient m