PEARSON CONTINGENCY COEFFICIENT

Name:

PEARSON CONTINGENCY COEFFICIENT (LET)

Type:

Let Subcommand

Purpose:

Compute Pearson's contingency coefficient for a two-way contingency table.

Description:

A common question with regard to a two-way contingency table is whether the row and column variables are independent. By independence, we mean that the two variables are unassociated: knowing the value of the row variable does not help us predict the value of the column variable, and knowing the value of the column variable does not help us predict the value of the row variable.

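A more technical definition for independence is that

\( P(\mbox{row } i, \mbox{column } j) = P(\mbox{row } i) \, P(\mbox{column } j) \)

for all i and j.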

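The standard test statistic for determining independence is the chi-square test statistic:

\( T = \sum_{i=1}^{r}{\sum_{j=1}^{c}{\frac{(O_{ij} - E_{ij})^2} {E_{ij}}}} \)

where r and c are the number of rows and columns in the table, \( O_{ij} \) is the observed count in cell (i, j), and \( E_{ij} \) is the count expected under independence (the total for row i times the total for column j, divided by the total sample size).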
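The chi-square computation can be illustrated with a short Python sketch. This is not Dataplot code and not Dataplot's internal algorithm; the function name `chi_square_statistic` and the two small tables are made up for illustration, and the sketch assumes every row and column total is positive.

```python
def chi_square_statistic(table):
    """Return T, the sum over all cells of (O - E)^2 / E, for a table of counts.

    Assumes every row total and column total is positive, so that no
    expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    t = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            t += (observed - expected) ** 2 / expected
    return t


# Hypothetical table whose rows are exactly proportional (no association)
independent = [[10, 20], [30, 60]]
# Hypothetical table in which the row value determines the column value
dependent = [[10, 0], [0, 10]]

print(chi_square_statistic(independent))
print(chi_square_statistic(dependent))
```

For the proportional table every observed count equals its expected count, so T is 0. For the second table each expected count is 5 and T is 20, which equals the sample size.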

One criticism of this statistic is that it does not give a meaningful description of the degree of dependence (or strength of association). That is, it is useful for determining whether there is dependence, but its magnitude depends on the degrees of freedom and the sample size as well as on the strength of the association, so it is not easy to interpret as a measure of that strength.

Pearson's contingency coefficient is one method that provides an easier-to-interpret measure of the strength of association. Specifically, it is:

\( \mbox{Pearson's Coefficient} = \sqrt{\frac{T}{N+T}} \)

where T is the chi-square test statistic and N is the total sample size (the sum of all cell counts).

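This statistic rescales the chi-square statistic to a value between 0 (no association) and 1. The upper limit of 1 cannot actually be attained: for a table with r rows and c columns, the maximum value is \( \sqrt{(q-1)/q} \) where q is the smaller of r and c. The coefficient has the desirable property of scale invariance. That is, if every cell count is multiplied by the same constant, so that the sample size increases while the relative proportions in the table stay the same, the value of Pearson's contingency coefficient does not change.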
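A minimal Python sketch of the coefficient and of its scale invariance follows. It is an illustration rather than Dataplot's implementation; the function name `pearson_contingency_coefficient` and the example counts are hypothetical, and all row and column totals are assumed to be positive.

```python
import math


def pearson_contingency_coefficient(table):
    """Return sqrt(T / (N + T)) for a two-way table of counts.

    T is the chi-square statistic and N is the sum of all cell counts.
    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    t = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            t += (observed - expected) ** 2 / expected
    return math.sqrt(t / (n + t))


# Hypothetical 2x2 table and the same table with every count multiplied by 10
table = [[12, 5], [3, 20]]
scaled = [[10 * count for count in row] for row in table]

c_original = pearson_contingency_coefficient(table)
c_scaled = pearson_contingency_coefficient(scaled)
print(c_original, c_scaled)  # equal up to floating-point rounding

# A 2x2 table with complete association reaches sqrt(1/2), not 1
c_max = pearson_contingency_coefficient([[10, 0], [0, 10]])
print(c_max)  # approximately 0.7071
```

Multiplying every count by the same constant multiplies both T and N by that constant, so the ratio T/(N+T) is unchanged.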

The data for the contingency table can be specified in either of the following two ways:

raw data
In this case, you will have two variables. The first will contain r distinct values and the second will contain c distinct values. Dataplot will automatically perform the cross-tabulation to obtain the counts for each cell. Although the distinct values will typically be integers, this is not strictly required.
table data
If you only have the resulting contingency table (i.e., the counts for each cell), then you can use the READ MATRIX (or CREATE MATRIX) command to create a matrix with the data. This is demonstrated in the example program below.
In this case, your data should contain non-negative integers since they represent the counts for each cell.
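The relationship between the two input forms can be sketched in Python. This only illustrates what a cross-tabulation does; it is not Dataplot's algorithm, the helper name `cross_tabulate` and the sample values are hypothetical, and ordering the rows and columns by sorted distinct value is an assumption of the sketch.

```python
from collections import Counter


def cross_tabulate(x, y):
    """Turn two equal-length raw variables into a table of cell counts.

    Rows correspond to the sorted distinct values of x and columns to the
    sorted distinct values of y (an ordering chosen for this sketch).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    row_levels = sorted(set(x))
    col_levels = sorted(set(y))
    counts = Counter(zip(x, y))
    # A Counter returns 0 for (row, column) pairs that never occur
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


# Hypothetical raw data: x has 3 distinct values, y has 2 (non-integer) values
x = [1, 1, 2, 2, 2, 3]
y = [0.5, 1.5, 0.5, 0.5, 1.5, 1.5]

row_levels, col_levels, table = cross_tabulate(x, y)
print(row_levels)
print(col_levels)
print(table)
```

Here the raw variables produce a 3 x 2 table of counts, which is the same kind of object that would be entered directly in the table-data case.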

Syntax 1:

Use this syntax for raw data.

Syntax 2:

Use this syntax if your data is a contingency table.

Examples:

Note:

The Cramer contingency coefficient is more commonly used than the Pearson contingency coefficient.

Note:

The following additional commands are supported:

PEARSON CONTINGENCY COEFFICIENT PLOT Y1 Y2 X
CROSS TABULATE PEARSON CONTINGENCY COEFFICIENT PLOT ...
Y1 Y2 X1 X2

BOOTSTRAP PEARSON CONTINGENCY COEFFICIENT PLOT Y1 Y2
JACKNIFE PEARSON CONTINGENCY COEFFICIENT PLOT Y1 Y2

The above commands expect the variables to have the same number of observations.

Note that the above commands are only available if you have raw data.

Default:

None

Synonyms:

None

Related Commands:

CRAMER CONTINGENCY COEFFICIENT	= Compute Cramer's contingency coefficient.
CHI-SQUARE INDEPENDENCE TEST	= Perform a chi-square test for independence.
ODDS RATIO INDEPENDENCE TEST	= Perform a log(odds ratio) test for independence.
FISHER EXACT TEST	= Perform Fisher's exact test.
ASSOCIATION PLOT	= Generate an association plot.
SIEVE PLOT	= Generate a sieve plot.
ROSE PLOT	= Generate a Rose plot.
BINARY TABULATION PLOT	= Generate a binary tabulation plot.
ROC CURVE	= Generate a ROC curve.
ODDS RATIO	= Compute the bias corrected odds ratio.
LOG ODDS RATIO	= Compute the bias corrected log(odds ratio).

Reference:

Conover (1999), "Practical Nonparametric Statistics", Third Edition, Wiley.

Friendly (2000), "Visualizing Categorical Data", SAS Institute Inc., p. 61.

Applications:

Categorical Data Analysis

Implementation Date:

2007/5

Program:

 
. Sample data from page 61 of Friendly
read matrix m
 5  29 14 16
15  54 14 10
20  84 17 94
68 119 26 7
end of data
.
let a = matrix pearson contingency coefficient m