
ODDS RATIO CHI-SQUARE TEST

Name:
    ODDS RATIO CHI-SQUARE TEST (LET)
Type:
    Analysis Command
Purpose:
    Perform an odds ratio chi-square test of a series of fourfold (2x2) tables.
Description:
    Given two variables where each variable has exactly two possible outcomes (typically defined as success and failure), we define the odds ratio as:

      o = (N11/N12)/(N21/N22)
        = (N11*N22)/(N12*N21)

    where

      N11 = number of successes in sample 1
      N21 = number of failures in sample 1
      N12 = number of successes in sample 2
      N22 = number of failures in sample 2

    The first definition shows the meaning of the odds ratio clearly, although it is more commonly given in the literature with the second definition.

    The log odds ratio is the logarithm of the odds ratio:

      l(o) = LOG{(N11/N12)/(N21/N22)}
           = LOG{(N11*N22)/(N12*N21)}

    Alternatively, the log odds ratio can be given in terms of the proportions

      l(o) = LOG{(p11/p12)/(p21/p22)}
           = LOG{(p11*p22)/(p12*p21)}

    where

      p11 = N11/ (N11 + N21)
            = proportion of successes in sample 1
      p21 = N21/ (N11 + N21)
            = proportion of failures in sample 1
      p12 = N12/ (N12 + N22)
            = proportion of successes in sample 2
      p22 = N22/ (N12 + N22)
            = proportion of failures in sample 2

    Success and failure can denote any binary response. Dataplot expects "success" to be coded as "1" and "failure" to be coded as "0".

    The bias corrected version of the statistic is:

      l'(o) = LOG[{(N11+0.5) (N22+0.5)}/ {(N12+0.5) (N21+0.5)}]

    In addition to reducing bias, this statistic also has the advantage that the odds ratio is still defined even when N12 or N21 is zero (the uncorrected statistic will be undefined for these cases).

    Note that N11, N21, N12, and N22 define a 2x2 contingency table. These types of contingency tables are also referred to as fourfold tables.
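    As an illustration outside Dataplot, the following Python sketch evaluates the uncorrected and bias corrected log odds ratio for a single fourfold table. The function name and its arguments are hypothetical; the counts are the group 1 counts from the Program example on this page (81 successes and 24 failures in sample 1, 34 successes and 71 failures in sample 2).

```python
import math

def log_odds_ratio(n11, n21, n12, n22, corrected=True):
    """Log odds ratio of a 2x2 table; adds 0.5 to each cell if corrected."""
    c = 0.5 if corrected else 0.0
    return math.log(((n11 + c) * (n22 + c)) / ((n12 + c) * (n21 + c)))

# Group 1 counts from the Program example (N11, N21, N12, N22).
l_corrected = log_odds_ratio(81, 24, 34, 71)
odds_corrected = math.exp(l_corrected)
print(l_corrected, odds_corrected)

# With N21 = 0 the uncorrected ratio divides by zero, but the
# corrected statistic is still finite.
l_zero_cell = log_odds_ratio(10, 0, 5, 5)
print(l_zero_cell)
```

    The corrected values agree with the first row of the summary table in the sample output (log odds ratio near 1.9307, odds ratio near 6.894).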

    The odds ratio chi-square test is applied in the situation where we have a series of fourfold tables. That is, the two variables for the fourfold tables are the same, but the data are collected from different populations or groups with regard to these variables. Fleiss, Levin, and Paik (p. 234) list the following questions that are typically asked about these types of data:

    1. Is there evidence that the degree of association, whatever its magnitude, is consistent from one group to another?

    2. Assuming that the degree of association is found to be consistent, is the common degree of association statistically significant?

    3. Assuming that the common degree of association is significant, what is the best estimate of the common value for the measure of association? What is its standard error? How does one construct a confidence interval for the common measure?

    The following description for this test is summarized from Chapter 10 of Fleiss, Levin, and Paik. Consult this reference for a more detailed discussion.

    Suppose we have g fourfold tables. Then

      \( y_{i} \) = measure of association for table i
      \( s_{y_{i}} \) = standard error of \( y_{i} \)
      \( w_{i} \) = \( 1/s_{y_{i}}^{2} \)
      g = number of groups (i.e., number of 2x2 tables)

    This test is based on decomposing the total chi-square in the following way:

      \( \begin{array}{lcl} \chi_{\mbox{total}}^{2} & = & \sum_{i=1}^{g}{w_{i} y_{i}^{2}} \\ & = & \chi_{\mbox{homogeneity}}^{2} + \chi_{\mbox{association}}^{2} \end{array} \)

    The \( \chi_{\mbox{homogeneity}}^{2} \) assesses the degree of homogeneity (i.e., equality) among the g measures of association. The \( \chi_{\mbox{association}}^{2} \) assesses the significance of the average degree of association.

    The overall measure of association (across all groups) is the weighted average of the g individual measures:

      \( \bar{y} = \frac{\sum_{i=1}^{g}{w_{i} y_{i}}} {\sum_{i=1}^{g}{w_{i}}} \)

    Under the hypothesis of zero overall association, \( \bar{y} \) has an average value of zero and a standard error of

      \( s_{\bar{y}} = \frac{1} {\sqrt{\sum_{i=1}^{g}{w_{i}}}} \)

    From this

      \( \frac{\bar{y}} {s_{\bar{y}}} = \frac{\sum_{i=1}^{g}{w_{i} y_{i}}} {\sqrt{\sum_{i=1}^{g}{w_{i}}}} \)

    approximately follows a standard normal distribution under the null hypothesis and

      \( \begin{array}{lcl} \chi_{\mbox{association}}^{2} & = & \bar{y}^{2} \sum_{i=1}^{g}{w_{i}} \\ & = & \frac{\left( \sum_{i=1}^{g}{w_{i} y_{i}} \right)^2} {\sum_{i=1}^{g}{w_{i}}} \end{array} \)

    approximately follows a chi-square distribution with one degree of freedom.

    In addition,

      \( \begin{array}{lcl} \chi_{\mbox{homogeneity}}^{2} & = & \chi_{\mbox{total}}^{2} - \chi_{\mbox{association}}^{2} \\ & = & \sum_{i=1}^{g}{w_{i} y_{i}^2} - \bar{y}^{2} \sum_{i=1}^{g}{w_{i}} \\ & = & \sum_{i=1}^{g}{w_{i} (y_{i} - \bar{y})^2} \end{array} \)

    approximately follows a chi-square distribution with g - 1 degrees of freedom.
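
    The last equality in the homogeneity formula can be checked by expanding the square and using \( \sum_{i=1}^{g}{w_{i} y_{i}} = \bar{y} \sum_{i=1}^{g}{w_{i}} \), which is the definition of \( \bar{y} \) rearranged:

      \( \begin{array}{lcl} \sum_{i=1}^{g}{w_{i} (y_{i} - \bar{y})^2} & = & \sum_{i=1}^{g}{w_{i} y_{i}^2} - 2 \bar{y} \sum_{i=1}^{g}{w_{i} y_{i}} + \bar{y}^{2} \sum_{i=1}^{g}{w_{i}} \\ & = & \sum_{i=1}^{g}{w_{i} y_{i}^2} - 2 \bar{y}^{2} \sum_{i=1}^{g}{w_{i}} + \bar{y}^{2} \sum_{i=1}^{g}{w_{i}} \\ & = & \sum_{i=1}^{g}{w_{i} y_{i}^2} - \bar{y}^{2} \sum_{i=1}^{g}{w_{i}} \end{array} \)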

    Note that \( \chi_{\mbox{association}}^{2} \) and \( \chi_{\mbox{homogeneity}}^{2} \) are uncorrelated.

    Based on the above formulas, we can answer the above questions as follows.

    1. Consistency of association can be tested using the \( \chi_{\mbox{homogeneity}}^{2} \) statistic. If this statistic is significant, this indicates that groups are different with respect to the measure of association.

    2. If \( \chi_{\mbox{homogeneity}}^{2} \) is not significant (i.e., the groups can be considered equivalent), then the overall degree of association can be tested using the \( \chi_{\mbox{association}}^{2} \) statistic.

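    3. The estimate of overall association is \( \bar{y} \) and a large sample 100(1 - \( \alpha \))% confidence interval is

        \( \bar{y} \pm \Phi^{-1}(1 - \alpha/2) s_{\bar{y}} \)

       where \( \Phi^{-1} \) is the percent point function (inverse cdf) of the standard normal distribution.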
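
    A minimal Python sketch of this interval follows. It uses the standard library normal quantile, and the function name is hypothetical. The inputs are the combined log(odds ratio) and its standard error as printed in the sample output on this page. Exponentiating the limits gives an interval for the common odds ratio.

```python
import math
from statistics import NormalDist

def log_odds_ci(ybar, se, conf=0.95):
    """Large sample interval for ybar, plus the interval on the odds scale."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 normal quantile
    lower, upper = ybar - z * se, ybar + z * se
    return (lower, upper), (math.exp(lower), math.exp(upper))

# Values copied from the sample output: combined log(odds ratio) and its SE.
(log_lo, log_hi), (or_lo, or_hi) = log_odds_ci(1.086652, 0.1419390, conf=0.95)
print(log_lo, log_hi, or_lo, or_hi)
```

    These limits agree with the 95% row of the confidence interval table in the sample output to the printed precision.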

    The above discussion is based on a generic statistic for the measure of association. For the odds ratio chi-square test, the specific measure of association is the bias corrected log odds ratio (given above). Note that the standard error of the bias corrected log odds ratio is:

      \( s_{l'(o)} = \sqrt{\frac{1}{N_{11}+0.5} + \frac{1}{N_{21}+0.5} + \frac{1}{N_{12}+0.5} + \frac{1}{N_{22}+0.5}} \)
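
    The whole calculation can be sketched in a few lines of Python as an independent check. This is not Dataplot's implementation; the function name and the dictionary keys are hypothetical. The three tables are the counts implied by the Program example on this page, listed in the order (N11, N21, N12, N22).

```python
import math

def odds_ratio_chisq(tables):
    """Chi-square decomposition for a list of (n11, n21, n12, n22) counts."""
    y = []  # bias corrected log odds ratio for each table
    w = []  # weight = 1 / (standard error squared)
    for n11, n21, n12, n22 in tables:
        y.append(math.log((n11 + 0.5) * (n22 + 0.5)
                          / ((n12 + 0.5) * (n21 + 0.5))))
        variance = (1 / (n11 + 0.5) + 1 / (n21 + 0.5)
                    + 1 / (n12 + 0.5) + 1 / (n22 + 0.5))
        w.append(1 / variance)
    sum_w = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    total = sum(wi * yi ** 2 for wi, yi in zip(w, y))
    association = ybar ** 2 * sum_w
    return {
        "ybar": ybar,                         # combined log(odds ratio)
        "se": 1 / math.sqrt(sum_w),           # its standard error
        "total": total,                       # g degrees of freedom
        "association": association,           # 1 degree of freedom
        "homogeneity": total - association,   # g - 1 degrees of freedom
        "df_homogeneity": len(tables) - 1,
    }

# Counts implied by the Program example: (N11, N21, N12, N22) per group.
tables = [(81, 24, 34, 71), (118, 74, 69, 105), (82, 63, 52, 93)]
result = odds_ratio_chisq(tables)
print(result)
```

    For these counts the sketch gives a combined log(odds ratio) near 1.0867 and chi-square statistics near 68.01 (total), 58.61 (association), and 9.40 (homogeneity), in agreement with the sample output.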

    The ODDS RATIO CHI-SQUARE TEST generates the following output:

    1. A summary table of various statistics (odds ratio, log(odds ratio), standard error of log(odds ratio), wi, and wi*log(odds ratio)**2).

    2. A table summarizing the combined log(odds ratio) and its standard error and the chi-square test statistics (total, association, and homogeneity).

    3. A table for the chi-square test for homogeneity.

    4. A table for the chi-square test for overall degree of association.

    5. Estimates and large sample confidence intervals for the common log(odds ratio) and the common odds ratio.
Syntax 1:
    ODDS RATIO CHI-SQUARE TEST <y1> <y2>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where <y1> and <y2> denote a series of 2x2 tables (i.e., rows 1 and 2 are group 1, rows 3 and 4 are group 2, and so on).

Syntax 2:
    ODDS RATIO CHI-SQUARE TEST <y1> <y2> <groupid>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <groupid> is a group id variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where you have raw data (i.e., the data has not yet been cross tabulated into a two-way table). In this case, the two response variables have an equal number of cases for each group.

Syntax 3:
    ODDS RATIO CHI-SQUARE TEST <y1> <groupid1> <y2> <groupid2>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <groupid1> is a group id variable corresponding to <y1>;
                <y2> is the second response variable;
                <groupid2> is a group id variable corresponding to <y2>;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case where you have raw data (i.e., the data has not yet been cross tabulated into a two-way table). In this case, the two response variables may have an unequal number of cases for each group, so <y1> and <y2> require different group id variables.
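
    For the raw-data forms, the 0/1 responses have to be reduced to the four counts of each fourfold table. The Python sketch below shows one way to do this for the single group id layout of Syntax 2. It is an assumption about the tabulation, not a description of Dataplot internals: the function name, the `missing` argument, and the rule of skipping missing values separately for each response are all hypothetical. The data mirror the Program example on this page, where -99 marks missing values.

```python
def tabulate(y1, y2, group, missing=-99):
    """Return {group: (n11, n21, n12, n22)} from 0/1 responses."""
    tables = {}
    for a, b, g in zip(y1, y2, group):
        counts = tables.setdefault(g, [0, 0, 0, 0])
        if a != missing:
            counts[0 if a == 1 else 1] += 1  # sample 1: success or failure
        if b != missing:
            counts[2 if b == 1 else 3] += 1  # sample 2: success or failure
    return {g: tuple(c) for g, c in tables.items()}

# Raw data laid out as in the Program example (groups of 105, 192, and 145 rows).
group = [1] * 105 + [2] * 192 + [3] * 145
y1 = [1] * 81 + [0] * 24 + [1] * 118 + [0] * 74 + [1] * 82 + [0] * 63
y2 = ([1] * 34 + [0] * 71
      + [1] * 69 + [0] * 105 + [-99] * 18   # 18 missing values in group 2
      + [1] * 52 + [0] * 93)
tables = tabulate(y1, y2, group)
print(tables)
```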

Examples:
    ODDS RATIO CHI-SQUARE TEST Y1 Y2
    ODDS RATIO CHI-SQUARE TEST Y1 Y2 X
    ODDS RATIO CHI-SQUARE TEST Y1 X1 Y2 X2
Note:
    This test is similar to the Mantel-Haenszel test. Fleiss, Levin, and Paik make the following recommendations in regard to these two tests (they include other tests in their comparison).

    1. If the number of groups is small or moderate and the sample sizes within each group are large, the log(odds ratio) test performs well.

    2. If the number of groups is large, but the sample sizes within the groups are small to moderate, then the Mantel-Haenszel test can be recommended. The log(odds ratio) test may perform poorly for this case.

    3. If the number of groups and the sample sizes within the groups are both small, exact methods may be required. Dataplot does not currently support any exact methods for this problem.
Note:
    The following information is written to the file dpst1f.dat (in the current directory):

      Column 1 = significance level
      Column 2 = lower confidence limit for common log(odds ratio)
      Column 3 = upper confidence limit for common log(odds ratio)
      Column 4 = lower confidence limit for common odds ratio
      Column 5 = upper confidence limit for common odds ratio

    To read this information into Dataplot, enter

      SET READ FORMAT F10.5,1X,4E15.7
      READ DPST1F.DAT SIGLEV LOGLOWCL LOGUPPCL ODDLOWCL ODDUPPCL

    Dataplot saves the following internal parameters:

      STATTOT = the "total" test statistic
      CDFTOTAL = the cdf for the "total" test statistic
      STATASSO = the "association" test statistic
      CDFASSOC = the cdf for the "association" test statistic
      STATHOMO = the "homogeneity" test statistic
      CDFHOMOG = the cdf for the "homogeneity" test statistic
Default:
    None
Synonyms:
    None
Reference:
    Fleiss, Levin, and Paik (2003), Statistical Methods for Rates and Proportions, Third Edition, pp. 234-238.
Applications:
    Categorical Data Analysis
Implementation Date:
    2007/5
Program:
     
    let n1 = 105
    let n2 = 192
    let n3 = 145
    let n = n1 + n2 + n3
    let x = 3 for i = 1 1 n
    let istop = n1 + n2
    let x = 2 for i = 1 1 istop
    let x = 1 for i = 1 1 n1
    .
    set statistic missing value -99
    .
    .  Group 1 values
    .
    let y1 = 0 for i = 1 1 n
    let y2 = 0 for i = 1 1 n
    let y1 = 1 for i = 1 1  81
    let y2 = 1 for i = 1 1  34
    .
    .  Group 2 values (have unequal samples here, so fill
    .          with missing values)
    .
    let istrt = n1 + 1
    let istop1 = istrt + 118 - 1
    let istop2 = istrt + 69 - 1
    let y1 = 1 for i = istrt 1 istop1
    let y2 = 1 for i = istrt 1 istop2
    let istrt2 = n1 + 174 + 1
    let istop2 = n1 + n2
    let y2 = -99 for i = istrt2 1 istop2
    .
    .  Group 3 values
    .
    let istrt = n1 + n2 + 1
    let istop1 = istrt + 82 - 1
    let istop2 = istrt + 52 - 1
    let y1 = 1 for i = istrt 1 istop1
    let y2 = 1 for i = istrt 1 istop2
    .
    odds ratio chi-square test y1 y2 x
        
    The following output is generated.
                       SUMMARY OF LOG(ODDS RATIO)
      
           |                    LOG OF        STANDARD
           |   ODDS RATIO     ODDS RATIO        ERROR    1/SE(L(i))**2        w(i)*
     GROUP |      O(i)          L(i)          SE(L(i))        w(i)          L(i)**2
     ===============================================================================
        1. |    6.894114       1.930668      0.3099319       10.41040       38.80455
        2. |    2.414514      0.8814980      0.2138429       21.86806       16.99233
        3. |    2.313836      0.8389067      0.2400251       17.35748       12.21558
     ===============================================================================
     TOTAL |                                                 49.63593       68.01245
      
      
            CHI-SQUARE ANALYSIS OF LOG(ODDS RATIO)
      
     NUMBER OF GROUPS                            =        3
     ESTIMATE OF COMBINED LOG(ODDS RATIO)        =    1.086652
     STANDARD ERROR OF COMBINED LOG(ODDS RATIO)  =   0.1419390
      
     CHI-SQUARE TEST STATISTIC (TOTAL)           =    68.01245
     DEGRESS OF FREEDOM                          =        3
     CDF OF TEST STATISTIC                       =    1.000000
      
     CHI-SQUARE TEST STATISTIC (ASSOCIATION)     =    58.61073
     DEGRESS OF FREEDOM                          =        1
     CDF OF TEST STATISTIC                       =    1.000000
      
     CHI-SQUARE TEST STATISTIC (HOMOGENEITY)     =    9.401718
     DEGRESS OF FREEDOM                          =        2
     CDF OF TEST STATISTIC                       =   0.9978321
      
      
        CHI-SQUARE TEST FOR CONSISTENCY OF ASSOCIATION (HOMOGENEITY)
                                           NULL HYPOTHESIS   NULL
     NULL          CONFIDENCE    CRITICAL  ACCEPTANCE        HYPOTHESIS
     HYPOTHESIS    LEVEL         VALUE     INTERVAL          CONCLUSION
     ===================================================================
     CONSISTENT       50.0%        1.39     (0,0.500)        REJECT
     CONSISTENT       80.0%        3.22     (0,0.800)        REJECT
     CONSISTENT       90.0%        4.61     (0,0.900)        REJECT
     CONSISTENT       95.0%        5.99     (0,0.950)        REJECT
     CONSISTENT       97.5%        7.38     (0,0.975)        REJECT
     CONSISTENT       99.0%        9.21     (0,0.990)        REJECT
      
      
        CHI-SQUARE TEST FOR OVERALL DEGREE OF ASSOCIATION
                                           NULL HYPOTHESIS   NULL
     NULL          CONFIDENCE    CRITICAL  ACCEPTANCE        HYPOTHESIS
     HYPOTHESIS    LEVEL         VALUE     INTERVAL          CONCLUSION
     ===================================================================
     NO ASSOCIATION   50.0%        0.45     (0,0.500)        REJECT
     NO ASSOCIATION   80.0%        1.64     (0,0.800)        REJECT
     NO ASSOCIATION   90.0%        2.71     (0,0.900)        REJECT
     NO ASSOCIATION   95.0%        3.84     (0,0.950)        REJECT
     NO ASSOCIATION   97.5%        5.02     (0,0.975)        REJECT
     NO ASSOCIATION   99.0%        6.63     (0,0.990)        REJECT
      
      
     LARGE SAMPLE CONFIDENCE INTERVAL FOR LOG(ODDS RATIO)
                               LOG(ODDS RATIO)                  ODDS RATIO
                              (   1.086652    )           (   2.964333    )
        CONFIDENCE           LOWER         UPPER         LOWER         UPPER
        VALUE (%)            LIMIT         LIMIT         LIMIT         LIMIT
     -----------------------------------------------------------------------
          50.000          0.990915       1.18239       2.69370       3.26216
          80.000          0.904750       1.26855       2.47131       3.55571
          90.000          0.853183       1.32012       2.34711       3.74387
          95.000          0.808457       1.36485       2.24444       3.91513
          97.500          0.768509       1.40479       2.15655       4.07469
          99.000          0.721041       1.45226       2.05657       4.27277
        
Date created: 10/10/2008
Last updated: 12/11/2023

Please email comments on this WWW page to alan.heckert@nist.gov.