

CHI SQUARE TWO SAMPLE

Name:
    ... CHI SQUARE TWO SAMPLE TEST
Type:
    Analysis Command
Purpose:
    Perform a chi-square two sample test of whether two data samples come from the same distribution. The common distribution itself is not specified.
Description:
    The chi-square two sample test is based on binned data, and the binning must be the same for both data sets. The basic idea is that the observed number of points in each bin (scaled to adjust for unequal sample sizes) should be similar if the two data samples come from a common distribution. More formally, the chi-square two sample test can be defined as follows.

    H0: The two samples come from a common distribution.
    Ha: The two samples do not come from a common distribution.
    Test Statistic: For the chi-square two sample test, the data is divided into k bins and the test statistic is defined as

      C = SUM[i=1 to k][(K1*R(i) - K2*S(i))**2/(R(i) + S(i))]

    where the summation is over bins 1 to k, R(i) is the observed frequency for bin i for sample 1, and S(i) is the observed frequency for bin i for sample 2. K1 and K2 are scaling constants that are used to adjust for unequal sample sizes. Specifically,

      K1 = SQRT(SUM[i=1 to k][S(i)]/SUM[i=1 to k][R(i)])

      K2 = SQRT(SUM[i=1 to k][R(i)]/SUM[i=1 to k][S(i)])

    This test is sensitive to the choice of bins. Most reasonable choices should produce similar, but not identical, results.
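
    As an illustration of the formula above, the following Python sketch computes C, K1, and K2 from two lists of bin frequencies. It is not Dataplot code. The function name is hypothetical, and skipping bins that are empty in both samples is an assumption based on the statement that only non-empty bins are counted.

```python
import math

def chi_square_two_sample_statistic(r, s):
    """Return (C, K1, K2) for two lists of bin frequencies on the same bins."""
    if len(r) != len(s):
        raise ValueError("both samples must use the same bins")
    total_r = sum(r)
    total_s = sum(s)
    if total_r == 0 or total_s == 0:
        raise ValueError("each sample needs at least one observation")
    # Scaling constants that adjust for unequal sample sizes.
    k1 = math.sqrt(total_s / total_r)
    k2 = math.sqrt(total_r / total_s)
    c = 0.0
    for ri, si in zip(r, s):
        if ri + si == 0:
            continue  # assumption: bins empty in both samples are skipped
        c += (k1 * ri - k2 * si) ** 2 / (ri + si)
    return c, k1, k2
```

    When the two sets of frequencies are proportional, for example R = (10, 20) and S = (5, 10), the scaled frequencies K1*R(i) and K2*S(i) coincide and C is zero up to rounding error.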

    Significance Level: alpha
    Critical Region: The test statistic follows, approximately, a chi-square distribution with (k - c) degrees of freedom, where k is the number of non-empty bins, c = 1 if the sample sizes are equal, and c = 0 if they are not equal. Therefore, the hypothesis that the two samples come from a common distribution is rejected if C > CHSPPF(1-alpha,k-c), where CHSPPF is the chi-square percent point function with k - c degrees of freedom and alpha is the significance level.
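
    The decision rule can be sketched in Python using only the standard library. This is an illustration, not Dataplot's implementation. The function names are hypothetical. Rather than inverting the distribution to obtain CHSPPF, the sketch uses the equivalent condition that the chi-square cdf at C exceeds 1 - alpha. The cdf is evaluated with the standard series for the regularized lower incomplete gamma function.

```python
import math

def chi2_cdf(x, df):
    """Chi-square cdf via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a = df / 2.0
    z = x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def degrees_of_freedom(nonempty_bins, n1, n2):
    """k - c, with c = 1 for equal sample sizes and c = 0 otherwise."""
    c = 1 if n1 == n2 else 0
    return nonempty_bins - c

def reject_h0(statistic, df, alpha):
    """C > CHSPPF(1 - alpha, df) is equivalent to cdf(C) > 1 - alpha."""
    return chi2_cdf(statistic, df) > 1.0 - alpha
```

    For example, degrees_of_freedom(18, 249, 79) gives 18 because the sample sizes differ, while 18 non-empty bins with equal sample sizes would give 17.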

    Dataplot supports the chi-square two sample test for either binned or unbinned data.

    For unbinned data, Dataplot automatically generates binned data using the same rule as for histograms. That is, the class width is 0.3*s where s is the sample standard deviation. The upper and lower limits are the mean plus or minus 6 times the sample standard deviation (any zero frequency bins in the tails are omitted). Note that the binning computations are performed with the combined data set. As with the HISTOGRAM command, you can override these defaults using the CLASS WIDTH, CLASS UPPER, and CLASS LOWER commands.
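
    The default binning rule as described above can be sketched in Python as follows. This follows the text only and is not guaranteed to reproduce the bins Dataplot actually generates. The function name is hypothetical, the CLASS WIDTH, CLASS UPPER, and CLASS LOWER overrides are not modeled, and ignoring points that fall outside the limits is an assumption.

```python
import statistics

def bin_two_samples(y1, y2):
    """Bin two raw samples on a shared grid built from the combined data."""
    combined = list(y1) + list(y2)
    mean = statistics.mean(combined)
    s = statistics.stdev(combined)  # sample standard deviation (n - 1 divisor)
    if s == 0:
        raise ValueError("combined data has zero spread")
    width = 0.3 * s
    lower = mean - 6.0 * s
    nbins = 40  # the limits span 12*s, and 12*s / (0.3*s) = 40
    r = [0] * nbins
    t = [0] * nbins
    for data, counts in ((y1, r), (y2, t)):
        for value in data:
            idx = int((value - lower) // width)
            if 0 <= idx < nbins:  # assumption: points outside the limits are ignored
                counts[idx] += 1
    # Omit zero-frequency bins in the tails (empty in both samples).
    occupied = [i for i in range(nbins) if r[i] + t[i] > 0]
    first, last = occupied[0], occupied[-1]
    midpoints = [lower + (i + 0.5) * width for i in range(first, last + 1)]
    return midpoints, r[first:last + 1], t[first:last + 1]
```

    The returned midpoints and the two frequency lists have the same layout as the binned-data form of the command: two frequency variables and one variable of bin midpoints.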

    The quantile-quantile plot, bihistogram, and Tukey mean-difference plot are graphical alternatives.

Syntax 1:
    CHI SQUARE TWO SAMPLE TEST <y1> <y2> <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
              <y2> is the second response variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for unbinned data.

Syntax 2:
    CHI SQUARE TWO SAMPLE TEST <y1> <y2> <x> <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
              <y2> is the second response variable;
              <x> is a variable containing the mid-points of the bins;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for binned data. The <y1> and <y2> variables contain bin frequencies and <x> contains the bin midpoints.

Examples:
    CHI SQUARE TWO SAMPLE TEST Y1 Y2
    CHI SQUARE TWO SAMPLE TEST Y1 Y2 X
    CHI SQUARE TWO SAMPLE TEST Y1 Y2 X SUBSET Y2 > 0
Note:
    The CHI-SQUARE TWO SAMPLE TEST command automatically saves the following parameters.

      STATVAL - value of the chi-square two sample test statistic
      STATNU - degrees of freedom for the chi-square two sample test
      STATCDF - cdf value for the chi-square two sample test statistic
      CUTUPP90 - 90% critical value (alpha = 0.10) for the chi-square two sample test statistic
      CUTUPP95 - 95% critical value (alpha = 0.05) for the chi-square two sample test statistic
      CUTUPP99 - 99% critical value (alpha = 0.01) for the chi-square two sample test statistic
    These parameters can be used in subsequent analysis.
Default:
    None
Synonyms:
    The following forms of the command are accepted.

      CHI SQUARE TWO SAMPLE TEST
      CHISQUARE TWO SAMPLE TEST
      CHI SQUARE 2 SAMPLE TEST
      CHISQUARE 2 SAMPLE TEST
      TWO SAMPLE CHI SQUARE TEST
      TWO SAMPLE CHISQUARE TEST
      2 SAMPLE CHI SQUARE TEST
      2 SAMPLE CHISQUARE TEST

    The word TEST in the above commands is optional.

Reference:
    Press, Teukolsky, Vetterling, and Flannery, 1992, "Numerical Recipes in Fortran: The Art of Scientific Computing," Second Edition, Cambridge University Press, pp. 614-622.
Applications:
    Distributional Analysis
Implementation Date:
    1998/12
Program:
     
    SKIP 25
    READ AUTO83B.DAT Y1 Y2
    DELETE Y2 SUBSET Y2 < 0
    .
    SET WRITE DECIMALS 5
    CHI SQUARE TWO SAMPLE TEST Y1 Y2
    
    The following output is generated.
                 Chi-Square Two Sample Test
      
     First Response Variable:  Y1
     Second Response Variable: Y2
      
     H0: The Two Samples Come From the
         Same (Unspecified) Distribution
     Ha: The Two Samples Come From
         Different Distributions
      
     Sample One Summary Statistics:
     Number of Observations:                             249
     Sample Mean:                                   20.14457
     Sample Standard Deviation:                      6.41469
     Sample Minimum:                                 9.00000
     Sample Maximum:                                39.00000
      
     Sample Two Summary Statistics:
     Number of Observations:                              79
     Sample Mean:                                   30.48101
     Sample Standard Deviation:                      6.10771
     Sample Minimum:                                18.00000
     Sample Maximum:                                47.00000
      
     Number of Non-Empty Cells:                           18
     Class Width For Bins:                           1.83231
     Lower Class Limit:                            -38.48819
     Upper Class Limit:                             58.63277
      
     Chi-Squared Test Statistic:                    35.65477
     Degrees of Freedom:                                  18
     CDF of Test Statistic:                          0.99219
     P-Value:                                        0.00780
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ---------------------------------------------------------------------------
                                                  Null Hypothesis           Null
             Null     Confidence       Critical        Acceptance     Hypothesis
       Hypothesis          Level      Value (>)          Interval     Conclusion
     ---------------------------------------------------------------------------
             Same          50.0%          17.33         (0,0.500)         REJECT
             Same          80.0%          22.75         (0,0.800)         REJECT
             Same          90.0%          25.98         (0,0.900)         REJECT
             Same          95.0%          28.86         (0,0.950)         REJECT
             Same          97.5%          31.52         (0,0.975)         REJECT
             Same          99.0%          34.80         (0,0.990)         REJECT
             Same          99.9%          42.31         (0,0.999)         ACCEPT
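
    To show how the conclusions table is read, the short Python sketch below compares the reported test statistic against each critical value copied from the table. It is an illustration only. The variable names are hypothetical and nothing here is computed by Dataplot.

```python
statistic = 35.65477  # Chi-Squared Test Statistic from the output above

# Confidence level (%) -> critical value, copied from the conclusions table.
critical_values = {
    50.0: 17.33, 80.0: 22.75, 90.0: 25.98, 95.0: 28.86,
    97.5: 31.52, 99.0: 34.80, 99.9: 42.31,
}

# The null hypothesis is rejected when the statistic exceeds the critical value.
conclusions = {
    level: ("REJECT" if statistic > cutoff else "ACCEPT")
    for level, cutoff in critical_values.items()
}

for level, verdict in conclusions.items():
    print(f"{level:5.1f}%  {verdict}")
```

    The statistic exceeds every critical value except the one at the 99.9% level, which matches the single ACCEPT row in the table.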
        


Date created: 06/05/2001
Last updated: 12/15/2013

Please email comments on this WWW page to alan.heckert@nist.gov.