
Dataplot Vol 1 Auxiliary Chapter


CHI SQUARE TWO SAMPLE

Name:
    ... CHI SQUARE TWO SAMPLE TEST
Type:
    Analysis Command
Purpose:
    Perform a chi-square two sample test of whether two data samples come from the same distribution. Note that the common distribution is not specified.
Description:
    The chi-square two sample test is based on binned data. Note that the binning for both data sets should be the same. The basic idea behind the chi-square two sample test is that the observed number of points in each bin (scaled for unequal sample sizes) should be similar if the two data samples come from a common distribution. More formally, the chi-square two sample test can be defined as follows.

    H0: The two samples come from a common distribution.
    Ha: The two samples do not come from a common distribution.
    Test Statistic: For the chi-square two sample test, the data is divided into k bins and the test statistic is defined as

      C = SUM[i=1 to k][(K1*R(i) - K2*S(i))**2/(R(i) + S(i))]

    where the summation is over bins 1 to k, R(i) is the observed frequency for bin i for sample 1, and S(i) is the observed frequency for bin i for sample 2. K1 and K2 are scaling constants that are used to adjust for unequal sample sizes. Specifically,

      K1 = SQRT(SUM[i=1 to k][S(i)]/SUM[i=1 to k][R(i)])

      K2 = SQRT(SUM[i=1 to k][R(i)]/SUM[i=1 to k][S(i)])
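    As a rough illustration of the formulas above, the following Python sketch computes C from two lists of bin frequencies defined on the same bins. It is not Dataplot's implementation; the function name chi_square_two_sample is hypothetical, and it assumes that bins with R(i) + S(i) = 0 are skipped, since they would contribute a 0/0 term and are not among the k non-empty bins.

```python
import math


def chi_square_two_sample(r, s):
    """Return (C, k) for bin frequencies r (sample 1) and s (sample 2).

    C is the scaled chi-square two sample statistic and k is the number
    of non-empty bins. Both lists must refer to the same bins.
    """
    if len(r) != len(s):
        raise ValueError("r and s must have the same number of bins")
    total_r, total_s = sum(r), sum(s)
    if total_r <= 0 or total_s <= 0:
        raise ValueError("each sample needs at least one observation")

    k1 = math.sqrt(total_s / total_r)  # K1: scales sample 1 counts
    k2 = math.sqrt(total_r / total_s)  # K2: scales sample 2 counts

    c, k = 0.0, 0
    for ri, si in zip(r, s):
        if ri + si == 0:
            continue  # empty bin: 0/0 term, not counted in k
        k += 1
        c += (k1 * ri - k2 * si) ** 2 / (ri + si)
    return c, k


# Two disjoint samples of equal size.
print(chi_square_two_sample([10, 0], [0, 10]))  # (20.0, 2)
```

    Proportional frequency vectors such as [10, 20, 30] and [5, 10, 15] give C equal to 0 apart from rounding, because K1 and K2 put the two samples on a common scale before they are compared.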

    This test is sensitive to the choice of bins. Most reasonable choices should produce similar, but not identical, results.

    Significance Level: alpha
    Critical Region: The test statistic follows, approximately, a chi-square distribution with (k - c) degrees of freedom, where k is the number of non-empty bins and c = 1 if the sample sizes are equal and c = 0 if they are not equal. Therefore, the hypothesis that the two samples come from a common distribution is rejected if C > CHSPPF(1-alpha,k-c), where CHSPPF is the chi-square percent point function with k - c degrees of freedom, evaluated at 1 - alpha.
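    The decision rule can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not Dataplot's CHSPPF: the helper names chi2_cdf, chi2_ppf, and two_sample_decision are hypothetical, the CDF is computed from the series for the regularized lower incomplete gamma function P(df/2, x/2), and the percent point function is found by bisection. The example call uses the values reported in the Program section of this page (17 non-empty bins, unequal sample sizes of 249 and 79, so c = 0).

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < 1e-15 * total:
            break
    return min(1.0, total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


def chi2_ppf(p, df):
    """Percent point function (inverse CDF) found by bisection on chi2_cdf."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def two_sample_decision(stat, k, equal_sizes, alphas=(0.10, 0.05, 0.01)):
    """Return (df, cdf, rows); each row is (alpha, cutoff, reject_h0)."""
    df = k - 1 if equal_sizes else k  # c = 1 for equal sample sizes, else 0
    rows = []
    for alpha in alphas:
        cutoff = chi2_ppf(1.0 - alpha, df)
        rows.append((alpha, cutoff, stat > cutoff))
    return df, chi2_cdf(stat, df), rows


df, cdf, rows = two_sample_decision(35.33751, 17, equal_sizes=False)
print(df, round(cdf, 4))
for alpha, cutoff, reject in rows:
    print(f"{alpha:4.0%} {cutoff:10.5f} {'REJECT H0' if reject else 'do not reject H0'}")
```

    With these inputs the cutoffs agree with the 10%, 5%, and 1% values in the example output to about three decimal places, and H0 is rejected at all three levels.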

    Dataplot supports the chi-square two sample test for either binned or unbinned data.

    For unbinned data, Dataplot automatically generates binned data using the same rule as for histograms. That is, the class width is 0.3*s where s is the sample standard deviation. The upper and lower limits are the mean plus or minus 6 times the sample standard deviation (any zero frequency bins in the tails are omitted). Note that the binning computations are performed with the combined data set. As with the HISTOGRAM command, you can override these defaults using the CLASS WIDTH, CLASS UPPER, and CLASS LOWER commands.
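    The default binning rule can be sketched in Python as follows. This is an approximation under stated assumptions, not Dataplot's code: the function name bin_two_samples is hypothetical, and its width, lower, and upper arguments are hypothetical stand-ins for the CLASS WIDTH, CLASS LOWER, and CLASS UPPER overrides. Points on or beyond the upper limit are simply not counted, and every bin that is empty in both samples is dropped. Dataplot is described above as omitting zero-frequency bins in the tails; an interior bin that is empty in both samples would contribute nothing to C in any case.

```python
import math
from statistics import mean, stdev


def bin_two_samples(y1, y2, width=None, lower=None, upper=None):
    """Bin two samples on a common grid derived from the combined data.

    Defaults: class width 0.3*sd and limits mean -/+ 6*sd, where sd is the
    sample standard deviation of the combined data set.
    Returns (midpoints, sample 1 frequencies, sample 2 frequencies).
    """
    combined = list(y1) + list(y2)
    m, sd = mean(combined), stdev(combined)
    width = 0.3 * sd if width is None else width
    lower = m - 6.0 * sd if lower is None else lower
    upper = m + 6.0 * sd if upper is None else upper
    if width <= 0 or upper <= lower:
        raise ValueError("need a positive class width and upper > lower")
    nbins = int(math.ceil((upper - lower) / width))

    r_counts = [0] * nbins  # sample 1 frequencies, R(i)
    s_counts = [0] * nbins  # sample 2 frequencies, S(i)
    for values, counts in ((y1, r_counts), (y2, s_counts)):
        for v in values:
            i = int(math.floor((v - lower) / width))
            if 0 <= i < nbins:  # points outside [lower, upper) are not counted
                counts[i] += 1

    # Keep only bins that are non-empty in at least one sample.
    keep = [i for i in range(nbins) if r_counts[i] + s_counts[i] > 0]
    mids = [lower + (i + 0.5) * width for i in keep]
    return mids, [r_counts[i] for i in keep], [s_counts[i] for i in keep]


mids, r, s = bin_two_samples([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(len(mids), r, s)
```

    Because both samples are counted on the same grid, the returned frequency lists can be passed directly to a statistic computed from R(i) and S(i).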

    The quantile-quantile plot, bihistogram, and Tukey mean-difference plot are graphical alternatives.

Syntax 1:
    CHI SQUARE TWO SAMPLE TEST <y1> <y2> <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
              <y2> is the second response variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for unbinned data.

Syntax 2:
    CHI SQUARE TWO SAMPLE TEST <y1> <y2> <x> <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
              <y2> is the second response variable;
              <x> is a variable containing the mid-points of the bins;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for binned data. The <y1> and <y2> variables contain the bin frequencies and <x> contains the bin midpoints.

Examples:
    CHI-SQUARE TWO SAMPLE TEST Y1 Y2
    CHI-SQUARE TWO SAMPLE TEST Y1 Y2 X
    CHI-SQUARE TWO SAMPLE TEST Y1 Y2 X SUBSET Y2 > 0
Note:
    The CHI-SQUARE TWO SAMPLE TEST command automatically saves the following parameters.

      STATVAL - value of the chi-square two sample test statistic
      STATNU - degrees of freedom for the chi-square two sample test
      STATCDF - cdf value for the chi-square two sample test statistic
      CUTUPP90 - 90% critical value (alpha = 0.10) for the chi-square two sample test statistic
      CUTUPP95 - 95% critical value (alpha = 0.05) for the chi-square two sample test statistic
      CUTUPP99 - 99% critical value (alpha = 0.01) for the chi-square two sample test statistic
    These parameters can be used in subsequent analysis.
Default:
    None
Synonyms:
    The following forms of the command are accepted.

      CHI SQUARE TWO SAMPLE TEST
      CHISQUARE TWO SAMPLE TEST
      CHI SQUARE 2 SAMPLE TEST
      CHISQUARE 2 SAMPLE TEST
      TWO SAMPLE CHI SQUARE TEST
      TWO SAMPLE CHISQUARE TEST
      2 SAMPLE CHI SQUARE TEST
      2 SAMPLE CHISQUARE TEST

    The word TEST in the above commands is optional.

Related Commands:
    CHI SQUARE GOODNESS OF FIT TEST = Perform chi-square goodness of fit test.
    KOLMOGOROV SMIRNOV TWO SAMPLE TEST = Perform Kolmogorov-Smirnov two sample test.
    BIHISTOGRAM = Generates a bihistogram.
    QUANTILE-QUANTILE PLOT = Generates a quantile-quantile plot.
    TUKEY MEAN DIFFERENCE PLOT = Generates a Tukey mean difference plot.
Reference:
    "Numerical Recipes in Fortran: The Art of Scientific Computing", Second Edition, Press, Teukolsky, Vetterling, and Flannery, Cambridge University Press, 1992, pp. 614-622.
Applications:
    Distributional Analysis
Implementation Date:
    1998/12
Program:
    SKIP 25
    READ AUTO83B.DAT Y1 Y2
    .
    DELETE Y2 SUBSET Y2 < 0
    CHI SQUARE TWO SAMPLE TEST Y1 Y2

    The following output is generated.

          ****************************************
          **  CHI-SQUARE TWO SAMPLE TEST Y1 Y2  **
          ****************************************
     
     
                      CHI-SQUARED TWO SAMPLE TEST
     
    NULL HYPOTHESIS H0:      TWO SAMPLES COME FROM THE SAME (UNSPECIFIED)
                             DISTRIBUTION
    ALTERNATE HYPOTHESIS HA: TWO SAMPLES COME FROM DIFFERENT DISTRIBUTIONS
     
    SAMPLE:
       NUMBER OF OBSERVATIONS FOR SAMPLE 1 =      249
       NUMBER OF OBSERVATIONS FOR SAMPLE 2 =       79
       NUMBER OF NON-EMPTY CELLS           =       17
       CLASS WIDTH FOR BINS                =   0.1832313E+01
       CLASS LOWER FOR BINS                =  -0.1834362E+02
       CLASS UPPER FOR BINS                =   0.5863277E+02
     
    TEST:
    CHI-SQUARED TEST STATISTIC     =    35.33751
       DEGREES OF FREEDOM          =       17
       CHI-SQUARED CDF VALUE       =    0.994384
     
       ALPHA LEVEL         CUTOFF              CONCLUSION
               10%       24.76903               REJECT H0
                5%       27.58711               REJECT H0
                1%       33.40867               REJECT H0
        

Date created: 6/5/2001
Last updated: 9/25/2006
Please email comments on this WWW page to alan.heckert@nist.gov.