
# CHI SQUARE TWO SAMPLE

Name:
... CHI SQUARE TWO SAMPLE TEST
Type:
Analysis Command
Purpose:
Perform a chi-square two sample test of the hypothesis that two data samples come from the same distribution. Note that we are not specifying what that common distribution is.
Description:
The chi-square two sample test is based on binned data, and the same bins must be used for both data sets. The basic idea is that, if the two data samples come from a common distribution, the observed number of points in each bin (scaled for unequal sample sizes) should be similar for the two samples. More formally, the chi-square two sample test is defined as follows.

H0: The two samples come from a common distribution.

Ha: The two samples do not come from a common distribution.

Test Statistic: For the chi-square two sample test, the data are divided into k bins and the test statistic is defined as

$$\chi^{2} = \sum_{i=1}^{k}{\frac{(K_{1}R_{i} - K_{2}S_{i})^{2}} {R_{i} + S_{i}}}$$

where the summation runs over bins 1 to k, $$R_i$$ is the observed frequency for bin i for sample 1, and $$S_i$$ is the observed frequency for bin i for sample 2. $$K_1$$ and $$K_2$$ are scaling constants that adjust for unequal sample sizes. Specifically,

$$K_1 = \sqrt{\frac{\sum_{i=1}^{k}{S_i}} {\sum_{i=1}^{k} {R_i}}}$$

$$K_2 = \sqrt{\frac{\sum_{i=1}^{k}{R_i}} {\sum_{i=1}^{k} {S_i}}}$$

This test is sensitive to the choice of bins. Most reasonable choices should produce similar, but not identical, results.

Significance Level: $$\alpha$$

Critical Region: The test statistic follows, approximately, a chi-square distribution with k - c degrees of freedom, where k is the number of non-empty bins, c = 1 if the sample sizes are equal, and c = 0 if they are not. Therefore, the hypothesis that the two samples come from a common distribution is rejected if $$\chi^{2}$$ > CHSPPF(1-$$\alpha$$, k-c), where CHSPPF is the chi-square percent point function with k - c degrees of freedom and $$\alpha$$ is the significance level.
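To make the formulas concrete, here is a minimal Python sketch (an independent illustration, not Dataplot's implementation) that computes the statistic, the degrees of freedom k - c, and the upper-tail p-value from two vectors of bin frequencies. The function names `chi2_sf` and `chi_square_two_sample` are hypothetical. The chi-square tail probability uses the closed forms that exist for integer degrees of freedom, so no third-party library is needed. Bins with $$R_i + S_i = 0$$ are skipped, which corresponds to counting only non-empty bins in k.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer degrees of freedom df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = math.exp(-half)
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    term = math.sqrt(x)
    series = 0.0
    for r in range(1, (df + 1) // 2):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * series


def chi_square_two_sample(r, s):
    """Return (statistic, degrees of freedom, p-value) for bin frequencies r and s."""
    if len(r) != len(s):
        raise ValueError("both samples must use the same bins")
    n1, n2 = sum(r), sum(s)
    k1 = math.sqrt(n2 / n1)  # K1 = sqrt(sum S / sum R)
    k2 = math.sqrt(n1 / n2)  # K2 = sqrt(sum R / sum S)
    stat = 0.0
    k = 0  # number of non-empty bins
    for ri, si in zip(r, s):
        if ri + si == 0:
            continue  # empty bins contribute nothing and are not counted
        k += 1
        stat += (k1 * ri - k2 * si) ** 2 / (ri + si)
    c = 1 if n1 == n2 else 0
    df = k - c
    return stat, df, chi2_sf(stat, df)


stat, df, p_value = chi_square_two_sample([10, 20, 30, 0], [12, 18, 30, 0])
print(f"statistic={stat:.5f} df={df} p-value={p_value:.5f}")
```

Proportional frequencies such as R = (5, 15) and S = (10, 30) give a statistic of zero in this sketch, because the scaling constants exactly cancel the factor-of-two difference in sample size. As a sanity check, `chi2_sf(35.65477, 18)` is about 0.0078, in line with the p-value in the sample output further down.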

Dataplot supports the chi-square two sample test for either binned or unbinned data.

For unbinned data, Dataplot automatically generates binned data using the same rule as for histograms. That is, the class width is 0.3*s where s is the sample standard deviation. The upper and lower limits are the mean plus or minus 6 times the sample standard deviation (any zero frequency bins in the tails are omitted). Note that the binning computations are performed with the combined data set. As with the HISTOGRAM command, you can override these defaults using the CLASS WIDTH, CLASS UPPER, and CLASS LOWER commands.
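The default binning rule for unbinned data can be sketched in Python as follows. It follows the rule as stated above (combined data, class width 0.3*s, limits at the mean plus or minus 6*s, empty tail bins trimmed) and is an illustration only; Dataplot's exact bin placement may differ in detail. The function name `default_bins` is hypothetical, and in this sketch values falling outside the limits are simply dropped.

```python
import statistics


def default_bins(y1, y2):
    """Return (midpoints, freq1, freq2) on a shared grid built from the combined data.

    Sketch of the stated rule: width 0.3*s, limits mean -/+ 6*s, empty tail bins removed.
    """
    combined = list(y1) + list(y2)
    mean = statistics.mean(combined)
    sd = statistics.stdev(combined)  # sample standard deviation (n - 1 divisor)
    if sd == 0:
        raise ValueError("all values are identical; the bin width would be zero")
    width = 0.3 * sd
    lower, upper = mean - 6.0 * sd, mean + 6.0 * sd
    nbins = round((upper - lower) / width)  # 12*sd / (0.3*sd) = 40 bins

    def frequencies(values):
        freq = [0] * nbins
        for v in values:
            if lower <= v <= upper:  # values beyond the limits are ignored here
                freq[min(int((v - lower) / width), nbins - 1)] += 1
        return freq

    r, s = frequencies(y1), frequencies(y2)
    # Trim bins that are empty in both samples, from the two tails only.
    nonempty = [i for i in range(nbins) if r[i] + s[i] > 0]
    first, last = nonempty[0], nonempty[-1]
    mids = [lower + (i + 0.5) * width for i in range(first, last + 1)]
    return mids, r[first:last + 1], s[first:last + 1]


mids, freq1, freq2 = default_bins([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [2, 4, 6, 8, 10, 12])
print(len(mids), freq1, freq2)
```

The three returned lists have the same shape as the binned-data input of Syntax 2: two frequency variables plus a variable of bin midpoints. Interior empty bins are kept, since the rule above only trims the tails.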

The quantile-quantile plot, bihistogram, and Tukey mean-difference plot are graphical alternatives.

Syntax 1:
CHI SQUARE TWO SAMPLE TEST <y1> <y2> <SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used for unbinned data.

Syntax 2:
CHI SQUARE TWO SAMPLE TEST <y1> <y2> <x> <SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<x> is a variable containing the mid-points of the bins;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used for binned data. The <y1> and <y2> variables contain the bin frequencies and <x> contains the bin midpoints.

Examples:
CHI-SQUARE TWO SAMPLE TEST Y1 Y2
CHI-SQUARE TWO SAMPLE TEST Y1 Y2 X
CHI-SQUARE TWO SAMPLE TEST Y1 Y2 X SUBSET Y2 > 0
Note:
The CHI-SQUARE TWO SAMPLE TEST command automatically saves the following parameters.

STATVAL - value of the chi-square two sample test statistic
STATNU - degrees of freedom for the chi-square two sample test
STATCDF - cdf value for the chi-square two sample test statistic
CUTUPP90 - 90% critical value (alpha = 0.10) for the chi-square two sample test statistic
CUTUPP95 - 95% critical value (alpha = 0.05) for the chi-square two sample test statistic
CUTUPP99 - 99% critical value (alpha = 0.01) for the chi-square two sample test statistic
These parameters can be used in subsequent analysis.
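The relationship between STATVAL, STATNU, STATCDF, and the CUTUPP critical values can be illustrated with a short Python sketch: STATCDF is the chi-square CDF evaluated at the statistic, and each CUTUPP value is the chi-square percent point function at the corresponding level (the analogue of CHSPPF). The helper names are hypothetical, the percent point function is found by plain bisection rather than any particular library routine, and the returned dictionary merely reuses the Dataplot parameter names as labels.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer degrees of freedom df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = math.exp(-half)
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    term = math.sqrt(x)
    series = 0.0
    for r in range(1, (df + 1) // 2):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * series


def chi2_cdf(x: float, df: int) -> float:
    return 1.0 - chi2_sf(x, df)


def chi2_ppf(p: float, df: int) -> float:
    """Inverse CDF by bisection; plays the role of CHSPPF in this sketch."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:  # expand the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def saved_parameters(stat: float, df: int) -> dict:
    """Quantities analogous to the saved parameters (keys reuse the Dataplot names)."""
    return {
        "STATVAL": stat,
        "STATNU": df,
        "STATCDF": chi2_cdf(stat, df),
        "CUTUPP90": chi2_ppf(0.90, df),
        "CUTUPP95": chi2_ppf(0.95, df),
        "CUTUPP99": chi2_ppf(0.99, df),
    }


for name, value in saved_parameters(35.65477, 18).items():
    print(name, value)
```

Fed the statistic (35.65477) and degrees of freedom (18) from the sample output in the Program section, this sketch gives a CDF value and 90/95/99% critical values that agree with the CDF and critical values reported there.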
Note:
The following statistics are also supported:

LET A = TWO SAMPLE CHI SQUARE TEST Y1 Y2
LET A = TWO SAMPLE CHI SQUARE TEST CDF Y1 Y2
LET A = TWO SAMPLE CHI SQUARE TEST PVALUE Y1 Y2

In addition to the above LET commands, built-in statistics are supported for 20+ different commands (enter HELP STATISTICS for details).

Default:
None
Synonyms:
The following forms of the command are accepted.

CHI SQUARE TWO SAMPLE TEST
CHISQUARE TWO SAMPLE TEST
CHI SQUARE 2 SAMPLE TEST
CHISQUARE 2 SAMPLE TEST
TWO SAMPLE CHI SQUARE TEST
TWO SAMPLE CHISQUARE TEST
2 SAMPLE CHI SQUARE TEST
2 SAMPLE CHISQUARE TEST

The word TEST in the above commands is optional.

Related Commands:
GOODNESS OF FIT TEST = Perform goodness of fit tests.
KOLMOGOROV SMIRNOV TWO SAMPLE TEST = Perform a Kolmogorov-Smirnov two sample test.
BIHISTOGRAM = Generates a bihistogram.
QUANTILE-QUANTILE PLOT = Generates a quantile-quantile plot.
TUKEY MEAN DIFFERENCE PLOT = Generates a Tukey mean difference plot.
Reference:
Press, Teukolsky, Vetterling, and Flannery, 1992, "Numerical Recipes in Fortran: The Art of Scientific Computing," Second Edition, Cambridge University Press, pp. 614-622.
Applications:
Distributional Analysis
Implementation Date:
1998/12
Program:

```
SKIP 25
DELETE Y2 SUBSET Y2 < 0
.
SET WRITE DECIMALS 5
CHI SQUARE TWO SAMPLE TEST Y1 Y2
```

The following output is generated.
```
             Chi-Square Two Sample Test

First Response Variable:  Y1
Second Response Variable: Y2

H0: The Two Samples Come From the
Same (Unspecified) Distribution
Ha: The Two Samples Come From
Different Distributions

Sample One Summary Statistics:
Number of Observations:                             249
Sample Mean:                                   20.14457
Sample Standard Deviation:                      6.41469
Sample Minimum:                                 9.00000
Sample Maximum:                                39.00000

Sample Two Summary Statistics:
Number of Observations:                              79
Sample Mean:                                   30.48101
Sample Standard Deviation:                      6.10771
Sample Minimum:                                18.00000
Sample Maximum:                                47.00000

Number of Non-Empty Cells:                           18
Class Width For Bins:                           1.83231
Lower Class Limit:                            -38.48819
Upper Class Limit:                             58.63277

Chi-Squared Test Statistic:                    35.65477
Degrees of Freedom:                                  18
CDF of Test Statistic:                          0.99219
P-Value:                                        0.00780

Conclusions (Upper 1-Tailed Test)

---------------------------------------------------------------------------
                                           Null Hypothesis           Null
      Null     Confidence       Critical        Acceptance     Hypothesis
Hypothesis          Level      Value (>)          Interval     Conclusion
---------------------------------------------------------------------------
      Same          50.0%          17.33         (0,0.500)         REJECT
      Same          80.0%          22.75         (0,0.800)         REJECT
      Same          90.0%          25.98         (0,0.900)         REJECT
      Same          95.0%          28.86         (0,0.950)         REJECT
      Same          97.5%          31.52         (0,0.975)         REJECT
      Same          99.0%          34.80         (0,0.990)         REJECT
      Same          99.9%          42.31         (0,0.999)         ACCEPT
```
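The conclusions table applies one rule at several confidence levels: the null hypothesis is accepted when the CDF of the test statistic lies inside the acceptance interval (0, level) and rejected otherwise, which is equivalent to comparing the statistic with the critical value at that level. A minimal Python sketch of that rule follows; the helper names are hypothetical and this is not Dataplot code.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer degrees of freedom df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = math.exp(-half)
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    term = math.sqrt(x)
    series = 0.0
    for r in range(1, (df + 1) // 2):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * series


def chi2_cdf(x: float, df: int) -> float:
    return 1.0 - chi2_sf(x, df)


def conclusions(stat, df, levels=(0.500, 0.800, 0.900, 0.950, 0.975, 0.990, 0.999)):
    """Map each confidence level to ACCEPT/REJECT using the CDF interval (0, level)."""
    cdf = chi2_cdf(stat, df)
    return {level: "ACCEPT" if cdf < level else "REJECT" for level in levels}


for level, verdict in conclusions(35.65477, 18).items():
    print(f"{level:6.1%}  {verdict}")
```

With the reported statistic of 35.65477 and 18 degrees of freedom, the CDF is about 0.9922, so the sketch rejects at every level up to 99.0% and accepts at 99.9%, matching the table.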


NIST is an agency of the U.S. Commerce Department.

Date created: 06/05/2001
Last updated: 10/13/2015