CHI SQUARE TWO SAMPLE

Name:

... CHI SQUARE TWO SAMPLE TEST

Type:

Analysis Command

Purpose:

Perform a chi-square two sample test that two data samples come from the same distribution. Note that we are not specifying what that common distribution is.

Description:

H₀: The two samples come from a common distribution.

H_a: The two samples do not come from a common distribution.

Test Statistic: For the chi-square two sample test, the data is divided into k bins and the test statistic is defined as

\( \chi^{2} = \sum_{i=1}^{k}{\frac{(K_{1}R_{i} - K_{2}S_{i})^{2}} {R_{i} + S_{i}}} \)

where the summation is for bin 1 to k, R_i is the observed frequency for bin i for sample 1, and S_i is the observed frequency for bin i for sample 2. K1 and K2 are scaling constants that are used to adjust for unequal sample sizes. Specifically,

\( K_1 = \sqrt{\frac{\sum_{i=1}^{k}{S_i}} {\sum_{i=1}^{k} {R_i}}} \)

\( K_2 = \sqrt{\frac{\sum_{i=1}^{k}{R_i}} {\sum_{i=1}^{k} {S_i}}} \)

This test is sensitive to the choice of bins. Most reasonable choices should produce similar, but not identical, results.
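The statistic defined above can be sketched in a few lines of Python. This is an illustrative sketch only, not Dataplot's implementation: the function name `chi_square_two_sample` is hypothetical, and it assumes the two samples have already been binned into frequency lists of equal length. Bins that are empty in both samples are skipped, so the returned bin count corresponds to the number of non-empty bins.

```python
import math

def chi_square_two_sample(r, s):
    """Return (statistic, number of non-empty bins) for binned counts r and s.

    r[i] and s[i] are the observed frequencies of bin i for samples 1 and 2.
    Hypothetical helper for illustration; not part of Dataplot.
    """
    if len(r) != len(s):
        raise ValueError("r and s must have the same number of bins")
    total_r, total_s = sum(r), sum(s)
    if total_r == 0 or total_s == 0:
        raise ValueError("each sample needs at least one observation")

    # Scaling constants that adjust for unequal sample sizes.
    k1 = math.sqrt(total_s / total_r)
    k2 = math.sqrt(total_r / total_s)

    stat = 0.0
    nonempty = 0
    for ri, si in zip(r, s):
        if ri + si == 0:
            continue  # a bin empty in both samples contributes nothing
        nonempty += 1
        stat += (k1 * ri - k2 * si) ** 2 / (ri + si)
    return stat, nonempty

# Made-up counts: the second sample is shifted toward the higher bins.
stat, k = chi_square_two_sample([12, 30, 8, 0], [3, 20, 25, 2])
print(stat, k)
```

When the two sets of counts are exactly proportional, the scaled differences vanish and the statistic is zero, which is why the constants K1 and K2 are needed for samples of unequal size.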

Significance Level: \( \alpha \)

Critical Region: The test statistic follows, approximately, a chi-square distribution with (k - c) degrees of freedom where k is the number of non-empty bins and c = 1 if the sample sizes are equal and c = 0 if they are not equal.
Therefore, the hypothesis that the two samples come from a common distribution is rejected if

\( \chi^{2} \) > CHSPPF(1-\( \alpha \),k-c)

where CHSPPF is the chi-square percent point function with k - c degrees of freedom and a significance level of \( \alpha \).
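The decision rule above can be sketched in Python using only the standard library. This is a sketch under stated assumptions, not Dataplot's code: `chi2_cdf`, `chi2_ppf`, `degrees_of_freedom`, and `reject_common_distribution` are hypothetical names, the cdf is computed from the standard series for the regularized incomplete gamma function, and the percent point function is obtained by simple bisection as a stand-in for CHSPPF. The input values are the statistic and sample sizes reported in the sample output at the end of this page.

```python
import math

def chi2_cdf(x, df):
    """Chi-square cdf via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_ppf(p, df):
    """Invert chi2_cdf by bisection (illustrative stand-in for CHSPPF)."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0  # expand the bracket until it contains the quantile
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def degrees_of_freedom(k_nonempty, n1, n2):
    """k - c, with c = 1 for equal sample sizes and c = 0 otherwise."""
    c = 1 if n1 == n2 else 0
    return k_nonempty - c

def reject_common_distribution(stat, df, alpha):
    """True if the statistic exceeds the upper critical value."""
    return stat > chi2_ppf(1.0 - alpha, df)

# Values taken from the sample output: 18 non-empty cells, n1 = 249, n2 = 79.
df = degrees_of_freedom(18, 249, 79)
for alpha in (0.10, 0.05, 0.01, 0.001):
    print(alpha, chi2_ppf(1.0 - alpha, df),
          reject_common_distribution(35.65477, df, alpha))
```

Comparing the statistic with the critical value is equivalent to comparing the cdf of the statistic with 1 - \( \alpha \), which is how the acceptance intervals in the sample output are expressed.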

Dataplot supports the chi-square two sample test for either binned or unbinned data.

For unbinned data, Dataplot automatically generates binned data using the same rule as for histograms. That is, the class width is 0.3*s where s is the sample standard deviation. The upper and lower limits are the mean plus or minus 6 times the sample standard deviation (any zero frequency bins in the tails are omitted). Note that the binning computations are performed with the combined data set. As with the HISTOGRAM command, you can override these defaults using the CLASS WIDTH, CLASS UPPER, and CLASS LOWER commands.
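The default binning rule described above can be approximated as follows. This is a rough Python sketch of the stated rule (class width 0.3*s, limits at the mean plus or minus 6*s, computed from the combined data, with empty tail bins dropped), not a reproduction of Dataplot's internal code, so its bin boundaries may differ from Dataplot's. The function name `bin_two_samples` and its parameters are hypothetical, and the sketch assumes the combined data have a non-zero standard deviation.

```python
import statistics

def bin_two_samples(y1, y2, width_factor=0.3, limit_factor=6.0):
    """Bin two samples on a common grid derived from the combined data.

    Returns (midpoints, freq1, freq2, class_width). Hypothetical helper.
    """
    combined = list(y1) + list(y2)
    mean = statistics.mean(combined)
    sd = statistics.stdev(combined)      # assumed to be > 0
    width = width_factor * sd
    lower = mean - limit_factor * sd
    upper = mean + limit_factor * sd
    n_bins = round(2.0 * limit_factor / width_factor)  # 40 with the defaults

    def count(sample):
        freq = [0] * n_bins
        for v in sample:
            if lower <= v <= upper:      # values beyond the limits are ignored
                i = min(int((v - lower) / width), n_bins - 1)
                freq[i] += 1
        return freq

    f1, f2 = count(y1), count(y2)

    # Drop bins in the tails that are empty in both samples.
    occupied = [i for i in range(n_bins) if f1[i] + f2[i] > 0]
    first, last = occupied[0], occupied[-1]
    mids = [lower + (i + 0.5) * width for i in range(first, last + 1)]
    return mids, f1[first:last + 1], f2[first:last + 1], width

# Made-up data for illustration.
mids, f1, f2, width = bin_two_samples([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(width, list(zip(mids, f1, f2)))
```

Using one grid for both samples is what makes the bin-by-bin comparison of frequencies meaningful; the CLASS WIDTH, CLASS LOWER, and CLASS UPPER settings play the role of the `width`, `lower`, and `upper` values in this sketch.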

The quantile-quantile plot, bihistogram, and Tukey mean-difference plot are graphical alternatives.

Syntax 1:

This syntax is used for unbinned data.

Syntax 2:

This syntax is used for binned data. The two response variables contain the bin frequencies for each sample, and a third variable contains the bin midpoints.

Examples:

Note:

STATVAL	-	value of the chi-square two sample test statistic
STATNU	-	degrees of freedom for the chi-square two sample test
STATCDF	-	cdf value for the chi-square two sample test statistic
CUTUPP90	-	90% critical value (alpha = 0.10) for the chi-square two sample test statistic
CUTUPP95	-	95% critical value (alpha = 0.05) for the chi-square two sample test statistic
CUTUPP99	-	99% critical value (alpha = 0.01) for the chi-square two sample test statistic

Note:

In addition to the above LET command, built-in statistics are supported for more than 20 different commands (enter HELP STATISTICS for details).

Default:

None

Synonyms:

The word TEST in the above commands is optional.

Related Commands:

GOODNESS OF FIT TEST	= Perform goodness of fit tests.
KOLMOGOROV SMIRNOV TWO SAMPLE TEST	= Perform Kolmogorov-Smirnov two sample test.
BIHISTOGRAM	= Generates a bihistogram.
QUANTILE-QUANTILE PLOT	= Generates a quantile-quantile plot.
TUKEY MEAN DIFFERENCE PLOT	= Generates a Tukey mean difference plot.

Reference:

Numerical Recipes in Fortran: The Art of Scientific Computing

Applications:

Distributional Analysis

Implementation Date:

1998/12

Program:

 
SKIP 25
READ AUTO83B.DAT Y1 Y2
DELETE Y2 SUBSET Y2 < 0
.
SET WRITE DECIMALS 5
CHI SQUARE TWO SAMPLE TEST Y1 Y2

             Chi-Square Two Sample Test
  
 First Response Variable:  Y1
 Second Response Variable: Y2
  
 H0: The Two Samples Come From the
     Same (Unspecified) Distribution
 Ha: The Two Samples Come From
     Different Distributions
  
 Sample One Summary Statistics:
 Number of Observations:                             249
 Sample Mean:                                   20.14457
 Sample Standard Deviation:                      6.41469
 Sample Minimum:                                 9.00000
 Sample Maximum:                                39.00000
  
 Sample Two Summary Statistics:
 Number of Observations:                              79
 Sample Mean:                                   30.48101
 Sample Standard Deviation:                      6.10771
 Sample Minimum:                                18.00000
 Sample Maximum:                                47.00000
  
 Number of Non-Empty Cells:                           18
 Class Width For Bins:                           1.83231
 Lower Class Limit:                            -38.48819
 Upper Class Limit:                             58.63277
  
 Chi-Squared Test Statistic:                    35.65477
 Degrees of Freedom:                                  18
 CDF of Test Statistic:                          0.99219
 P-Value:                                        0.00780
  
  
             Conclusions (Upper 1-Tailed Test)
  
 ---------------------------------------------------------------------------
                                              Null Hypothesis           Null
         Null     Confidence       Critical        Acceptance     Hypothesis
   Hypothesis          Level      Value (>)          Interval     Conclusion
 ---------------------------------------------------------------------------
         Same          50.0%          17.33         (0,0.500)         REJECT
         Same          80.0%          22.75         (0,0.800)         REJECT
         Same          90.0%          25.98         (0,0.900)         REJECT
         Same          95.0%          28.86         (0,0.950)         REJECT
         Same          97.5%          31.52         (0,0.975)         REJECT
         Same          99.0%          34.80         (0,0.990)         REJECT
         Same          99.9%          42.31         (0,0.999)         ACCEPT