FISHER TWO SAMPLE RANDOMIZATION TEST

Name:

FISHER TWO SAMPLE RANDOMIZATION TEST

Type:

Analysis Command

Purpose:

Perform a Fisher two sample randomization test for the equality of the means of two independent samples.

Description:

The standard two sample t-test for the equality of means assumes that:

the samples are randomly selected from infinite populations (equivalently the observations are independent)
the samples come from normal populations
the two populations have equal variances

Randomization tests can be used when these assumptions are questionable. Fisher introduced randomization tests (also referred to as permutation tests) in 1935.

The randomization test for the equality of the means for two samples is computed as follows:

Given that sample one has n1 observations and sample two has n2 observations, randomly assign the n1 + n2 observations so that n1 observations are assigned to sample one and n2 observations are assigned to sample two, and compute the difference of the means. This is a single permutation for the test.
Generate all possible permutations of the n1 + n2 observations and compute the difference of the means for each permutation. The number of permutations is \( \left( \begin{array}{c} n1 + n2 \\ n1 \end{array} \right) = \frac{(n1 + n2)!}{n1! \, n2!} \). Call this value NTOTAL for subsequent steps.
Let DFULL denote the difference of the means for the original samples. Let D_i denote the difference of the means for the i-th permutation. Then one-tailed and two-tailed p-values can be computed from the proportion of the NTOTAL values D_i that are at least as extreme as DFULL.
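The steps above can be sketched in Python as follows. This is only an illustrative sketch, not the Richards and Byrd algorithm and not Dataplot code: the function name exact_randomization_test, the tolerance tol, and the two-sided rule |D_i| >= |DFULL| are assumptions made for the example.

```python
from itertools import combinations


def exact_randomization_test(y1, y2, tol=1e-12):
    """Enumerate every assignment of the pooled data to samples of size n1 and n2."""
    pooled = list(y1) + list(y2)
    n1, n2 = len(y1), len(y2)
    total = sum(pooled)

    def diff_of_means(sum1):
        # Mean of sample one minus mean of sample two, given the sum of sample one.
        return sum1 / n1 - (total - sum1) / n2

    dfull = diff_of_means(sum(y1))  # difference of means for the original samples
    ntotal = lower = upper = two_sided = 0
    for idx in combinations(range(n1 + n2), n1):
        d_i = diff_of_means(sum(pooled[i] for i in idx))
        ntotal += 1
        lower += d_i <= dfull + tol                 # at least as small as DFULL
        upper += d_i >= dfull - tol                 # at least as large as DFULL
        two_sided += abs(d_i) >= abs(dfull) - tol   # assumed two-sided rule
    return {
        "dfull": dfull,
        "ntotal": ntotal,
        "p_lower": lower / ntotal,
        "p_upper": upper / ntotal,
        "p_two_sided": two_sided / ntotal,
    }


if __name__ == "__main__":
    # Y1 and Y2 from the Program section of this page.
    result = exact_randomization_test([0, 1, 1, 0, -2], [6, 7, 7, 4, -3, 9, 14])
    print(result)
```

For the Y1 and Y2 data in the Program section, 11 of the 792 assignments give a difference of means no larger than the observed one, so the lower-tailed value is 11/792, which rounds to the 0.01389 shown in the sample output. The two-sided rule used here is one common convention and need not agree with the approximate two-tailed value Dataplot reports when n1 and n2 differ.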

The primary drawback to this test is that NTOTAL grows rapidly as n1 and n2 increase. A test based on the full set of permutations may be computationally prohibitive except for relatively small samples. For larger n1 and n2, one approach is to generate a random subset of the complete set of permutations (typically on the order of 4,000 to 10,000 random permutations will be generated).
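To make the growth concrete, the following Python sketch counts the number of ways to split n1 + n2 pooled observations into samples of size n1 and n2. The helper name n_total is an assumption for the example; math.comb is the standard library binomial coefficient.

```python
from math import comb


def n_total(n1, n2):
    """Number of distinct assignments of n1 + n2 observations to the two samples."""
    return comb(n1 + n2, n1)


if __name__ == "__main__":
    for n in (5, 10, 15, 22):
        print(f"n1 = n2 = {n:2d}: NTOTAL = {n_total(n, n):,}")
```

With n1 = 5 and n2 = 7 the count is 792, and with n1 = n2 = 5 it is 252. At n1 = n2 = 22 the count exceeds 10^12, which illustrates why full enumeration is restricted to small samples.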

For this command, Dataplot uses the algorithm of Richards and Byrd. This algorithm generates the complete set of permutations. The advantage of this algorithm is that exact p-values are obtained for one-tailed tests and also for two-tailed tests when n1 = n2. If n1 is not equal to n2, an approximate p-value is obtained for the two-tailed test. The primary drawback is that this test is limited to small sample sizes. Dataplot currently limits the maximum value of n1 and n2 to 22. See the Note section below for guidance on generating this test for larger samples by randomly sampling the permutations.

If the two samples are not randomly drawn from larger populations, the inference will be valid for the observations under study but not necessarily for the populations from which the observations are drawn.

Syntax 1:

The <y1> and <y2> need not be the same length.

Either <y1> or <y2> (or both) may be matrix arguments. If a matrix argument is given, the response variable will consist of all observations in that matrix. Although matrix arguments are allowed, they are rarely used for this command because of the limit on the size of the response variable.

Syntax 2:

This syntax will implement all the pairwise Fisher two sample randomization tests for the listed response variables. For example,

FISHER TWO SAMPLE RANDOMIZATION TEST Y1 TO Y4

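is equivalent to

FISHER TWO SAMPLE RANDOMIZATION TEST Y1 Y2
FISHER TWO SAMPLE RANDOMIZATION TEST Y1 Y3
FISHER TWO SAMPLE RANDOMIZATION TEST Y1 Y4
FISHER TWO SAMPLE RANDOMIZATION TEST Y2 Y3
FISHER TWO SAMPLE RANDOMIZATION TEST Y2 Y4
FISHER TWO SAMPLE RANDOMIZATION TEST Y3 Y4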

The <y1>, ..., <yk> need not be the same length.

Any of the listed response variables may be matrix arguments. If a matrix argument is given, the response variable will consist of all observations in that matrix. Although matrix arguments are allowed, they are rarely used for this command because of the limit on the size of the response variable.
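As a small illustration of what "all the pairwise tests" means, the Python sketch below lists the pairs for k response variables. The helper name pairwise_tests is hypothetical. The order matches the order of the blocks in the sample output on this page.

```python
from itertools import combinations


def pairwise_tests(names):
    """Return every unordered pair of response variables, in listing order."""
    return list(combinations(names, 2))


if __name__ == "__main__":
    for first, second in pairwise_tests(["Y1", "Y2", "Y3", "Y4"]):
        print(f"FISHER TWO SAMPLE RANDOMIZATION TEST {first} {second}")
```

For k response variables this yields k(k-1)/2 tests, so four variables produce six tests.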

Examples:

Note:

Wilcoxon signed rank test

Mann-Whitney rank sum test

Note:

In addition to the above LET command, built-in statistics are supported for more than 20 different commands (enter HELP STATISTICS for details).

Note:

The SAMPLE RANDOM PERMUTATION command can be used to implement other randomization tests (and to accommodate sample sizes greater than allowed here). The Program 2 and Program 3 examples demonstrate this. Although these examples demonstrate the difference of means statistic, other statistics can be easily substituted into these examples.
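The idea of sampling permutations with a substitutable statistic can be sketched in Python as follows. This is not Dataplot code and does not show how SAMPLE RANDOM PERMUTATION works internally: the function name sampled_randomization_test, its parameters, and the fixed seed are assumptions for the example.

```python
import random
from statistics import mean, median


def sampled_randomization_test(y1, y2, statistic=mean, n_perm=10000, seed=0, tol=1e-12):
    """Approximate one-tailed p-values from randomly sampled reassignments."""
    rng = random.Random(seed)  # fixed seed so that a run is repeatable
    pooled = list(y1) + list(y2)
    n1 = len(y1)
    observed = statistic(y1) - statistic(y2)
    lower = upper = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # one random reassignment of the pooled data
        d_i = statistic(pooled[:n1]) - statistic(pooled[n1:])
        lower += d_i <= observed + tol
        upper += d_i >= observed - tol
    return observed, lower / n_perm, upper / n_perm


if __name__ == "__main__":
    y1 = [0, 1, 1, 0, -2]
    y2 = [6, 7, 7, 4, -3, 9, 14]
    print("means:  ", sampled_randomization_test(y1, y2, statistic=mean))
    print("medians:", sampled_randomization_test(y1, y2, statistic=median))
```

Because only a random subset of the reassignments is used, the resulting p-values are estimates whose precision depends on n_perm. Any function that maps a sample to a number can be passed as statistic.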

Default:

None

Synonyms:

None

Related Commands:

T-TEST	=	Compute a t-test.
SIGN TEST	=	Compute a sign test.
SIGNED RANK TEST	=	Compute the Wilcoxon signed rank test.
RANK SUM TEST	=	Perform a Mann-Whitney rank sum test.
CHI-SQUARED 2 SAMPLE TEST	=	Compute a two sample chi-square test.
BIHISTOGRAM	=	Generates a bihistogram.
QUANTILE-QUANTILE PLOT	=	Generate a quantile-quantile plot.
BOX PLOT	=	Generates a box plot.

Reference:

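Richards and Byrd (1996), "Algorithm AS 304: Fisher's Randomization Test for Two Small Independent Samples", Applied Statistics, Vol. 45, No. 3, pp. 394-398.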

Fisher (1935), "Design of Experiments", Edinburgh: Oliver and Boyd.

Conover (1999), "Practical Nonparametric Statistics", Third Edition, Wiley, p. 410.

Higgins (2004), "Introduction to Modern Nonparametric Statistics", Thomson/Brooks/Cole, Duxbury Advanced Series, Chapter 2.

Applications:

Nonparametric statistics, two sample problem

Implementation Date:

2011/06

Program:

 
.  Example from p. 410 of Conover (1999), "Practical Nonparametric
.  Statistics", Third Edition, Wiley.
.
let y1 = data 0 1 1 0 -2
let y2 = data 6 7 7 4 -3 9 14
let y3 = data 9 2 3 5 7
let y4 = data 6 8 9 12 15
set write decimals 5
.
let t    = fisher two sample rand test        y1 y2
let pval = fisher two sample rand test pvalue y1 y2
.
print t pval
.
fisher two sample rand test y1 y2
fisher two sample rand test y1 y2 y3 y4

 
 PARAMETERS AND CONSTANTS--

    T       --        0.00000
    PVAL    --        0.02778
 

            Two Sample Two-Sided Fisher Randomization Test
                        (Independent Samples)
 
First Response Variable: Y1
Second Response Variable: Y2
 
H0: E(X) = E(Y)
Ha: E(X) <> E(Y)
 
Summary Statistics:
Sample with Smaller Mean:
Number of Observations:                  5
Mean:                                    0.00000
Sum of Observations:                     0.00000
Sample with Larger Mean:
Number of Observations:                  7
Mean:                                    6.28571
Sum of Observations:                     44.00000
Difference of Means:                     -6.28571
 
Test Statistic:                          0.00000
Approximate P-Value (two-tailed test):   0.02778
Exact P-Value (lower-tailed test):       0.01389





            Two Sample Two-Sided Fisher Randomization Test
                        (Independent Samples)
 
First Response Variable: Y1
Second Response Variable: Y2
 
H0: E(X) = E(Y)
Ha: E(X) <> E(Y)
 
Summary Statistics:
Sample with Smaller Mean:
Number of Observations:                  5
Mean:                                    0.00000
Sum of Observations:                     0.00000
Sample with Larger Mean:
Number of Observations:                  7
Mean:                                    6.28571
Sum of Observations:                     44.00000
Difference of Means:                     -6.28571
 
Test Statistic:                          0.00000
Approximate P-Value (two-tailed test):   0.02778
Exact P-Value (lower-tailed test):       0.01389
 
 
            Two Sample Two-Sided Fisher Randomization Test
                        (Independent Samples)
 
First Response Variable: Y1
Second Response Variable: Y3
 
H0: E(X) = E(Y)
Ha: E(X) <> E(Y)
 
Summary Statistics:
Sample with Smaller Mean:
Number of Observations:                  5
Mean:                                    0.00000
Sum of Observations:                     0.00000
Sample with Larger Mean:
Number of Observations:                  5
Mean:                                    5.20000
Sum of Observations:                     26.00000
Difference of Means:                     -5.20000
 
Test Statistic:                          0.00000
Approximate P-Value (two-tailed test):   0.00794
Exact P-Value (lower-tailed test):       0.00397
 
 
            Two Sample Two-Sided Fisher Randomization Test
                        (Independent Samples)
 
First Response Variable: Y1
Second Response Variable: Y4
 
H0: E(X) = E(Y)
Ha: E(X) <> E(Y)
 
Summary Statistics:
Sample with Smaller Mean:
Number of Observations:                  5
Mean:                                    0.00000
Sum of Observations:                     0.00000
Sample with Larger Mean:
Number of Observations:                  5
Mean:                                    10.00000
Sum of Observations:                     50.00000
Difference of Means:                     -10.00000
 
Test Statistic:                          0.00000
Approximate P-Value (two-tailed test):   0.00794
Exact P-Value (lower-tailed test):       0.00397
 
 
            Two Sample Two-Sided Fisher Randomization Test
                        (Independent Samples)
 
First Response Variable: Y2
Second Response Variable: Y3
 
H0: E(X) = E(Y)
Ha: E(X) <> E(Y)
 
Summary Statistics:
Sample with Smaller Mean:
Number of Observations:                  5
Mean:                                    6.28571
Sum of Observations:                     26.00000
Sample with Larger Mean:
Number of Observations:                  7
Mean:                                    5.20000
Sum of Observations:                     44.00000
Difference of Means:                     1.08571
 
Test Statistic:                          26.00000
Approximate P-Value (two-tailed test):   0.72222
Exact P-Value (lower-tailed test):       0.36111
 
 
            Two Sample Two-Sided Fisher Randomization Test
                        (Independent Samples)
 
First Response Variable: Y2
Second Response Variable: Y4
 
H0: E(X) = E(Y)
Ha: E(X) <> E(Y)
 
Summary Statistics:
Sample with Smaller Mean:
Number of Observations:                  5
Mean:                                    5.20000
Sum of Observations:                     26.00000
Sample with Larger Mean:
Number of Observations:                  5
Mean:                                    10.00000
Sum of Observations:                     50.00000
Difference of Means:                     -4.80000
 
Test Statistic:                          26.00000
Approximate P-Value (two-tailed test):   0.06349
Exact P-Value (lower-tailed test):       0.03175
 
 
            Two Sample Two-Sided Fisher Randomization Test
                        (Independent Samples)
 
First Response Variable: Y3
Second Response Variable: Y4
 
H0: E(X) = E(Y)
Ha: E(X) <> E(Y)
 
Summary Statistics:
Sample with Smaller Mean:
Number of Observations:                  5
Mean:                                    5.20000
Sum of Observations:                     26.00000
Sample with Larger Mean:
Number of Observations:                  5
Mean:                                    10.00000
Sum of Observations:                     50.00000
Difference of Means:                     -4.80000
 
Test Statistic:                          26.00000
Approximate P-Value (two-tailed test):   0.06349
Exact P-Value (lower-tailed test):       0.03175