
# TWO SAMPLE PERMUTATION TEST

Name:
TWO SAMPLE <STATISTIC> PERMUTATION TEST
Type:
Analysis Command
Purpose:
Perform a two sample permutation test for a specified statistic.
Description:
Given random variables Y1 and Y2 with sample sizes n1 and n2, respectively, permutation tests are performed as follows:

1. Compute the desired statistic for the original data.

2. Combine the 2 data sets into a single data set.

3. Generate a permutation of the combined data. Then compute the desired statistic (the first n1 permuted values constitute the first response variable and the following n2 permuted values constitute the second response variable).

4. Repeat step 3 NITER number of times.

The NITER computed statistics represent the reference distribution. The statistic for the original data is compared to this reference distribution. For example, the cut-offs for a 95% two-sided test are obtained from the 2.5% and 97.5% percentiles of the reference distribution.
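The four steps above can be sketched in Python. This is a minimal illustration of the procedure as described, not Dataplot's implementation; the function name `two_sample_permutation` and its parameters are hypothetical, and the statistic is assumed to be combined as a difference.

```python
import random
from statistics import mean


def two_sample_permutation(y1, y2, stat=mean, niter=4000, seed=None):
    """Return (observed statistic, list of niter permuted statistics).

    Hypothetical helper: assumes the two-sample statistic is
    stat(first sample) - stat(second sample).
    """
    rng = random.Random(seed)
    n1 = len(y1)
    # Step 1: statistic for the original data.
    observed = stat(y1) - stat(y2)
    # Step 2: combine the two data sets.
    pooled = list(y1) + list(y2)
    reference = []
    # Step 4: repeat step 3 niter times.
    for _ in range(niter):
        # Step 3: permute, then split into the first n1 and remaining n2 values.
        rng.shuffle(pooled)
        reference.append(stat(pooled[:n1]) - stat(pooled[n1:]))
    return observed, reference
```

The returned list plays the role of the reference distribution; the observed value is then compared against its percentiles.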

In principle, the permutation test is based on all possible permutations of the data. However, the number of distinct permutations, (n1+n2)!/(n1!n2!), grows rapidly as the sample sizes increase, so in practice a random subset of the possible permutations is sampled. This provides a reasonable approximation to the full permutation test. By default, Dataplot generates 4,000 iterations. To change this, enter the command

SET PERMUTATION TEST SAMPLE SIZE <value>

If <value> is less than 100, it will be set to 100. If <value> is greater than 100,000, it will be set to 100,000.
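To see why exhaustive enumeration is impractical, the count (n1+n2)!/(n1!n2!) can be evaluated directly. The sketch below also mirrors the limits on the number of iterations stated above; the function names `distinct_splits` and `clamp_niter` are hypothetical and are not Dataplot commands.

```python
import math


def distinct_splits(n1, n2):
    """Number of distinct ways to split n1+n2 values into groups of n1 and n2."""
    return math.comb(n1 + n2, n1)  # equals (n1+n2)! / (n1! * n2!)


def clamp_niter(value, low=100, high=100_000):
    """Restrict a requested iteration count to the documented range."""
    return max(low, min(high, value))


# Sample sizes taken from the sample output in this document (249 and 79):
# print the number of decimal digits in the count of distinct splits.
print(len(str(distinct_splits(249, 79))))
```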

The specified statistic should be one that can be computed from a single response variable (e.g., MEAN, MEDIAN, VARIANCE). By default, Dataplot will compute the difference of the statistic between the two samples. For scale statistics (e.g., STANDARD DEVIATION, VARIANCE), it is often preferred to compute the ratio rather than the difference. To specify that the ratio be computed, enter

SET PERMUTATION TEST RATIO

To reset the default, enter

SET PERMUTATION TEST DIFFERENCE
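The difference and ratio options can be illustrated with a short Python sketch. The function `combine` and its `mode` argument are hypothetical names used only for illustration. Note that when the two samples have the same value of the statistic, the difference is 0 while the ratio is 1.

```python
from statistics import mean, stdev


def combine(y1, y2, stat, mode="difference"):
    """Combine a one-sample statistic across two samples.

    mode="difference" gives stat(y1) - stat(y2);
    mode="ratio" gives stat(y1) / stat(y2).
    """
    s1, s2 = stat(y1), stat(y2)
    if mode == "ratio":
        return s1 / s2
    return s1 - s2


# A location statistic compared by difference, a scale statistic by ratio.
print(combine([2, 4, 6], [1, 2, 3], mean))
print(combine([2, 4, 6], [1, 2, 3], stdev, mode="ratio"))
```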

Permutation tests assume the observations are independent. However, no distributional assumptions are made about the response variables.

Syntax 1:
<LOWER TAILED/UPPER TAILED> TWO SAMPLE <stat> PERMUTATION TEST
<y1> <y2>             <SUBSET/EXCEPT/FOR qualification>
where <stat> is the desired statistic;
<y1> is the first response variable;
<y2> is the second response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

If LOWER TAILED is specified, a lower tailed test is performed. If UPPER TAILED is specified, an upper tailed test is performed. If neither LOWER TAILED nor UPPER TAILED is specified, a two-tailed test is performed.
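The three tail options can be related to the reference distribution as in the sketch below. The exact rule Dataplot uses for its p-values is not stated in this document, so the conventions here (empirical proportions, and twice the smaller tail for the two-tailed case) are assumptions; `tail_p_values` is a hypothetical name.

```python
def tail_p_values(observed, reference):
    """Return (lower, upper, two_tailed) empirical p-values.

    Assumed conventions, not confirmed as Dataplot's:
      lower      = proportion of permuted statistics <= observed
      upper      = proportion of permuted statistics >= observed
      two_tailed = twice the smaller tail, capped at 1
    """
    n = len(reference)
    lower = sum(r <= observed for r in reference) / n
    upper = sum(r >= observed for r in reference) / n
    two_tailed = min(1.0, 2 * min(lower, upper))
    return lower, upper, two_tailed
```

Under these conventions, an observed statistic far below every permuted value gives a lower-tailed p-value near 0 and an upper-tailed p-value of 1.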

To see a list of supported statistics, enter HELP STATISTICS.

Syntax 2:
<LOWER TAILED/UPPER TAILED> TWO SAMPLE <stat>
PERMUTATION TEST <y1> ... <yk>
<SUBSET/EXCEPT/FOR qualification>
where <stat> is the desired statistic;
<y1> ... <yk> is a list of two or more response variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax performs all the two-way two sample permutation tests for the listed variables. This syntax supports the TO syntax.
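As an illustration of running every pairwise test over a list of variables, the Python sketch below enumerates the pairs. The helper name `all_pairwise` and the order in which pairs are visited are assumptions and may not match Dataplot's behavior.

```python
from itertools import combinations


def all_pairwise(samples, test):
    """Apply a two-sample test to every unordered pair of variables.

    samples: dict mapping variable name -> list of values
    test:    callable taking (y1, y2) and returning a result
    """
    return {
        (a, b): test(samples[a], samples[b])
        for a, b in combinations(samples, 2)
    }
```

For k variables this produces k(k-1)/2 tests, so three variables yield three tests.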

If LOWER TAILED is specified, a lower tailed test is performed. If UPPER TAILED is specified, an upper tailed test is performed. If neither LOWER TAILED nor UPPER TAILED is specified, a two-tailed test is performed.

To see a list of supported statistics, enter HELP STATISTICS.

Examples:
TWO SAMPLE MEAN PERMUTATION TEST Y1 Y2
TWO SAMPLE MEDIAN PERMUTATION TEST Y1 Y2
TWO SAMPLE MEDIAN PERMUTATION TEST Y1 Y2 SUBSET Y2 > 0
LOWER TAILED TWO SAMPLE MEDIAN PERMUTATION TEST Y1 Y2
UPPER TAILED TWO SAMPLE MEDIAN PERMUTATION TEST Y1 Y2

SET PERMUTATION TEST RATIO
TWO SAMPLE STANDARD DEVIATION PERMUTATION TEST Y1 Y2

Note:
This routine uses a random permutation algorithm suggested by Knuth. Specifically, it adapts the RANDPERM routine of Knoble.
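For readers unfamiliar with the technique, the shuffling algorithm presented by Knuth (Algorithm P, often called the Fisher-Yates shuffle) can be sketched in Python as follows. This illustrates the general method only and is not a transcription of the RANDPERM routine; `knuth_shuffle` is a hypothetical name.

```python
import random


def knuth_shuffle(values, rng=None):
    """Return a uniformly random permutation of values (input is not modified)."""
    rng = rng or random.Random()
    out = list(values)
    # Walk from the last position down to the second, swapping each
    # element with a randomly chosen element at or before it.
    for j in range(len(out) - 1, 0, -1):
        k = rng.randint(0, j)  # inclusive on both ends
        out[j], out[k] = out[k], out[j]
    return out
```

Each of the n! orderings is equally likely, provided the underlying random number generator is uniform.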
Note:
The following parameters are saved after the two sample permutation test is performed.

 STATVAL  - value of the test statistic
 STATCDF  - CDF of the test statistic
 PVALUE   - p-value of the two tailed test statistic
 PVALUELT - p-value of the lower tailed test statistic
 PVALUEUT - p-value of the upper tailed test statistic
 P80      - 80% upper critical value
 P90      - 90% upper critical value
 P95      - 95% upper critical value
 P975     - 97.5% upper critical value
 P99      - 99% upper critical value
 P995     - 99.5% upper critical value
 P999     - 99.9% upper critical value
 P20      - 20% lower critical value
 P10      - 10% lower critical value
 P05      - 5% lower critical value
 P025     - 2.5% lower critical value
 P01      - 1% lower critical value
 P005     - 0.5% lower critical value
 P001     - 0.1% lower critical value
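The critical values listed above correspond to percentiles of the reference distribution. The Python sketch below shows one way such values could be obtained from a list of permuted statistics. The interpolation rule Dataplot uses for percentiles is not stated here, so the "inclusive" method of `statistics.quantiles` is an assumption, as is the helper name `critical_values`.

```python
from statistics import quantiles

# Parameter name -> percentile expressed in tenths of a percent (per mille).
PER_MILLE = {
    "P001": 1, "P005": 5, "P01": 10, "P025": 25, "P05": 50,
    "P10": 100, "P20": 200, "P80": 800, "P90": 900, "P95": 950,
    "P975": 975, "P99": 990, "P995": 995, "P999": 999,
}


def critical_values(reference):
    """Map each parameter name to a percentile of the reference distribution."""
    # 999 cut points; cuts[i - 1] estimates the (i / 10) percent point.
    cuts = quantiles(reference, n=1000, method="inclusive")
    return {name: cuts[pm - 1] for name, pm in PER_MILLE.items()}
```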
Default:
The difference (or the ratio) of the statistic for the two samples will be generated for 4,000 permutations.
Synonyms:
2 SAMPLE is a synonym for TWO SAMPLE
Related Commands:
 LINEAR RANK SUM TEST   = Perform a 2-sample linear rank sum test.
 T TEST                 = Perform a 2-sample t test.
 RANK SUM TEST          = Perform a 2-sample rank sum test for location.
 MEDIAN TEST            = Perform a k-sample medians test.
 VAN DER WAERDEN TEST   = Perform a k-sample Van Der Waerden test.
 SIEGEL TUKEY TEST      = Perform a 2-sample Siegel Tukey test.
 SQUARED RANKS TEST     = Perform a k-sample squared ranks test for homogeneous variances.
 KLOTZ TEST             = Perform a k-sample Klotz test for homogeneous variances.
 BIHISTOGRAM            = Generate a bihistogram.
 QUANTILE QUANTILE PLOT = Generate a quantile-quantile plot.
References:
Higgins (2004), "Introduction to Modern Nonparametric Statistics," Duxbury Press, Chapter 2.

Knuth (1998), "The Art of Computer Programming: Volume 2 Seminumerical Algorithms, Third Edition", Section 3.4.2, Addison-Wesley.

Applications:
Two Sample Analysis
Implementation Date:
2023/08
Program:

set random number generator fibbonacci congruential
seed 32119
.
.
skip 25
retain y2 subset y2 >= 0
.
.           Perform the permutation test
.
lower tailed two sample mean permutation test              y1 y2
upper tailed two sample mean permutation test              y1 y2
two sample mean permutation test                           y1 y2
.
.           Plot the results
.
title offset 7
title case asis
label case asis
y1label Count
x1label Difference of Means for Permutations
let statval = round(statval,3)
let p025 = round(p025,3)
let p975 = round(p975,3)
let pval = round(pvalue2t,3)
let statcdf = round(statcdf,3)
.
x2label color red
x2label Difference of Means for Original Sample: ^statval
x3label color blue
x3label 2.5 Percentile: ^P025, 97.5 Percentile: ^P975
xlimits -0.5 0.5
let niter = 4000
skip 1
title Histogram of Difference of Means for ^niter Permutationscr() ...
(Pvalue: ^pval, CDF: ^statcdf)
.
histogram z
.
line color red
line dash
drawdsds statval 20 statval 90
line color blue
line dash
drawdsds p025 20 p025 90
drawdsds p975 20 p975 90

The following output is generated
             Two Sample Permutation Test (Difference)
MEAN

First Response Variable:  Y1
Second Response Variable: Y2

H0: Difference = 0
Ha: Difference < 0

Sample One Summary Statistics:
Number of Observations:                             249
Sample Mean:                                   20.14458
Sample Median:                                 19.00000
Sample Standard Deviation:                      6.41470

Sample Two Summary Statistics:
Number of Observations:                              79
Sample Mean:                                   30.48101
Sample Median:                                 32.00000
Sample Standard Deviation:                      6.10771

Test:
Number of Permutation Samples:                     4000
Statistic Value:                              -10.33643
Test CDF Value:                                 0.00000
Test P-Value:                                   0.00000

Conclusions (Lower 1-Tailed Test)

------------------------------------------------------------
                                                     Null
Significance           Test       Critical     Hypothesis
       Level      Statistic    Region (<=)     Conclusion
------------------------------------------------------------
       80.0%      -10.33643       -0.83209         REJECT
       90.0%      -10.33643       -1.29897         REJECT
       95.0%      -10.33643       -1.63245         REJECT
       99.0%      -10.33643       -2.38263         REJECT

Two Sample Permutation Test (Difference)
MEAN

First Response Variable:  Y1
Second Response Variable: Y2

H0: Difference = 0
Ha: Difference > 0

Sample One Summary Statistics:
Number of Observations:                             249
Sample Mean:                                   20.14458
Sample Median:                                 19.00000
Sample Standard Deviation:                      6.41470

Sample Two Summary Statistics:
Number of Observations:                              79
Sample Mean:                                   30.48101
Sample Median:                                 32.00000
Sample Standard Deviation:                      6.10771

Test:
Number of Permutation Samples:                     4000
Statistic Value:                              -10.33643
Test CDF Value:                                 0.00000
Test P-Value:                                   1.00000

Conclusions (Upper 1-Tailed Test)

------------------------------------------------------------
                                                     Null
Significance           Test       Critical     Hypothesis
       Level      Statistic    Region (>=)     Conclusion
------------------------------------------------------------
       80.0%      -10.33643        0.85202         ACCEPT
       90.0%      -10.33643        1.30055         ACCEPT
       95.0%      -10.33643        1.65238         ACCEPT
       99.0%      -10.33643        2.36938         ACCEPT

Two Sample Permutation Test (Difference)
MEAN

First Response Variable:  Y1
Second Response Variable: Y2

H0: Difference = 0
Ha: Difference not equal 0

Sample One Summary Statistics:
Number of Observations:                             249
Sample Mean:                                   20.14458
Sample Median:                                 19.00000
Sample Standard Deviation:                      6.41470

Sample Two Summary Statistics:
Number of Observations:                              79
Sample Mean:                                   30.48101
Sample Median:                                 32.00000
Sample Standard Deviation:                      6.10771

Test:
Number of Permutation Samples:                     4000
Statistic Value:                              -10.33643
Test CDF Value:                                 0.00000
Test P-Value:                                   0.00000

Conclusions (Two-Tailed Test)

---------------------------------------------------------------------------
                                                                    Null
Significance           Test       Critical       Critical     Hypothesis
       Level      Statistic    Region (<=)    Region (>=)     Conclusion
---------------------------------------------------------------------------
       80.0%      -10.33643       -1.33232        1.28555         REJECT
       90.0%      -10.33643       -1.69915        1.63487         REJECT
       95.0%      -10.33643       -1.99929        1.90250         REJECT
       99.0%      -10.33643       -2.64950        2.60265         REJECT

Date created: 08/04/2023
Last updated: 09/25/2023