
# K SAMPLE PERMUTATION TEST

Name:
K SAMPLE <STATISTIC> PERMUTATION TEST
Type:
Analysis Command
Purpose:
Perform a k-sample permutation test for a specified statistic.
Description:
Given a response variable Y and a group-id variable X, each with sample size n, a k-sample permutation test is performed as follows:

1. Compute the desired statistic for the original data.

2. Generate a permutation of the response data. Then compute the desired statistic for the permutation.

3. Repeat step 2 NITER times.

The NITER computed statistics represent the reference distribution. The statistic for the original data is compared to this reference distribution. For example, the cut-offs for a two-sided 95% test are obtained from the 2.5% and 97.5% percentiles of the reference distribution.
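
The steps above can be sketched in Python. This is an illustrative sketch only, not Dataplot's implementation: the function names f_statistic and k_sample_permutation are hypothetical, and the data are the 15 observations used in the program examples on this page.

```python
import random

def f_statistic(y, x):
    """One-way ANOVA F statistic for responses y with group ids x."""
    groups = {}
    for value, gid in zip(y, x):
        groups.setdefault(gid, []).append(value)
    n, k = len(y), len(groups)
    grand_mean = sum(y) / n
    ss_treat = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                   for g in groups.values())
    ss_error = sum((v - sum(g) / len(g)) ** 2
                   for g in groups.values() for v in g)
    return (ss_treat / (k - 1)) / (ss_error / (n - k))

def k_sample_permutation(y, x, statistic, niter=4000, seed=None):
    """Return the observed statistic and niter permuted statistics."""
    rng = random.Random(seed)
    observed = statistic(y, x)            # step 1: statistic for original data
    shuffled = list(y)
    reference = []
    for _ in range(niter):                # step 3: repeat NITER times
        rng.shuffle(shuffled)             # step 2: permute responses, keep group ids
        reference.append(statistic(shuffled, x))
    return observed, reference

x = [1] * 5 + [2] * 5 + [3] * 5
y = [6.08, 22.29, 7.51, 34.36, 23.68,
     30.45, 22.71, 44.52, 31.47, 36.81,
     32.04, 28.03, 32.74, 23.84, 29.64]
observed, reference = k_sample_permutation(y, x, f_statistic, niter=1000, seed=1)
print(observed, len(reference))
```

Only the response values are shuffled; the group-id vector stays fixed, so each permutation reassigns responses to groups while preserving the group sizes.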

An exact permutation test is based on all possible permutations of the data. However, the number of permutations grows rapidly as the sample size increases, so sampling a random subset of the possible permutations provides a reasonable approximation. By default, Dataplot generates 4,000 permutations. To change this, enter the command

SET PERMUTATION TEST SAMPLE SIZE <value>

If <value> is less than 100, it will be set to 100. If <value> is greater than 100,000, it will be set to 100,000.
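
To give a sense of scale, the following Python sketch counts the distinct ways of assigning n observations to groups of fixed sizes and mimics the limits on the sample size described above. The function names are hypothetical, and the clamping function only restates the rule in the text; it is not Dataplot code.

```python
from math import factorial

def distinct_group_assignments(group_sizes):
    """Multinomial coefficient n! / (n1! * n2! * ... * nk!)."""
    total = factorial(sum(group_sizes))
    for size in group_sizes:
        total //= factorial(size)   # exact at every step
    return total

def clamp_sample_size(value, low=100, high=100_000):
    """Values below 100 become 100; values above 100,000 become 100,000."""
    return max(low, min(high, value))

# Three groups of five observations, as in the program examples on this page.
print(distinct_group_assignments([5, 5, 5]))   # 756756
print(clamp_sample_size(50), clamp_sample_size(5000), clamp_sample_size(10**6))
```

Even with only 15 observations there are 756,756 distinct group assignments, which is why a random subset is used in practice.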

The specified statistic should be one that can be computed from a single response variable with a corresponding group-id variable.

This test is most commonly used with the F statistic obtained from a one-way analysis of variance.

Permutation tests assume the observations are independent. However, no distributional assumptions are made about the response variable.

Syntax:
<LOWER TAILED/UPPER TAILED> K SAMPLE <stat> PERMUTATION TEST <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <stat> is the desired statistic;
<y> is the response variable;
<x> is the group-id variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

If LOWER TAILED is specified, a lower tailed test is performed. If UPPER TAILED is specified, an upper tailed test is performed. If neither LOWER TAILED nor UPPER TAILED is specified, a two-tailed test is performed.
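
How the three tail options relate to the reference distribution can be sketched in Python. The tie handling (permuted statistics equal to the observed value are counted in both tails) and the doubling rule for the two-tailed p-value are assumptions made for illustration; the exact convention Dataplot uses is not stated here. The function name is hypothetical.

```python
def permutation_p_values(observed, reference):
    """Return (lower, upper, two_tailed) p-values from permuted statistics."""
    n = len(reference)
    lower = sum(1 for r in reference if r <= observed) / n
    upper = sum(1 for r in reference if r >= observed) / n
    # Assumed convention: double the smaller tail, capped at 1.
    two_tailed = min(1.0, 2.0 * min(lower, upper))
    return lower, upper, two_tailed

# Toy reference distribution, not real permutation output.
toy_reference = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
lower, upper, two_tailed = permutation_p_values(9, toy_reference)
print(lower, upper, two_tailed)
```

For a statistic such as the one-way ANOVA F statistic, where only large values indicate a departure from the null hypothesis, the upper tailed p-value is usually the one of interest, as in the program examples on this page.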

Examples:
K SAMPLE ONE WAY ANOVA F STATISTIC PERMUTATION TEST Y X
UPPER TAILED K SAMPLE ONE WAY ANOVA F STATISTIC PERMUTATION TEST Y X
UPPER TAILED K SAMPLE KRUSKAL WALLIS TEST PERMUTATION TEST Y X
Note:
This test only works for statistics based on a single response variable and a group-id variable. Currently, the following statistics are supported:

ONE WAY ANOVA F STATISTIC
ONE WAY ANOVA SUM OF SQUARES TOTAL
ONE WAY ANOVA SUM OF SQUARES TREATMENT
ONE WAY ANOVA SUM OF SQUARES ERROR
ONE WAY ANOVA MEAN SQUARE ERROR
ONE WAY ANOVA MEAN SQUARE TREATMENT
KRUSKAL WALLIS TEST
REPEATABILITY STANDARD DEVIATION
REPRODUCIBILITY STANDARD DEVIATION
ANDERSON DARLING K SAMPLE TEST
COCHRAN VARIANCE OUTLIER TEST
COCHRAN MINIMUM VARIANCE OUTLIER TEST
SQUARED RANKS TEST
MEDIAN TEST

Of these, the ONE WAY ANOVA F STATISTIC and KRUSKAL WALLIS TEST statistics are probably the ones of most interest.
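
Each supported statistic is a function of the response variable and the group-id variable. As an illustration, the following Python sketch computes the Kruskal-Wallis H statistic for the data used in the program examples on this page. It is a sketch under stated assumptions: the function name is hypothetical, no tie correction is applied (the example data contain no tied values), and it is not Dataplot's code.

```python
def kruskal_wallis(y, x):
    """Kruskal-Wallis H statistic, assuming no tied response values."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank                      # rank 1 = smallest response
    rank_sums, sizes = {}, {}
    for r, g in zip(ranks, x):
        rank_sums[g] = rank_sums.get(g, 0) + r
        sizes[g] = sizes.get(g, 0) + 1
    weighted = sum(rank_sums[g] ** 2 / sizes[g] for g in sizes)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)

x = [1] * 5 + [2] * 5 + [3] * 5
y = [6.08, 22.29, 7.51, 34.36, 23.68,
     30.45, 22.71, 44.52, 31.47, 36.81,
     32.04, 28.03, 32.74, 23.84, 29.64]
h = kruskal_wallis(y, x)
print(round(h, 2))   # 4.16
```

The rank sums for the three groups are 24, 52, and 44, which gives H = 4.16, the statistic value reported for the Kruskal-Wallis case in the Program 2 output.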

Note:
This routine uses a random permutation algorithm suggested by Knuth. Specifically, it adapts the RANDPERM routine of Knoble.
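
For orientation, a random permutation of this kind can be sketched in Python. The sketch assumes the algorithm referred to is the shuffling algorithm (Algorithm P) from Section 3.4.2 of Knuth; it is not a translation of the RANDPERM routine, and the function name is hypothetical.

```python
import random

def knuth_shuffle(items, rng=random):
    """Return a uniformly random permutation of items."""
    result = list(items)
    # Work from the last position down, swapping each element with a
    # randomly chosen element at or before it.
    for j in range(len(result) - 1, 0, -1):
        k = rng.randint(0, j)          # randint includes both endpoints
        result[j], result[k] = result[k], result[j]
    return result

shuffled = knuth_shuffle(range(10), random.Random(3))
print(shuffled)
```

Each of the n! orderings is equally likely, provided the underlying random number generator is uniform.
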
Note:
The following parameters are saved after the k sample permutation test is performed.

STATVAL  - value of the test statistic
STATCDF  - CDF of the test statistic
PVALUE   - p-value of the two tailed test statistic
PVALUELT - p-value of the lower tailed test statistic
PVALUEUT - p-value of the upper tailed test statistic
P80      - 80% upper critical value
P90      - 90% upper critical value
P95      - 95% upper critical value
P975     - 97.5% upper critical value
P99      - 99% upper critical value
P995     - 99.5% upper critical value
P999     - 99.9% upper critical value
P20      - 20% lower critical value
P10      - 10% lower critical value
P05      - 5% lower critical value
P025     - 2.5% lower critical value
P01      - 1% lower critical value
P005     - 0.5% lower critical value
P001     - 0.1% lower critical value
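
The critical values are percentiles of the reference distribution. The following Python sketch shows one way such values and the resulting upper tailed conclusions could be derived. The nearest-rank percentile rule and the function names are assumptions made for illustration; Dataplot's percentile method is not described here and may give slightly different values.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of values (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]

def upper_tail_conclusions(observed, reference, levels=(80, 90, 95, 99)):
    """Reject the null hypothesis when observed >= the critical value."""
    rows = []
    for level in levels:
        critical = percentile(reference, level)
        decision = "REJECT" if observed >= critical else "ACCEPT"
        rows.append((level, critical, decision))
    return rows

# Toy reference distribution, not real permutation output.
toy_reference = list(range(1, 101))
for level, critical, decision in upper_tail_conclusions(92, toy_reference):
    print(level, critical, decision)
```
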
Note:
To generate multiple comparisons for the ONE WAY ANOVA F STATISTIC case, you can perform a two sample permutation test for each pair of factor levels. This is demonstrated in the program example below.

Note that although this example compares differences of means, you could use other location statistics such as the MEDIAN or BIWEIGHT LOCATION.
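
The pairwise approach, with an interchangeable location statistic, can be sketched in Python. This is an illustrative sketch, not the Dataplot macro: the function names are hypothetical, the two-tailed p-value is taken as the fraction of permutations whose absolute difference is at least as large as the observed one (an assumption), and the p-values are not adjusted for the number of comparisons.

```python
import random
from itertools import combinations
from statistics import median

def mean(values):
    return sum(values) / len(values)

def two_sample_permutation_p(a, b, location=mean, niter=4000, seed=None):
    """Two-tailed permutation p-value for a difference in location."""
    rng = random.Random(seed)
    observed = abs(location(a) - location(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(niter):
        rng.shuffle(pooled)
        diff = abs(location(pooled[:len(a)]) - location(pooled[len(a):]))
        if diff >= observed - 1e-12:      # small tolerance for rounding
            count += 1
    return count / niter

def pairwise_p_values(y, x, **kwargs):
    """Run the two-sample test for every pair of distinct group ids."""
    results = {}
    for g1, g2 in combinations(sorted(set(x)), 2):
        a = [v for v, g in zip(y, x) if g == g1]
        b = [v for v, g in zip(y, x) if g == g2]
        results[(g1, g2)] = two_sample_permutation_p(a, b, **kwargs)
    return results

x = [1] * 5 + [2] * 5 + [3] * 5
y = [6.08, 22.29, 7.51, 34.36, 23.68,
     30.45, 22.71, 44.52, 31.47, 36.81,
     32.04, 28.03, 32.74, 23.84, 29.64]
by_mean = pairwise_p_values(y, x, niter=1000, seed=0)
by_median = pairwise_p_values(y, x, location=median, niter=1000, seed=0)
print(by_mean)
print(by_median)
```

Passing location=median swaps the difference of means for a difference of medians without changing the rest of the procedure.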

Default:
The number of permutations defaults to 4,000.
Synonyms:
None
Related Commands:
TWO SAMPLE PERMUTATION TEST = Perform a 2-sample permutation test.
LINEAR RANK SUM TEST        = Perform a 2-sample linear rank sum test.
ONE WAY ANOVA               = Perform a one-way analysis of variance.
KRUSKAL WALLIS TEST         = Perform a k-sample Kruskal-Wallis test.
MEDIAN TEST                 = Perform a k-sample medians test.
SQUARED RANKS TEST          = Perform a k-sample squared ranks test for homogeneous variances.
References:
Knuth (1998), "The Art of Computer Programming: Volume 2 Seminumerical Algorithms, Third Edition", Section 3.4.2, Addison-Wesley.

Knoble RANDPERM algorithm downloaded from: "http://coding.derkeiler.com/Archive/Fortran/comp.lang.fortran/2006-03/msg00748.html"

Higgins (2004), "Introduction to Modern Nonparametric Statistics," Duxbury Press, Chapter 3.

Applications:
K Sample Analysis
Implementation Date:
2023/09
Program 1:

set permutation test sample size 5000
set random number generator fibbonacci congruential
seed 88807
.
. Step 1:   Create the data (from Higgins, p. 85)
.
read x y
1    6.08
1   22.29
1    7.51
1   34.36
1   23.68
2   30.45
2   22.71
2   44.52
2   31.47
2   36.81
3   32.04
3   28.03
3   32.74
3   23.84
3   29.64
end of data
.
. Step 2:   Perform the permutation test
.
upper tailed k sample one way anova f statistic permutation test y x

The following output is generated
K-Sample Permutation Test
ONE WAY ANOVA F-VALUE

Response Variable:  Y
Group-ID Variable:  X

Test:
Number of Permutation Samples:                     5000
Statistic Value:                                3.78144
Test CDF Value:                                 0.95040
Test P-Value:                                   0.04960

Conclusions (Upper 1-Tailed Test)

------------------------------------------------------------
                                                        Null
   Significance           Test       Critical     Hypothesis
          Level      Statistic    Region (>=)     Conclusion
------------------------------------------------------------
          80.0%        3.78144        1.85538         REJECT
          90.0%        3.78144        2.78072         REJECT
          95.0%        3.78144        3.77866         REJECT
          99.0%        3.78144        6.03793         ACCEPT

.
.           Step 3: Plot the results
.
title offset 7
title case asis
label case asis
y1label Count
x1label One Way Anova F-Statistic for Permutations
let statval = round(statval,4)
let p95  = round(p95,3)
let p99  = round(p99,3)
let pval = round(pvalueut,4)
let statcdf = round(statcdf,4)
.
x2label color red
x2label One Way Anova F-Statistic for Original Sample: ^statval
x3label color blue
x3label 95 Percentile: ^P95, 99 Percentile: ^P99
xlimits -5.0 10.0
let niter = 5000
skip 1
title Histogram of One Way Anova F Statistic for ^niter Permutationscr() ...
(Pvalue: ^pval, CDF: ^statcdf)
.
histogram z
.
line color red
line dash
line thickness 0.3
drawdsds statval 20 statval 90
line thickness 0.1
line color blue
line dash
drawdsds p95 20 p95 90
drawdsds p99 20 p99 90


.
.           Step 4: Multiple comparisons
.
let xdist = distinct x
let ndist = size xdist
let icnt = 0
if ndist >= 3
loop for k = 1 1 ndist
let xval1 = xdist(k)
let jstrt = k + 1
loop for j = jstrt 1 ndist
let xval2 = xdist(j)
let ytemp1 = y
let ytemp2 = y
retain ytemp1 subset x = xval1
retain ytemp2 subset x = xval2
two sample mean permutation test ytemp1 ytemp2
let icnt = icnt + 1
let group1(icnt) = xval1
let group2(icnt) = xval2
let pvalmc(icnt)   = pvalue2t
delete ytemp1 ytemp2
end of loop
end of loop
end of if
write1 ksamp_mc.out "   Group-ID One   Group-ID Two         P-Value"
write1 ksamp_mc.out "----------------------------------------------"
write1 ksamp_mc.out group1 group2 pvalmc

The file "ksamp_mc.out" contains
   Group-ID One   Group-ID Two         P-Value
----------------------------------------------
1.00000        2.00000        0.05840
1.00000        3.00000        0.11320
2.00000        3.00000        0.36960

Program 2:

set permutation test sample size 5000
set random number generator fibbonacci congruential
seed 49217
.
. Step 1:   Create the data (from Higgins, p. 85)
.
read x y
1    6.08
1   22.29
1    7.51
1   34.36
1   23.68
2   30.45
2   22.71
2   44.52
2   31.47
2   36.81
3   32.04
3   28.03
3   32.74
3   23.84
3   29.64
end of data
.
. Step 2:   Perform the permutation test
.
echo on
upper tailed k sample one way anova f statistic permutation test y x
upper tailed k sample kruskal wallis test permutation test y x
kruskal wallis y x
upper tailed k sample squared ranks test permutation test y x
squared ranks y x
upper tailed k sample anderson darling k sample test permutation test y x
anderson darling k sample test y x
upper tailed k sample cochran variance outlier test permutation test y x
cochran variance outlier test y x
upper tailed k sample median test permutation test y x
median test y x
echo off

The following output is generated
****************************************************************************
**  upper tailed k sample one way anova f statistic permutation test y x  **
****************************************************************************

K-Sample Permutation Test
ONE WAY ANOVA F-VALUE

Response Variable:  Y
Group-ID Variable:  X

Test:
Number of Permutation Samples:                     5000
Statistic Value:                                3.78144
Test CDF Value:                                 0.94720
Test P-Value:                                   0.05280

Conclusions (Upper 1-Tailed Test)

------------------------------------------------------------
                                                        Null
   Significance           Test       Critical     Hypothesis
          Level      Statistic    Region (>=)     Conclusion
------------------------------------------------------------
          80.0%        3.78144        1.92732         REJECT
          90.0%        3.78144        2.90249         REJECT
          95.0%        3.78144        3.89597         ACCEPT
          99.0%        3.78144        6.12665         ACCEPT

**********************************************************************
**  upper tailed k sample kruskal wallis test permutation test y x  **
**********************************************************************

K-Sample Permutation Test
KRUSKALL WALLIS TEST

Response Variable:  Y
Group-ID Variable:  X

Test:
Number of Permutation Samples:                     5000
Statistic Value:                                4.16000
Test CDF Value:                                 0.86820
Test P-Value:                                   0.12500

Conclusions (Upper 1-Tailed Test)

------------------------------------------------------------
                                                        Null
   Significance           Test       Critical     Hypothesis
          Level      Statistic    Region (>=)     Conclusion
------------------------------------------------------------
          80.0%        4.16000        3.42000         REJECT
          90.0%        4.16000        4.56000         ACCEPT
          95.0%        4.16000        5.82000         ACCEPT
          99.0%        4.16000        8.00000         ACCEPT

**************************
**  kruskal wallis y x  **
**************************

Kruskal-Wallis One Factor Test

Response Variable: Y
Group-ID Variable: X

H0: Samples Come From Identical Populations
Ha: Samples Do Not Come From Identical Populations

Summary Statistics:
Total Number of Observations:                                  15
Number of Groups:                                               3

Kruskal-Wallis Test Statistic Value:                      4.16000
CDF of Test Statistic:                                    0.87507
P-Value:                                                  0.12493

Percent Points of the Chi-Square Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
0.0    =          0.000
50.0    =          1.386
75.0    =          2.773
90.0    =          4.605
95.0    =          5.991
97.5    =          7.378
99.0    =          9.210
99.9    =         13.816

Conclusions (Upper 1-Tailed Test)
----------------------------------------------
Alpha    CDF   Critical Value     Conclusion
----------------------------------------------
10%    90%            4.605      Accept H0
5%    95%            5.991      Accept H0
2.5%  97.5%            7.378      Accept H0
1%    99%            9.210      Accept H0

Multiple Comparisons Table

---------------------------------------------------------------------------------------
I    J  |Ri/Ni - Rj/Nj|         90% CV         95% CV         99% CV        P-VALUE
---------------------------------------------------------------------------------------
1    2          5.60000        4.56488        5.58048        7.82344        0.00006
1    3          4.00000        4.56488        5.58048        7.82344        0.00088
2    3          1.60000        4.56488        5.58048        7.82344        0.06779

*********************************************************************
**  upper tailed k sample squared ranks test permutation test y x  **
*********************************************************************

K-Sample Permutation Test
SQUARED RANK TEST

Response Variable:  Y
Group-ID Variable:  X

Test:
Number of Permutation Samples:                     5000
Statistic Value:                                5.23351
Test CDF Value:                                 0.77720
Test P-Value:                                   0.22280

Conclusions (Upper 1-Tailed Test)

------------------------------------------------------------
                                                        Null
   Significance           Test       Critical     Hypothesis
          Level      Statistic    Region (>=)     Conclusion
------------------------------------------------------------
          80.0%        5.23351        5.48205         ACCEPT
          90.0%        5.23351        6.52571         ACCEPT
          95.0%        5.23351        7.77241         ACCEPT
          99.0%        5.23351        9.57074         ACCEPT

*************************
**  squared ranks y x  **
*************************

Squared Ranks Test

Response Variable: Y
Group-ID Variable: X

H0: Samples Have Equal Variability
Ha: Samples Do Not Have Equal Variability

Summary Statistics:
Total Number of Observations:                         15
Number of Groups:                                      3

Squared Ranks Test Statistic Value:              5.23351
CDF of Test Statistic:                           0.92696
P-Value:                                         0.07304

Percent Points of the Chi-Square Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
0.0    =          0.000
50.0    =          1.386
75.0    =          2.773
90.0    =          4.605
95.0    =          5.991
97.5    =          7.378
99.0    =          9.210
99.9    =         13.816

Upper-Tailed Test: Chi-Square Approximation

H0: Variances Are Equal; Ha: Variance Are Not Equal
------------------------------------------------------------
                                                        Null
   Significance           Test       Critical     Hypothesis
          Level      Statistic      Value (>)     Conclusion
------------------------------------------------------------
          80.0%        5.23351        3.21888         REJECT
          90.0%        5.23351        4.60517         REJECT
          95.0%        5.23351        5.99146         ACCEPT
          99.0%        5.23351        9.21034         ACCEPT

Multiple Comparisons Table

---------------------------------------------------------------------------------------
I    J  |Si/Ni - Sj/Nj|         90% CV         95% CV         99% CV        P-Value
---------------------------------------------------------------------------------------
1    2         63.20000      116.14987      171.14898      394.78593        0.25304
1    3        105.80000      116.14987      171.14898      394.78593        0.11705
2    3         42.60000      116.14987      171.14898      394.78593        0.39629

*********************************************************************************
**  upper tailed k sample anderson darling k sample test permutation test y x  **
*********************************************************************************

K-Sample Permutation Test
ANDERSON DARLING K-SAMPLE TEST

Response Variable:  Y
Group-ID Variable:  X

Test:
Number of Permutation Samples:                     5000
Statistic Value:                                1.76560
Test CDF Value:                                 0.92440
Test P-Value:                                   0.07560

Conclusions (Upper 1-Tailed Test)

------------------------------------------------------------
                                                        Null
   Significance           Test       Critical     Hypothesis
          Level      Statistic    Region (>=)     Conclusion
------------------------------------------------------------
          80.0%        1.76560        1.34619         REJECT
          90.0%        1.76560        1.66064         REJECT
          95.0%        1.76560        1.94778         ACCEPT
          99.0%        1.76560        2.58359         ACCEPT

******************************************
**  anderson darling k sample test y x  **
******************************************

Anderson-Darling K-Sample Test for Common Groups

Response Variable: Y
Group-ID Variable: X

H0: The Groups Are Homogeneous
Ha: The Groups Are Not Homogeneous

Summary Statistics:
Total Number of Observations:                        15
Number of Groups:                                     3
Minimum Batch Size:                                   5
Maximum Batch Size:                                   5

Test Statistic Value:                           1.76560
Test Statistic Standard Error:                  0.45946

Conclusions (Upper 1-Tailed Test)

------------------------------------------------------------------------
                                                                    Null
Null           Significance           Test       Critical     Hypothesis
Hypothesis            Level      Statistic    Region (>=)     Conclusion
------------------------------------------------------------------------
Homogeneous           50.0%        1.76560        1.13711         REJECT
Homogeneous           75.0%        1.76560        1.44702         REJECT
Homogeneous           90.0%        1.76560        1.72594         REJECT
Homogeneous           95.0%        1.76560        1.89286         ACCEPT
Homogeneous           97.5%        1.76560        2.03764         ACCEPT
Homogeneous           99.0%        1.76560        2.20598         ACCEPT
Homogeneous           99.9%        1.76560        2.55696         ACCEPT

********************************************************************************
**  upper tailed k sample cochran variance outlier test permutation test y x  **
********************************************************************************

K-Sample Permutation Test
COCHRAN VARIANCE OUTLIER TEST

Response Variable:  Y
Group-ID Variable:  X

Test:
Number of Permutation Samples:                     5000
Statistic Value:                                0.64473
Test CDF Value:                                 0.79260
Test P-Value:                                   0.20740

Conclusions (Upper 1-Tailed Test)

------------------------------------------------------------
                                                        Null
   Significance           Test       Critical     Hypothesis
          Level      Statistic    Region (>=)     Conclusion
------------------------------------------------------------
          80.0%        0.64473        0.64761         ACCEPT
          90.0%        0.64473        0.69848         ACCEPT
          95.0%        0.64473        0.82783         ACCEPT
          99.0%        0.64473        0.88012         ACCEPT

*****************************************
**  cochran variance outlier test y x  **
*****************************************

Cochran Variance Outlier Test

Response Variable: Y
Group-ID Variable: X

H0: Largest Variance is Not an Outlier
Ha: Largest Variance is an Outlier

Summary Statistics:
Total Number of Observations:                        15
Number of Groups:                                     3
Number of Groups with Positive Variance:              3
Group with Largest Variance:                          1
Largest Variance:                             141.84233
Sum of Variance:                              880.01148

Cochran Test Statistic Value:                   0.64473
CDF of Test Statistic:                          0.82896
P-Value:                                        0.17104

Percent Points of the Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
0.1    =        0.40230
0.5    =        0.40308
1.0    =        0.40405
2.5    =        0.40698
5.0    =        0.41192
10.0    =        0.42201
25.0    =        0.45418
50.0    =        0.51726
75.0    =        0.60490
90.0    =        0.69343
95.0    =        0.74566
97.5    =        0.78836
99.0    =        0.83347
99.5    =        0.86083
99.9    =        0.90789

Conclusions (Upper 1-Tailed Test)
----------------------------------------------
Alpha    CDF   Critical Value     Conclusion
----------------------------------------------
10%    90%          0.69343      Accept H0
5%    95%          0.74566      Accept H0
2.5%  97.5%          0.78836      Accept H0
1%    99%          0.83347      Accept H0

**************************************************************
**  upper tailed k sample median test permutation test y x  **
**************************************************************

K-Sample Permutation Test
MEDIAN TEST

Response Variable:  Y
Group-ID Variable:  X

Test:
Number of Permutation Samples:                     5000
Statistic Value:                                3.75000
Test CDF Value:                                 0.70900
Test P-Value:                                   0.06440

Conclusions (Upper 1-Tailed Test)

------------------------------------------------------------
                                                        Null
   Significance           Test       Critical     Hypothesis
          Level      Statistic    Region (>=)     Conclusion
------------------------------------------------------------
          80.0%        3.75000        3.75000         REJECT
          90.0%        3.75000        3.75000         REJECT
          95.0%        3.75000        6.96429         ACCEPT
          99.0%        3.75000       10.17857         ACCEPT

***********************
**  median test y x  **
***********************

Median Test

Response Variable: Y
Group-ID Variable: X
H0: Samples Have Equal Medians
Ha: At Least Two Samples Have Different Medians

Summary Statistics:
Original Number of Observations:                            15
Number of Observations After Omitting
Groups With Less Than Two Observations:                     15
Number of Groups:                                            3
Grand Median:                                               30
Number of Points > the Grand Median:                         7
Number of Points <= the Grand Median:                        8

Median Test Statistic Value:                           3.75000
CDF of Test Statistic:                                 0.84665
P-Value:                                               0.15335

Percent Points of the Chi-Square Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
0.0    =          0.000
50.0    =          1.386
75.0    =          2.773
90.0    =          4.605
95.0    =          5.991
97.5    =          7.378
99.0    =          9.210
99.9    =         13.816

Upper-Tailed Test: Chi-Square Approximation

H0: Medians Are Equal; Ha: Medians Are Not Equal
------------------------------------------------------------
                                                        Null
   Significance           Test       Critical     Hypothesis
          Level      Statistic      Value (>)     Conclusion
------------------------------------------------------------
          90.0%        3.75000        4.60517         ACCEPT
          95.0%        3.75000        5.99146         ACCEPT
          97.5%        3.75000        7.37776         ACCEPT
          99.0%        3.75000        9.21034         ACCEPT
          99.9%        3.75000       13.81551         ACCEPT

Date created: 09/25/2023
Last updated: 09/25/2023
