
K SAMPLE PERMUTATION TEST

Name:
    K SAMPLE <STATISTIC> PERMUTATION TEST
Type:
    Analysis Command
Purpose:
    Perform a k-sample permutation test for a specified statistic.
Description:
    Given a response variable Y and a group-id variable X, each of sample size n, a k-sample permutation test is performed as follows:

    1. Compute the desired statistic for the original data.

    2. Generate a permutation of the response data. Then compute the desired statistic for the permutation.

    3. Repeat step 2 NITER times.

    The NITER computed statistics represent the reference distribution. The statistic for the original data is compared to this reference distribution. For example, the cut-offs for a two-sided 95% test are obtained from the 2.5% and 97.5% percentiles of the reference distribution.
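
    The three steps above can be sketched in Python. This is an illustrative outline only, not Dataplot's implementation: the function names, the default seed, and the use of Python's built-in shuffle are assumptions made for the example. The data are the Higgins values used in Program 1.

```python
import random

def f_statistic(y, x):
    """One-way ANOVA F statistic for responses y with group ids x."""
    groups = {}
    for value, gid in zip(y, x):
        groups.setdefault(gid, []).append(value)
    n, k = len(y), len(groups)
    grand_mean = sum(y) / n
    means = {gid: sum(g) / len(g) for gid, g in groups.items()}
    ss_treat = sum(len(g) * (means[gid] - grand_mean) ** 2
                   for gid, g in groups.items())
    ss_error = sum((v - means[gid]) ** 2
                   for gid, g in groups.items() for v in g)
    return (ss_treat / (k - 1)) / (ss_error / (n - k))

def k_sample_permutation_test(y, x, statistic, niter=4000, seed=0):
    """Return the observed statistic and the sorted reference distribution."""
    rng = random.Random(seed)            # hypothetical seed, for repeatability
    observed = statistic(y, x)           # step 1: statistic for original data
    shuffled = list(y)
    reference = []
    for _ in range(niter):               # step 3: repeat NITER times
        rng.shuffle(shuffled)            # step 2: permute the response
        reference.append(statistic(shuffled, x))
    reference.sort()
    return observed, reference

x = [1] * 5 + [2] * 5 + [3] * 5
y = [6.08, 22.29, 7.51, 34.36, 23.68,
     30.45, 22.71, 44.52, 31.47, 36.81,
     32.04, 28.03, 32.74, 23.84, 29.64]
observed, reference = k_sample_permutation_test(y, x, f_statistic, niter=1000)
print(round(observed, 3))
```

    The group-id variable is held fixed and only the response is shuffled, so each iteration reassigns the observed responses to the groups at random.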

    In principle, the permutation test is based on all possible permutations of the data. However, the number of permutations grows rapidly as the sample size increases. Sampling a subset of all possible permutations provides a reasonable approximation to the full permutation test. By default, Dataplot generates 4,000 iterations. To change this, enter the command

      SET PERMUTATION TEST SAMPLE SIZE <value>

    If <value> is less than 100, it will be set to 100. If <value> is greater than 100,000, it will be set to 100,000.
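
    As a rough illustration of why sampling is needed, and of the limits described above, the following Python sketch counts the distinct ways of assigning the responses to groups of fixed sizes (a multinomial coefficient, which applies when the statistic does not depend on the order of values within a group) and clamps a requested iteration count to the range 100 to 100,000. The function names are hypothetical and are not part of Dataplot.

```python
from math import factorial

def distinct_group_assignments(group_sizes):
    """Number of distinct assignments: n! / (n1! * n2! * ... * nk!)."""
    count = factorial(sum(group_sizes))
    for size in group_sizes:
        count //= factorial(size)        # exact at every step
    return count

def clamp_niter(value, low=100, high=100_000):
    """Mimic the documented limits on the permutation sample size."""
    return max(low, min(high, value))

print(distinct_group_assignments([5, 5, 5]))      # 756756
print(distinct_group_assignments([10, 10, 10]))   # already about 5.6e12
print(clamp_niter(50), clamp_niter(5000), clamp_niter(250000))  # 100 5000 100000
```

    Even three groups of five observations give 756,756 distinct assignments, so enumerating every one quickly becomes impractical as the groups grow.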

    The specified statistic should be one that can be computed from a single response variable with a corresponding group-id variable.

    This test is most commonly used with the F statistic obtained from a one-way analysis of variance.

    Permutation tests assume the observations are independent. However, no distributional assumptions are made about the response variable.

Syntax:
    <LOWER TAILED/UPPER TAILED> K SAMPLE <stat> PERMUTATION TEST <y> <x>
                            <SUBSET/EXCEPT/FOR qualification>
    where <stat> is the desired statistic;
                <y> is the response variable;
                <x> is the group-id variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    If LOWER TAILED is specified, a lower tailed test is performed. If UPPER TAILED is specified, an upper tailed test is performed. If neither LOWER TAILED nor UPPER TAILED is specified, a two-tailed test is performed.
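
    The three kinds of test differ only in which tail of the reference distribution the original statistic is compared against. The Python sketch below shows one common convention. It is an assumption for illustration: the exact rules Dataplot uses (for example, the handling of ties, or how the two-tailed p-value is formed) are not stated here and may differ.

```python
def tail_p_values(observed, reference):
    """Lower, upper, and two-tailed p-values from a reference distribution."""
    n = len(reference)
    # Fraction of permuted statistics at or below / at or above the observed one.
    lower = sum(1 for r in reference if r <= observed) / n
    upper = sum(1 for r in reference if r >= observed) / n
    # Assumed convention: double the smaller tail, capped at 1.
    two_tailed = min(1.0, 2.0 * min(lower, upper))
    return lower, upper, two_tailed

reference = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
lower, upper, two_tailed = tail_p_values(9, reference)
print(lower, upper, two_tailed)
```

    For a statistic such as the one-way ANOVA F statistic, where only large values indicate a difference between groups, the upper tailed test is usually the relevant one.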

Examples:
    K SAMPLE ONE WAY ANOVA F STATISTIC PERMUTATION TEST Y X
    UPPER TAILED K SAMPLE ONE WAY ANOVA F STATISTIC PERMUTATION TEST Y X
    UPPER TAILED K SAMPLE KRUSKAL WALLIS TEST PERMUTATION TEST Y X
Note:
    This test only works for statistics based on a single response variable and a group-id variable. Currently, the following statistics are supported:

      ONE WAY ANOVA F STATISTIC
      ONE WAY ANOVA SUM OF SQUARES TOTAL
      ONE WAY ANOVA SUM OF SQUARES TREATMENT
      ONE WAY ANOVA SUM OF SQUARES ERROR
      ONE WAY ANOVA MEAN SQUARE ERROR
      ONE WAY ANOVA MEAN SQUARE TREATMENT
      KRUSKAL WALLIS TEST
      REPEATABILITY STANDARD DEVIATION
      REPRODUCIBILITY STANDARD DEVIATION
      ANDERSON DARLING K SAMPLE TEST
      COCHRAN VARIANCE OUTLIER TEST
      COCHRAN MINIMUM VARIANCE OUTLIER TEST
      SQUARED RANKS TEST
      MEDIAN TEST

    Of these, the ONE WAY ANOVA F STATISTIC and KRUSKAL WALLIS TEST statistics are probably the ones of most interest.

Note:
    This routine uses a random permutation algorithm suggested by Knuth. Specifically, it adapts the RANDPERM routine of Knoble.
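
    For readers unfamiliar with it, the shuffling algorithm in Knuth's Section 3.4.2 (Algorithm P, also known as the Fisher-Yates shuffle) can be sketched in Python as follows. This is a generic illustration of the technique, not a transcription of the RANDPERM routine or of Dataplot's code; the function name and seed are hypothetical.

```python
import random

def random_permutation(values, rng):
    """Return a uniformly random permutation of values (Fisher-Yates shuffle)."""
    perm = list(values)
    # Work from the last position down; swap each position with a
    # randomly chosen position at or before it.
    for j in range(len(perm) - 1, 0, -1):
        k = rng.randint(0, j)            # inclusive on both ends
        perm[j], perm[k] = perm[k], perm[j]
    return perm

rng = random.Random(12345)               # hypothetical seed
print(random_permutation([6.08, 22.29, 7.51, 34.36, 23.68], rng))
```

    Each of the n! orderings is equally likely, provided the underlying random number generator is uniform.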
Note:
    The following parameters are saved after the k sample permutation test is performed.

      STATVAL - value of the test statistic
      STATCDF - CDF of the test statistic
      PVALUE - p-value of the two tailed test statistic
      PVALUELT - p-value of the lower tailed test statistic
      PVALUEUT - p-value of the upper tailed test statistic
      P80 - 80% upper critical value
      P90 - 90% upper critical value
      P95 - 95% upper critical value
      P975 - 97.5% upper critical value
      P99 - 99% upper critical value
      P995 - 99.5% upper critical value
      P999 - 99.9% upper critical value
      P20 - 20% lower critical value
      P10 - 10% lower critical value
      P05 - 5% lower critical value
      P025 - 2.5% lower critical value
      P01 - 1% lower critical value
      P005 - 0.5% lower critical value
      P001 - 0.1% lower critical value
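
    The critical values are percentiles of the reference distribution. The Python sketch below shows one way such percentiles can be computed, using linear interpolation between order statistics. The interpolation rule is an assumption for illustration; Dataplot's percentile definition may differ slightly, and the function name is hypothetical.

```python
def critical_value(reference, fraction):
    """Percentile of the reference distribution by linear interpolation."""
    ordered = sorted(reference)
    position = fraction * (len(ordered) - 1)
    lo = int(position)
    hi = min(lo + 1, len(ordered) - 1)
    weight = position - lo
    return ordered[lo] * (1 - weight) + ordered[hi] * weight

reference = list(range(101))             # stand-in for NITER permuted statistics
p95 = critical_value(reference, 0.95)    # analogous to the saved P95
p025 = critical_value(reference, 0.025)  # analogous to the saved P025
print(p95, p025)
```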
Note:
    To generate multiple comparisons for the ONE WAY ANOVA F STATISTIC case, you can perform a two sample permutation test for each pair of factor levels. This is demonstrated in Step 4 of Program 1.

    Note that although this example compares differences of means, you could use other location statistics such as the MEDIAN or BIWEIGHT LOCATION.
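
    The idea of the pairwise comparisons can also be sketched in Python. This is an illustration under stated assumptions, not Dataplot's TWO SAMPLE MEAN PERMUTATION TEST: the two-tailed p-value is taken as the fraction of permutations whose absolute difference of means is at least as large as the observed one, and the function names and seed are hypothetical.

```python
import random
from itertools import combinations

def mean_difference_p_value(a, b, niter=4000, seed=0):
    """Two-tailed permutation p-value for a difference of two group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(niter):
        rng.shuffle(pooled)              # reassign pooled values to two groups
        first, second = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(first) / len(first) - sum(second) / len(second))
        if diff >= observed - 1e-12:     # small tolerance for rounding error
            hits += 1
    return hits / niter

def pairwise_p_values(y, x, niter=4000, seed=0):
    """Run the two-sample test for every pair of distinct group ids."""
    results = {}
    for g1, g2 in combinations(sorted(set(x)), 2):
        a = [v for v, g in zip(y, x) if g == g1]
        b = [v for v, g in zip(y, x) if g == g2]
        results[(g1, g2)] = mean_difference_p_value(a, b, niter, seed)
    return results

x = [1] * 5 + [2] * 5 + [3] * 5
y = [6.08, 22.29, 7.51, 34.36, 23.68,
     30.45, 22.71, 44.52, 31.47, 36.81,
     32.04, 28.03, 32.74, 23.84, 29.64]
for pair, p in pairwise_p_values(y, x, niter=1000).items():
    print(pair, p)
```

    To use a different location statistic, replace the two mean computations with the statistic of interest.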

Default:
    The number of permutations defaults to 4,000.
Synonyms:
    None
Related Commands:
References:
    Knuth (1998), "The Art of Computer Programming: Volume 2 Seminumerical Algorithms, Third Edition", Section 3.4.2, Addison-Wesley.

    Knoble RANDPERM algorithm downloaded from: "http://coding.derkeiler.com/Archive/Fortran/comp.lang.fortran/2006-03/msg00748.html"

    Higgins (2004), "Introduction to Modern Nonparametric Statistics," Duxbury Press, Chapter 3.

Applications:
    K Sample Analysis
Implementation Date:
    2023/09
Program 1:
     
    set permutation test sample size 5000
    set random number generator fibbonacci congruential
    seed 88807
    .
    . Step 1:   Create the data (from Higgins, p. 85)
    .
    read x y
     1    6.08
     1   22.29
     1    7.51
     1   34.36
     1   23.68
     2   30.45
     2   22.71
     2   44.52
     2   31.47
     2   36.81
     3   32.04
     3   28.03
     3   32.74
     3   23.84
     3   29.64
    end of data
    .
    . Step 2:   Perform the permutation test
    .
    upper tailed k sample one way anova f statistic permutation test y x
        
    The following output is generated
                 K-Sample Permutation Test
                   ONE WAY ANOVA F-VALUE
      
     Response Variable:  Y
     Group-ID Variable:  X
      
      
     Test:
     Number of Permutation Samples:                     5000
     Statistic Value:                                3.78144
     Test CDF Value:                                 0.95040
     Test P-Value:                                   0.04960
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------
                                                             Null
        Significance           Test       Critical     Hypothesis
               Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------
               80.0%        3.78144        1.85538         REJECT
               90.0%        3.78144        2.78072         REJECT
               95.0%        3.78144        3.77866         REJECT
               99.0%        3.78144        6.03793         ACCEPT
        
    .
    .           Step 3: Plot the results
    .
    title offset 7
    title case asis
    label case asis
    y1label Count
    x1label One Way Anova F-Statistic for Permutations
    let statval = round(statval,4)
    let p95  = round(p95,3)
    let p99  = round(p99,3)
    let pval = round(pvalueut,4)
    let statcdf = round(statcdf,4)
    .
    x2label color red
    x2label One Way Anova F-Statistic for Original Sample: ^statval
    x3label color blue
    x3label 95 Percentile: ^P95, 99 Percentile: ^P99
    xlimits -5.0 10.0
    let niter = 5000
    skip 1
    read dpst1f.dat z
    title Histogram of One Way Anova F Statistic for ^niter Permutationscr() ...
          (Pvalue: ^pval, CDF: ^statcdf)
    .
    histogram z
    .
    line color red
    line dash
    line thickness 0.3
    drawdsds statval 20 statval 90
    line thickness 0.1
    line color blue
    line dash
    drawdsds p95 20 p95 90
    drawdsds p99 20 p99 90
        
     
    .
    .           Step 4: Multiple comparisons
    .
    let xdist = distinct x
    let ndist = size xdist
    let icnt = 0
    if ndist >= 3
       loop for k = 1 1 ndist
           let xval1 = xdist(k)
           let jstrt = k + 1
           loop for j = jstrt 1 ndist
               let xval2 = xdist(j)
               let ytemp1 = y
               let ytemp2 = y
               retain ytemp1 subset x = xval1
               retain ytemp2 subset x = xval2
               two sample mean permutation test ytemp1 ytemp2
               let icnt = icnt + 1
               let group1(icnt) = xval1
               let group2(icnt) = xval2
               let pvalmc(icnt)   = pvalue2t
               delete ytemp1 ytemp2
           end of loop
       end of loop
    end of if
    write1 ksamp_mc.out "   Group-ID One   Group-ID Two         P-Value"
    write1 ksamp_mc.out "----------------------------------------------"
    write1 ksamp_mc.out group1 group2 pvalmc
        
    The file "ksamp_mc.out" contains
       Group-ID One   Group-ID Two         P-Value
    ----------------------------------------------
             1.00000        2.00000        0.05840
             1.00000        3.00000        0.11320
             2.00000        3.00000        0.36960
        
Program 2:
     
    set permutation test sample size 5000
    set random number generator fibbonacci congruential
    seed 49217
    .
    . Step 1:   Create the data (from Higgins, p. 85)
    .
    read x y
     1    6.08
     1   22.29
     1    7.51
     1   34.36
     1   23.68
     2   30.45
     2   22.71
     2   44.52
     2   31.47
     2   36.81
     3   32.04
     3   28.03
     3   32.74
     3   23.84
     3   29.64
    end of data
    .
    . Step 2:   Perform the permutation test
    .
    echo on
    upper tailed k sample one way anova f statistic permutation test y x
    upper tailed k sample kruskal wallis test permutation test y x
    kruskal wallis y x
    upper tailed k sample squared ranks test permutation test y x
    squared ranks y x
    upper tailed k sample anderson darling k sample test permutation test y x
    anderson darling k sample test y x
    upper tailed k sample cochran variance outlier test permutation test y x
    cochran variance outlier test y x
    upper tailed k sample median test permutation test y x
    median test y x
    echo off
        
    The following output is generated
           ****************************************************************************
           **  upper tailed k sample one way anova f statistic permutation test y x  **
           ****************************************************************************
      
      
                 K-Sample Permutation Test
                   ONE WAY ANOVA F-VALUE
      
     Response Variable:  Y
     Group-ID Variable:  X
      
      
     Test:
     Number of Permutation Samples:                     5000
     Statistic Value:                                3.78144
     Test CDF Value:                                 0.94720
     Test P-Value:                                   0.05280
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------
                                                             Null
        Significance           Test       Critical     Hypothesis
               Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------
               80.0%        3.78144        1.92732         REJECT
               90.0%        3.78144        2.90249         REJECT
               95.0%        3.78144        3.89597         ACCEPT
               99.0%        3.78144        6.12665         ACCEPT
      
      
           **********************************************************************
           **  upper tailed k sample kruskal wallis test permutation test y x  **
           **********************************************************************
      
      
                 K-Sample Permutation Test
                   KRUSKALL WALLIS TEST
      
     Response Variable:  Y
     Group-ID Variable:  X
      
      
     Test:
     Number of Permutation Samples:                     5000
     Statistic Value:                                4.16000
     Test CDF Value:                                 0.86820
     Test P-Value:                                   0.12500
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------
                                                             Null
        Significance           Test       Critical     Hypothesis
               Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------
               80.0%        4.16000        3.42000         REJECT
               90.0%        4.16000        4.56000         ACCEPT
               95.0%        4.16000        5.82000         ACCEPT
               99.0%        4.16000        8.00000         ACCEPT
      
      
           **************************
           **  kruskal wallis y x  **
           **************************
      
      
                 Kruskal-Wallis One Factor Test
      
     Response Variable: Y
     Group-ID Variable: X
      
     H0: Samples Come From Identical Populations
     Ha: Samples Do Not Come From Identical Populations
      
     Summary Statistics:
     Total Number of Observations:                                  15
     Number of Groups:                                               3
      
     Kruskal-Wallis Test Statistic Value:                      4.16000
     CDF of Test Statistic:                                    0.87507
     P-Value:                                                  0.12493
      
      
     Percent Points of the Chi-Square Reference Distribution
     -----------------------------------
       Percent Point               Value
     -----------------------------------
                 0.0    =          0.000
                50.0    =          1.386
                75.0    =          2.773
                90.0    =          4.605
                95.0    =          5.991
                97.5    =          7.378
                99.0    =          9.210
                99.9    =         13.816
      
     Conclusions (Upper 1-Tailed Test)
     ----------------------------------------------
       Alpha    CDF   Critical Value     Conclusion
     ----------------------------------------------
         10%    90%            4.605      Accept H0
          5%    95%            5.991      Accept H0
        2.5%  97.5%            7.378      Accept H0
          1%    99%            9.210      Accept H0
      
      
                 Multiple Comparisons Table
      
     ---------------------------------------------------------------------------------------
         I    J  |Ri/Ni - Rj/Nj|         90% CV         95% CV         99% CV        P-VALUE
     ---------------------------------------------------------------------------------------
         1    2          5.60000        4.56488        5.58048        7.82344        0.00006
         1    3          4.00000        4.56488        5.58048        7.82344        0.00088
         2    3          1.60000        4.56488        5.58048        7.82344        0.06779
      
      
           *********************************************************************
           **  upper tailed k sample squared ranks test permutation test y x  **
           *********************************************************************
      
      
                 K-Sample Permutation Test
                     SQUARED RANK TEST
      
     Response Variable:  Y
     Group-ID Variable:  X
      
      
     Test:
     Number of Permutation Samples:                     5000
     Statistic Value:                                5.23351
     Test CDF Value:                                 0.77720
     Test P-Value:                                   0.22280
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------
                                                             Null
        Significance           Test       Critical     Hypothesis
               Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------
               80.0%        5.23351        5.48205         ACCEPT
               90.0%        5.23351        6.52571         ACCEPT
               95.0%        5.23351        7.77241         ACCEPT
               99.0%        5.23351        9.57074         ACCEPT
      
      
           *************************
           **  squared ranks y x  **
           *************************
      
      
                 Squared Ranks Test
      
     Response Variable: Y
     Group-ID Variable: X
      
     H0: Samples Have Equal Variability
     Ha: Samples Do Not Have Equal Variability
      
     Summary Statistics:
     Total Number of Observations:                         15
     Number of Groups:                                      3
      
     Squared Ranks Test Statistic Value:              5.23351
     CDF of Test Statistic:                           0.92696
     P-Value:                                         0.07304
      
      
     Percent Points of the Chi-Square Reference Distribution
     -----------------------------------
       Percent Point               Value
     -----------------------------------
                 0.0    =          0.000
                50.0    =          1.386
                75.0    =          2.773
                90.0    =          4.605
                95.0    =          5.991
                97.5    =          7.378
                99.0    =          9.210
                99.9    =         13.816
      
                 Upper-Tailed Test: Chi-Square Approximation
      
     H0: Variances Are Equal; Ha: Variance Are Not Equal
     ------------------------------------------------------------
                                                             Null
        Significance           Test       Critical     Hypothesis
               Level      Statistic      Value (>)     Conclusion
     ------------------------------------------------------------
               80.0%        5.23351        3.21888         REJECT
               90.0%        5.23351        4.60517         REJECT
               95.0%        5.23351        5.99146         ACCEPT
               99.0%        5.23351        9.21034         ACCEPT
      
      
                 Multiple Comparisons Table
      
     ---------------------------------------------------------------------------------------
         I    J  |Si/Ni - Sj/Nj|         90% CV         95% CV         99% CV        P-Value
     ---------------------------------------------------------------------------------------
         1    2         63.20000      116.14987      171.14898      394.78593        0.25304
         1    3        105.80000      116.14987      171.14898      394.78593        0.11705
         2    3         42.60000      116.14987      171.14898      394.78593        0.39629
      
      
           *********************************************************************************
           **  upper tailed k sample anderson darling k sample test permutation test y x  **
           *********************************************************************************
      
      
                 K-Sample Permutation Test
                 ANDERSON DARLING K-SAMPLE TEST
      
     Response Variable:  Y
     Group-ID Variable:  X
      
      
     Test:
     Number of Permutation Samples:                     5000
     Statistic Value:                                1.76560
     Test CDF Value:                                 0.92440
     Test P-Value:                                   0.07560
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------
                                                             Null
        Significance           Test       Critical     Hypothesis
               Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------
               80.0%        1.76560        1.34619         REJECT
               90.0%        1.76560        1.66064         REJECT
               95.0%        1.76560        1.94778         ACCEPT
               99.0%        1.76560        2.58359         ACCEPT
      
      
           ******************************************
           **  anderson darling k sample test y x  **
           ******************************************
      
      
                 Anderson-Darling K-Sample Test for Common Groups
      
     Response Variable: Y
     Group-ID Variable: X
      
     H0: The Groups Are Homogeneous
     Ha: The Groups Are Not Homogeneous
      
     Summary Statistics:
     Total Number of Observations:                        15
     Number of Groups:                                     3
     Minimum Batch Size:                                   5
     Maximum Batch Size:                                   5
      
     Test Statistic Value:                           1.76560
     Test Statistic Standard Error:                  0.45946
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------------------
                                                                         Null
             Null   Significance           Test       Critical     Hypothesis
       Hypothesis          Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------------------
      Homogeneous          50.0%        1.76560        1.13711         REJECT
      Homogeneous          75.0%        1.76560        1.44702         REJECT
      Homogeneous          90.0%        1.76560        1.72594         REJECT
      Homogeneous          95.0%        1.76560        1.89286         ACCEPT
      Homogeneous          97.5%        1.76560        2.03764         ACCEPT
      Homogeneous          99.0%        1.76560        2.20598         ACCEPT
      Homogeneous          99.9%        1.76560        2.55696         ACCEPT
      
      
           ********************************************************************************
           **  upper tailed k sample cochran variance outlier test permutation test y x  **
           ********************************************************************************
      
      
                 K-Sample Permutation Test
                 COCHRAN VARIANCE OUTLIER TEST
      
     Response Variable:  Y
     Group-ID Variable:  X
      
      
     Test:
     Number of Permutation Samples:                     5000
     Statistic Value:                                0.64473
     Test CDF Value:                                 0.79260
     Test P-Value:                                   0.20740
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------
                                                             Null
        Significance           Test       Critical     Hypothesis
               Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------
               80.0%        0.64473        0.64761         ACCEPT
               90.0%        0.64473        0.69848         ACCEPT
               95.0%        0.64473        0.82783         ACCEPT
               99.0%        0.64473        0.88012         ACCEPT
      
      
           *****************************************
           **  cochran variance outlier test y x  **
           *****************************************
      
      
                 Cochran Variance Outlier Test
      
     Response Variable: Y
     Group-ID Variable: X
      
     H0: Largest Variance is Not an Outlier
     Ha: Largest Variance is an Outlier
      
     Summary Statistics:
     Total Number of Observations:                        15
     Number of Groups:                                     3
     Number of Groups with Positive Variance:              3
     Group with Largest Variance:                          1
     Largest Variance:                             141.84233
     Sum of Variance:                              880.01148
      
     Cochran Test Statistic Value:                   0.64473
     CDF of Test Statistic:                          0.82896
     P-Value:                                        0.17104
      
      
     Percent Points of the Reference Distribution
     -----------------------------------
       Percent Point               Value
     -----------------------------------
                 0.1    =        0.40230
                 0.5    =        0.40308
                 1.0    =        0.40405
                 2.5    =        0.40698
                 5.0    =        0.41192
                10.0    =        0.42201
                25.0    =        0.45418
                50.0    =        0.51726
                75.0    =        0.60490
                90.0    =        0.69343
                95.0    =        0.74566
                97.5    =        0.78836
                99.0    =        0.83347
                99.5    =        0.86083
                99.9    =        0.90789
      
     Conclusions (Upper 1-Tailed Test)
     ----------------------------------------------
       Alpha    CDF   Critical Value     Conclusion
     ----------------------------------------------
         10%    90%          0.69343      Accept H0
          5%    95%          0.74566      Accept H0
        2.5%  97.5%          0.78836      Accept H0
          1%    99%          0.83347      Accept H0
      
      
           **************************************************************
           **  upper tailed k sample median test permutation test y x  **
           **************************************************************
      
      
                 K-Sample Permutation Test
                        MEDIAN TEST
      
     Response Variable:  Y
     Group-ID Variable:  X
      
      
     Test:
     Number of Permutation Samples:                     5000
     Statistic Value:                                3.75000
     Test CDF Value:                                 0.70900
     Test P-Value:                                   0.06440
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------
                                                             Null
        Significance           Test       Critical     Hypothesis
               Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------
               80.0%        3.75000        3.75000         REJECT
               90.0%        3.75000        3.75000         REJECT
               95.0%        3.75000        6.96429         ACCEPT
               99.0%        3.75000       10.17857         ACCEPT
      
      
           ***********************
           **  median test y x  **
           ***********************
      
      
                 Median Test
      
     Response Variable: Y
     Group-ID Variable: X
     H0: Samples Have Equal Medians
     Ha: At Least Two Samples Have Different Medians
      
     Summary Statistics:
     Original Number of Observations:                            15
     Number of Observations After Omitting
     Groups With Less Than Two Observations:                     15
     Number of Groups:                                            3
     Grand Median:                                               30
     Number of Points > the Grand Median:                         7
     Number of Points <= the Grand Median:                        8
      
     Median Test Statistic Value:                           3.75000
     CDF of Test Statistic:                                 0.84665
     P-Value:                                               0.15335
      
      
     Percent Points of the Chi-Square Reference Distribution
     -----------------------------------
       Percent Point               Value
     -----------------------------------
                 0.0    =          0.000
                50.0    =          1.386
                75.0    =          2.773
                90.0    =          4.605
                95.0    =          5.991
                97.5    =          7.378
                99.0    =          9.210
                99.9    =         13.816
      
                 Upper-Tailed Test: Chi-Square Approximation
      
     H0: Medians Are Equal; Ha: Medians Are Not Equal
     ------------------------------------------------------------
                                                             Null
        Significance           Test       Critical     Hypothesis
               Level      Statistic      Value (>)     Conclusion
     ------------------------------------------------------------
               90.0%        3.75000        4.60517         ACCEPT
               95.0%        3.75000        5.99146         ACCEPT
               97.5%        3.75000        7.37776         ACCEPT
               99.0%        3.75000        9.21034         ACCEPT
               99.9%        3.75000       13.81551         ACCEPT
        
Date created: 09/25/2023
Last updated: 09/25/2023

Please email comments on this WWW page to alan.heckert@nist.gov.