
TWO SAMPLE LINEAR RANK SUM TEST

Name:
    TWO SAMPLE LINEAR RANK SUM TEST
Type:
    Analysis Command
Purpose:
    Perform a two sample linear rank sum test for various scores.
Description:
    Given two samples, Y1 and Y2, with sample sizes n1 and n2, respectively, combine the two samples into a single sample and determine the ranks of the combined samples.

    Two sample linear rank sum tests are then based on the statistic

      \( S = \sum_{i=1}^{n}{tag_i a(R_i)} \)

    with \( n \) denoting the combined sample size and \( R_i \) denoting the rank of the i-th observation. The variable tag is an indicator variable that has the value 1 for observations from the smaller sample and the value 0 for observations from the larger sample (if n1 = n2, tag is set to 1 for the sample that the first observation comes from). The \( a(R_i) \) is a score function based on the ranks. The supported score functions are described in a Note section below.

    The following test statistic is based on asymptotic normality

      \( z = \frac{S - E_{0}(S)} {SD_{0}(S)} \)

    where

      \( \begin{array}{lcl} E_{0}(S) & = & \mbox{the expected value of } S \mbox{ under the null hypothesis} \\ & = & \frac{n1}{n} \sum_{i=1}^{n}{a(R_i)} \end{array} \)

      \( \begin{array}{lcl} SD_{0}(S) & = & \mbox{the standard deviation of } S \mbox{ under the null hypothesis} \\ & = & \sqrt{\frac{n1 n2}{n(n-1)} \sum_{i=1}^{n} {(a(R_{i}) - \bar{a})^{2}}} \end{array} \)

      \( \begin{array}{lcl} \bar{a} & = & \mbox{the average score} \\ & = & \frac{\sum_{i=1}^{n}{a(R_i)}} {n} \end{array} \)

    Note that n1 denotes the sample size for the smaller sample, not necessarily the sample size of Y1.

    Tied ranks use the average rank of the tied values.
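
    The computation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Dataplot's implementation: the names midranks and linear_rank_sum are hypothetical, the default score is the plain rank, the first sample is treated as the tagged sample when the sizes are equal, and SD0 is taken as the square root of the null variance of S.

```python
from math import sqrt

def midranks(values):
    """Rank the values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def linear_rank_sum(y1, y2, score=lambda r, n: r):
    """Return (S, E0, SD0, z) for the two sample linear rank statistic."""
    # tag = 1 for the smaller sample; on equal sizes, assume the first sample.
    small, large = (y1, y2) if len(y1) <= len(y2) else (y2, y1)
    n1, n2 = len(small), len(large)
    n = n1 + n2
    # Scores of the combined sample; the first n1 entries are the tagged ones.
    a = [score(r, n) for r in midranks(list(small) + list(large))]
    s = sum(a[:n1])
    a_bar = sum(a) / n
    e0 = n1 * a_bar  # same as (n1 / n) * sum(a)
    sd0 = sqrt(n1 * n2 / (n * (n - 1)) * sum((x - a_bar) ** 2 for x in a))
    return s, e0, sd0, (s - e0) / sd0

s, e0, sd0, z = linear_rank_sum([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0])
print(s, e0, round(sd0, 4), round(z, 4))
```

    Passing a different score callable (taking the rank and the combined sample size) changes the test without changing the rest of the computation.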

Syntax 1:
    <LOWER TAILED/UPPER TAILED> TWO SAMPLE LINEAR RANK SUM TEST
                            <y1> <y2>             <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    If LOWER TAILED is specified, a lower tailed test is performed. If UPPER TAILED is specified, an upper tailed test is performed. If neither LOWER TAILED nor UPPER TAILED is specified, a two-tailed test is performed.

Syntax 2:
    <LOWER TAILED/UPPER TAILED> TWO SAMPLE LINEAR RANK SUM TEST
                            <y1> ... <yk>             <SUBSET/EXCEPT/FOR qualification>
    where <y1> ... <yk> is a list of two or more response variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs all the two-way two sample linear rank sum tests for the listed variables. This syntax supports the TO syntax.

    If LOWER TAILED is specified, a lower tailed test is performed. If UPPER TAILED is specified, an upper tailed test is performed. If neither LOWER TAILED nor UPPER TAILED is specified, a two-tailed test is performed.
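
    As a rough illustration of what "all the two-way tests" means for Syntax 2, the following Python sketch enumerates the variable pairs for a list of k response variables, which gives k(k-1)/2 tests. The function name pairwise_tests is hypothetical, and the order in which Dataplot actually runs the pairs is an assumption.

```python
from itertools import combinations

def pairwise_tests(names):
    """List every unordered pair of response variables, in listed order."""
    return list(combinations(names, 2))

# Three variables give 3 pairs; six variables (Y1 TO Y6) give 15.
print(pairwise_tests(["Y1", "Y2", "Y3"]))
print(len(pairwise_tests([f"Y{i}" for i in range(1, 7)])))
```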

Examples:
    TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
    TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2 Y3
    TWO SAMPLE LINEAR RANK SUM TEST Y1 TO Y6
    TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2 SUBSET Y2 > 0
    LOWER TAILED TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
    UPPER TAILED TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
Note:
    To specify the scoring function, enter the command

      SET LINEAR RANK SUM TEST SCORE <case>

    where <case> is one of the following

    1. WILCOX

      This option uses Wilcoxon scores

        \( a(R_i) = R_i \)

      That is, the Wilcoxon scores are simply the ranks. Using this score is essentially a rank sum test (also known as the Mann-Whitney test).

      This score is primarily used to test for equal locations.

    2. MEDIAN

      This option uses median scores

        \( \begin{array}{lcl} a(R_i) & = & 1 \hspace{0.5in} \mbox{if } R_i > \frac{n+1}{2} \\ & = & 0 \hspace{0.5in} \mbox{if } R_i \le \frac{n+1}{2} \end{array} \)

      That is, ranks greater than the median rank are scored as a 1 and ranks less than or equal to the median rank are scored as 0. Using this score is essentially a 2-sample median test. Median scores work best for distributions that are symmetric and heavy-tailed.

      This score is primarily used to test for equal locations.

    3. VAN DER WAERDEN

      This option uses the Van Der Waerden scores

        \( a(R_i) = \Phi^{-1}(\frac{R_i}{n+1}) \)

      with \( \Phi^{-1} \) denoting the percent point function of the standard normal distribution. Van Der Waerden scores are the percentiles of a standard normal distribution. Using this score is essentially a 2-sample Van Der Waerden test.

      This score is primarily used to test for equal locations.

    4. SAVAGE

      This option uses the Savage scores

        \( a(R_i) = \sum_{j=1}^{R_i}{\frac{1}{n-j+1}} - 1 \)

      Savage scores are the expected values of exponential order statistics minus 1 (to center the scores around 0). Savage scores are typically used to test location differences in extreme value distributions and to test scale differences in exponential distributions.

    5. MOOD

      This option uses the Mood scores

        \( a(R_i) = (R_i - \frac{n+1}{2})^{2} \)

      Mood scores are the square of the difference between the observation rank and the average rank.

      This score is primarily used to test for equal scales.

    6. ANSARI BRADLEY

      This option uses the Ansari-Bradley scores

        \( a(R_i) = \frac{n+1}{2} + |R_i - \frac{n+1}{2}| \)

      This score is often given in a different form, but the form given here is useful for computational purposes.

      This score is primarily used to test for equal scales.

    7. KLOTZ

      This option uses the Klotz scores

        \( a(R_i) = (\Phi^{-1}(\frac{R_i}{n+1}))^2 \)

      This score is the square of the Van Der Waerden score. Using this score is essentially a 2-sample Klotz test.

      This score is primarily used to test for equal scales.

    8. CONOVER

      This option uses the Conover scores

        \( a(R_i) = (R(U_i))^{2} \)

      where

        \( U_{i} = |Y_{i(j)} - \bar{Y}_{j}| \)

      That is, the Conover scores are the squared ranks of the absolute deviations from the group mean. Using this score is essentially a 2-sample squared ranks test.

      This score is primarily used to test for equal scales.
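
    The score functions listed above can be written out directly from their formulas. The Python sketch below is illustrative only. All function names are hypothetical, and the rank-based functions assume untied integer ranks (the Savage sum in particular truncates a fractional rank, which is an assumption, since the text does not say how tied ranks are handled there). The Ansari-Bradley function follows the form printed above, which equals max(R, n+1-R). The commonly quoted form min(R, n+1-R) is n+1 minus this value, so the two give the same test up to the sign of the statistic.

```python
from statistics import NormalDist

_PPF = NormalDist().inv_cdf  # percent point function of the standard normal

def wilcoxon(r, n):
    return r

def median_score(r, n):
    return 1.0 if r > (n + 1) / 2 else 0.0

def van_der_waerden(r, n):
    return _PPF(r / (n + 1))

def savage(r, n):
    # Assumes an integer rank; int(r) truncates a tied (fractional) rank.
    return sum(1.0 / (n - j + 1) for j in range(1, int(r) + 1)) - 1.0

def mood(r, n):
    return (r - (n + 1) / 2) ** 2

def ansari_bradley(r, n):
    # Form as printed in the text: equals max(r, n + 1 - r).
    return (n + 1) / 2 + abs(r - (n + 1) / 2)

def klotz(r, n):
    return van_der_waerden(r, n) ** 2

def midranks(v):
    """Average ranks: (number smaller) + (number equal + 1) / 2."""
    return [sum(x < t for x in v) + (sum(x == t for x in v) + 1) / 2 for t in v]

def conover_scores(y1, y2):
    """Squared ranks of absolute deviations from each group's own mean."""
    m1, m2 = sum(y1) / len(y1), sum(y2) / len(y2)
    u = [abs(y - m1) for y in y1] + [abs(y - m2) for y in y2]
    return [r ** 2 for r in midranks(u)]

n = 5
print([mood(r, n) for r in range(1, n + 1)])
print(conover_scores([1.0, 2.0, 3.0], [10.0, 20.0]))
```

    Unlike the other seven, the Conover scores depend on the data values and the group means rather than on the ranks of the raw observations alone, which is why they take the two samples as arguments.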

Note:
    The following parameters are saved after the two sample linear rank sum test is performed.

      STATVAL - value of the test statistic
      STATCDF - CDF of the test statistic
      PVALUE - p-value of the two tailed test statistic
      PVALUELT - p-value of the lower tailed test statistic
      PVALUEUT - p-value of the upper tailed test statistic

      CUTUPP90 - 90% upper critical value
      CUTUPP95 - 95% upper critical value
      CUTUP975 - 97.5% upper critical value
      CUTUPP99 - 99% upper critical value
      CUTUP995 - 99.5% upper critical value
      CUTUP999 - 99.9% upper critical value

      CUTLOW10 - 10% lower critical value
      CUTLOW05 - 5% lower critical value
      CUTLO025 - 2.5% lower critical value
      CUTLOW01 - 1% lower critical value
      CUTLO005 - 0.5% lower critical value
      CUTLO001 - 0.1% lower critical value
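
    Under the normal approximation, the saved p-values and critical values follow from the standard normal distribution. The Python sketch below shows one way to reproduce them from the test statistic. It is an illustration, not Dataplot's code: the function names are hypothetical, and the mapping of each parameter to a formula is inferred from the sample output in the Program section (where the lower tailed p-value equals the CDF value).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def tail_p_values(z):
    """P-values for a normal-approximation test statistic z."""
    cdf = STD_NORMAL.cdf(z)
    return {
        "STATCDF": cdf,
        "PVALUE": 2 * min(cdf, 1 - cdf),  # two tailed
        "PVALUELT": cdf,                  # lower tailed
        "PVALUEUT": 1 - cdf,              # upper tailed
    }

def critical_value(level):
    """Standard normal percent point, e.g. 0.95 for CUTUPP95."""
    return STD_NORMAL.inv_cdf(level)

# z taken from the Van Der Waerden sample output in the Program section.
for name, value in tail_p_values(1.56365).items():
    print(name, round(value, 5))
print(round(critical_value(0.95), 5), round(critical_value(0.05), 5))
```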
Note:
    In addition to the TWO SAMPLE LINEAR RANK SUM TEST command, the following commands can also be used:

      LET STATVAL = TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
      LET STATCDF = TWO SAMPLE LINEAR RANK SUM TEST CDF Y1 Y2
      LET PVALUE = TWO SAMPLE LINEAR RANK SUM TEST PVALUE Y1 Y2
      LET PVALUE = TWO SAMPLE LINEAR RANK SUM LOWER TAIL TEST
                              PVALUE Y1 Y2
      LET PVALUE = TWO SAMPLE LINEAR RANK SUM UPPER TAIL TEST
                              PVALUE Y1 Y2

    In addition to the above LET commands, built-in statistics are supported for 30+ different commands (enter HELP STATISTICS for details).

Default:
    The default score function is WILCOX
Synonyms:
    2 SAMPLE is a synonym for TWO SAMPLE
Related Commands:
Applications:
    Two Sample Analysis
Implementation Date:
    2023/07
Program:
     
    . Step 1:   Read the data
    .
    skip 25
    read shoemake.dat y1 y2
    skip 0
    let y x = stack y1 y2
    .
    . Step 2:   Generate the statistics
    .
    set linear rank sum test score van der waerden
    let statval = linear rank sum test                        y1 y2
    let statcdf = linear rank sum test cdf                    y1 y2
    let pvalue  = linear rank sum test pvalue                 y1 y2
    let pvallt  = linear rank sum test lower tail pvalue      y1 y2
    let pvalut  = linear rank sum test upper tail pvalue      y1 y2
    let statval = round(statval,2)
    let statcdf = round(statcdf,2)
    let pvalue  = round(pvalue,2)
    let pvallt  = round(pvallt,2)
    let pvalut  = round(pvalut,2)
    .
    print "Van Der Waerden Scores:"
    print "Test Statistic:                        ^statval"
    print "Test Statistic CDF:                    ^statcdf"
    print "Test Statistic P-Value:                ^pvalue"
    print "Test Statistic Lower Tailed P-Value:   ^pvallt"
    print "Test Statistic Upper Tailed P-Value:   ^pvalut"
    .
    two sample linear rank sum test                y1 y2
    van der waerden test                           y  x
    .
    set linear rank sum test score wilcox
    two sample linear rank sum test                y1 y2
    t test                                         y1 y2
    .
    set linear rank sum test score klotz
    two sample linear rank sum test                y1 y2
    klotz test                                     y1 y2
        
    The following output is generated
    Van Der Waerden Scores:
    Test Statistic:                        1.56
    Test Statistic CDF:                    0.94
    Test Statistic P-Value:                0.12
    Test Statistic Lower Tailed P-Value:   0.94
    Test Statistic Upper Tailed P-Value:   0.06
      
                 Two Sample Two-Sided Linear Rank Sum Test
                         (Van Der Waerden Scores)
      
     First Response Variable: Y1
     Second Response Variable: Y2
      
     H0: Location1 = Location2
     Ha: Location1 not equal Location2
      
     Summary Statistics:
     Number of Observations for Sample 1:                 10
     Mean for Sample 1:                              6.02100
     Median for Sample 1:                            5.53000
     Standard Deviation for Sample 1:                1.58184
     Number of Observations for Sample 2:                 10
     Mean for Sample 2:                              5.01900
     Median for Sample 2:                            5.03500
     Standard Deviation for Sample 2:                1.10440
      
     Test (Normal Approximation):
     Test Statistic Value:                           1.56365
     Score Value:                                    3.11351
     Expected Value of Test Statistic:               0.00786
     Standard Deviation of Test Statistic:           1.98615
     CDF Value:                                      0.94105
     P-Value (2-tailed test):                        0.11790
     P-Value (lower-tailed test):                    0.94105
     P-Value (upper-tailed test):                    0.05895
      
      
                 Two-Tailed Test: Normal Approximation
      
     ---------------------------------------------------------------------------
                                             Lower          Upper           Null
        Significance           Test       Critical       Critical     Hypothesis
               Level      Statistic      Value (<)      Value (>)     Conclusion
     ---------------------------------------------------------------------------
               80.0%        1.56365       -1.28155        1.28155         REJECT
               90.0%        1.56365       -1.64485        1.64485         ACCEPT
               95.0%        1.56365       -1.95996        1.95996         ACCEPT
               99.0%        1.56365       -2.57583        2.57583         ACCEPT
      
      
     THE FORTRAN COMMON CHARACTER VARIABLE LINERANK HAS JUST BEEN SET TO WILC
      
                 Two Sample Two-Sided Linear Rank Sum Test
                             (Wilcoxon Scores
      
     First Response Variable: Y1
     Second Response Variable: Y2
      
     H0: Location1 = Location2
     Ha: Location1 not equal Location2
      
     Summary Statistics:
     Number of Observations for Sample 1:                 10
     Mean for Sample 1:                              6.02100
     Median for Sample 1:                            5.53000
     Standard Deviation for Sample 1:                1.58184
     Number of Observations for Sample 2:                 10
     Mean for Sample 2:                              5.01900
     Median for Sample 2:                            5.03500
     Standard Deviation for Sample 2:                1.10440
      
     Test (Normal Approximation):
     Test Statistic Value:                           1.47628
     Score Value:                                  124.50000
     Expected Value of Test Statistic:             105.00000
     Standard Deviation of Test Statistic:          13.20885
     CDF Value:                                      0.93007
     P-Value (2-tailed test):                        0.13987
     P-Value (lower-tailed test):                    0.93007
     P-Value (upper-tailed test):                    0.06993
      
      
                 Two-Tailed Test: Normal Approximation
      
     ---------------------------------------------------------------------------
                                             Lower          Upper           Null
        Significance           Test       Critical       Critical     Hypothesis
               Level      Statistic      Value (<)      Value (>)     Conclusion
     ---------------------------------------------------------------------------
               80.0%        1.47628       -1.28155        1.28155         REJECT
               90.0%        1.47628       -1.64485        1.64485         ACCEPT
               95.0%        1.47628       -1.95996        1.95996         ACCEPT
               99.0%        1.47628       -2.57583        2.57583         ACCEPT
      
      
     THE FORTRAN COMMON CHARACTER VARIABLE LINERANK HAS JUST BEEN SET TO KLOT
      
                 Two Sample Two-Sided Linear Rank Sum Test
                              (Klotz Scores)
      
     First Response Variable: Y1
     Second Response Variable: Y2
      
     H0: Scale1 = Scale2
     Ha: Scale1 not equal Scale2
      
     Summary Statistics:
     Number of Observations for Sample 1:                 10
     Mean for Sample 1:                              6.02100
     Median for Sample 1:                            5.53000
     Standard Deviation for Sample 1:                1.58184
     Number of Observations for Sample 2:                 10
     Mean for Sample 2:                              5.01900
     Median for Sample 2:                            5.03500
     Standard Deviation for Sample 2:                1.10440
      
     Test (Normal Approximation):
     Test Statistic Value:                           0.26908
     Score Value:                                    8.01749
     Expected Value of Test Statistic:               7.49513
     Standard Deviation of Test Statistic:           1.94130
     CDF Value:                                      0.60606
     P-Value (2-tailed test):                        0.78787
     P-Value (lower-tailed test):                    0.60606
     P-Value (upper-tailed test):                    0.39394
      
      
                 Two-Tailed Test: Normal Approximation
      
     ---------------------------------------------------------------------------
                                             Lower          Upper           Null
        Significance           Test       Critical       Critical     Hypothesis
               Level      Statistic      Value (<)      Value (>)     Conclusion
     ---------------------------------------------------------------------------
               80.0%        0.26908       -1.28155        1.28155         ACCEPT
               90.0%        0.26908       -1.64485        1.64485         ACCEPT
               95.0%        0.26908       -1.95996        1.95996         ACCEPT
               99.0%        0.26908       -2.57583        2.57583         ACCEPT
      
        
Date created: 08/03/2023
Last updated: 08/03/2023

Please email comments on this WWW page to alan.heckert@nist.gov.