
# TWO SAMPLE LINEAR RANK SUM TEST

Name:
TWO SAMPLE LINEAR RANK SUM TEST
Type:
Analysis Command
Purpose:
Perform a two sample linear rank sum test for various score functions.
Description:
Given two samples, Y1 and Y2, with sample sizes n1 and n2, respectively, combine the two samples into a single sample and rank the observations in the combined sample.

Two sample linear rank sum tests are then based on the statistic

$$S = \sum_{i=1}^{n}{\mbox{tag}_i \, a(R_i)}$$

with $$n$$ denoting the combined sample size and $$R_i$$ denoting the rank of the i-th observation in the combined sample. The variable tag is an indicator variable that has the value 1 for observations from the smaller sample and the value 0 for observations from the larger sample (if n1 = n2, tag is set to 1 for the sample that contains the first observation). $$a(R_i)$$ is a score function of the ranks. The supported score functions are described in a Note section below.

The following test statistic is based on asymptotic normality:

$$z = \frac{S - E_{0}(S)} {SD_{0}(S)}$$

where

$$\begin{array}{lcl} E_{0}(S) & = & \mbox{the expected value of } S \mbox{ under the null hypothesis} \\ & = & \frac{n1}{n} \sum_{i=1}^{n}{a(R_i)} \end{array}$$

$$\begin{array}{lcl} SD_{0}(S) & = & \mbox{the standard deviation of } S \mbox{ under the null hypothesis} \\ & = & \sqrt{\frac{n1 n2}{n(n-1)} \sum_{i=1}^{n} {(a(R_{i}) - \bar{a})^{2}}} \end{array}$$

$$\begin{array}{lcl} \bar{a} & = & \mbox{the average score} \\ & = & \frac{\sum_{i=1}^{n}{a(R_i)}} {n} \end{array}$$

Note that n1 denotes the sample size of the smaller sample (the sample with tag = 1), which is not necessarily Y1, and n2 denotes the sample size of the larger sample.

Tied values are assigned the average of the ranks they span.
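The computation described above can be sketched in a few lines of Python. This is a minimal sketch, not Dataplot's implementation: the helper names `average_ranks`, `linear_rank_sum_z`, and `wilcoxon` are hypothetical, the standard deviation is taken as the square root of the null variance n1 n2/(n(n-1)) Σ(a(R_i) - ā)², and when the sample sizes are equal Y1 is assumed to hold the first observation.

```python
import math
from typing import Callable, Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    s = sorted(values)
    # s.index(v) counts the values smaller than v; the tied run of v occupies
    # ranks index+1 .. index+count, whose average is index + (count + 1) / 2.
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def linear_rank_sum_z(y1: Sequence[float], y2: Sequence[float],
                      score: Callable[[float, int], float]):
    """Return (S, E0, SD0, z) for the normal-approximation test (hypothetical helper)."""
    combined = list(y1) + list(y2)
    n = len(combined)
    # tag = 1 for the smaller sample; on equal sizes, tag Y1 (assumed to hold
    # the first observation).
    tag_y1 = len(y1) <= len(y2)
    tags = [int(tag_y1)] * len(y1) + [int(not tag_y1)] * len(y2)
    n1 = sum(tags)              # size of the tagged (smaller) sample
    n2 = n - n1
    a = [score(r, n) for r in average_ranks(combined)]
    s = sum(t * ai for t, ai in zip(tags, a))
    e0 = n1 / n * sum(a)
    a_bar = sum(a) / n
    var0 = n1 * n2 / (n * (n - 1)) * sum((ai - a_bar) ** 2 for ai in a)
    sd0 = math.sqrt(var0)       # SD is the square root of the null variance
    return s, e0, sd0, (s - e0) / sd0


def wilcoxon(rank: float, n: int) -> float:
    """Wilcoxon score: the rank itself."""
    return rank


s, e0, sd0, z = linear_rank_sum_z([1.2, 3.4, 5.6], [2.1, 4.3, 6.5, 7.0], wilcoxon)
```

Here the smaller sample occupies ranks 1, 3, and 5, so S = 9, E0(S) = 3/7 · 28 = 12, and the null variance is 12/42 · 28 = 8.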

Syntax 1:
<LOWER TAILED/UPPER TAILED> TWO SAMPLE LINEAR RANK SUM TEST
<y1> <y2>             <SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

If LOWER TAILED is specified, a lower tailed test is performed. If UPPER TAILED is specified, an upper tailed test is performed. If neither LOWER TAILED nor UPPER TAILED is specified, a two-tailed test is performed.
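The tail options only change how the p-value is read from the normal approximation. The following sketch assumes, consistent with the example output later on this page (where the lower tailed p-value equals the CDF value), that the lower tailed p-value is the CDF, the upper tailed p-value is 1 - CDF, and the two-tailed p-value is twice the smaller of the two. The function name `p_values` is hypothetical.

```python
from statistics import NormalDist


def p_values(z: float) -> dict[str, float]:
    """Normal-approximation p-values for a standardized statistic (hypothetical helper)."""
    cdf = NormalDist().cdf(z)               # CDF of the test statistic
    return {
        "lower": cdf,                       # lower tailed: P(Z <= z)
        "upper": 1.0 - cdf,                 # upper tailed: P(Z >= z)
        "two": 2.0 * min(cdf, 1.0 - cdf),   # two-tailed
    }


# Van Der Waerden test statistic value from the example output
p = p_values(1.56365)
```

With z = 1.56365 this gives approximately 0.94105, 0.05895, and 0.11790, matching the printed lower tailed, upper tailed, and two-tailed p-values.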

Syntax 2:
<LOWER TAILED/UPPER TAILED> TWO SAMPLE LINEAR RANK SUM TEST
<y1> ... <yk>             <SUBSET/EXCEPT/FOR qualification>
where <y1> ... <yk> is a list of two or more response variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax performs a two sample linear rank sum test for every pair of the listed variables. This syntax supports the TO syntax.
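To see how many tests a variable list produces, the pairs can be enumerated with `itertools.combinations`. This sketch assumes each unordered pair is tested once; the order in which Dataplot runs the tests is not stated here.

```python
from itertools import combinations

# "Y1 TO Y6" expands to the six variables Y1, ..., Y6.
variables = [f"Y{i}" for i in range(1, 7)]
pairs = list(combinations(variables, 2))   # each unordered pair once
print(len(pairs))  # 15
```

For k variables this gives k(k - 1)/2 tests, so `Y1 TO Y6` yields 15.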

If LOWER TAILED is specified, a lower tailed test is performed. If UPPER TAILED is specified, an upper tailed test is performed. If neither LOWER TAILED nor UPPER TAILED is specified, a two-tailed test is performed.

Examples:
TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2 Y3
TWO SAMPLE LINEAR RANK SUM TEST Y1 TO Y6
TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2 SUBSET Y2 > 0
LOWER TAILED TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
UPPER TAILED TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
Note:
To specify the scoring function, enter the command

SET LINEAR RANK SUM TEST SCORE <case>

where <case> is one of the following:

1. WILCOX

This option uses Wilcoxon scores

$$a(R_i) = R_i$$

That is, the Wilcoxon scores are simply the ranks. Using this score is essentially a rank sum test (also known as the Mann-Whitney test).

This score is primarily used to test for equal locations.
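As a worked check of the general formulas, consider Wilcoxon scores with no ties, so the scores are the integers 1, ..., n. Using $$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$ and $$\sum_{i=1}^{n} (i - \frac{n+1}{2})^{2} = \frac{n(n^{2}-1)}{12}$$, the null mean and the null variance $$\frac{n1 n2}{n(n-1)} \sum_{i=1}^{n} {(a(R_{i}) - \bar{a})^{2}}$$ reduce to

$$E_{0}(S) = \frac{n1}{n} \cdot \frac{n(n+1)}{2} = \frac{n1 (n+1)}{2}$$

$$\frac{n1 n2}{n(n-1)} \cdot \frac{n(n^{2}-1)}{12} = \frac{n1 n2 (n+1)}{12}$$

With n1 = n2 = 10 (n = 20), as in the example program, this gives an expected value of 105 and a standard deviation of $$\sqrt{175} \approx 13.229$$. The Wilcoxon output reports 105 and a slightly smaller 13.20885; the non-integer score value of 124.5 indicates tied observations, and averaging tied ranks reduces the variance.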

2. MEDIAN

This option uses median scores

$$\begin{array}{lcl} a(R_i) & = & 1 \hspace{0.5in} \mbox{if } R_i > \frac{n+1}{2} \\ & = & 0 \hspace{0.5in} \mbox{if } R_i \le \frac{n+1}{2} \end{array}$$

That is, ranks greater than the median rank are scored as 1 and ranks less than or equal to the median rank are scored as 0. Using this score is essentially a 2-sample median test. Median scores work best for distributions that are symmetric and heavy-tailed.

This score is primarily used to test for equal locations.
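A direct transcription of the median score formula, as a sketch with a hypothetical function name. Note that for odd n the middle rank (n+1)/2 is scored as 0.

```python
def median_score(rank: float, n: int) -> int:
    """Median score: 1 if the rank exceeds the median rank (n+1)/2, else 0."""
    return 1 if rank > (n + 1) / 2 else 0


odd = [median_score(r, 5) for r in range(1, 6)]    # [0, 0, 0, 1, 1]
even = [median_score(r, 6) for r in range(1, 7)]   # [0, 0, 0, 1, 1, 1]
```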

3. VAN DER WAERDEN

This option uses the Van Der Waerden scores

$$a(R_i) = \Phi^{-1}(\frac{R_i}{n+1})$$

with $$\Phi^{-1}$$ denoting the percent point function of the standard normal distribution. That is, the Van Der Waerden scores are standard normal percentiles evaluated at the probabilities $$R_i/(n+1)$$. Using this score is essentially a 2-sample Van Der Waerden test.

This score is primarily used to test for equal locations.
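The Van Der Waerden score maps each rank to a standard normal percent point. This sketch uses Python's `statistics.NormalDist().inv_cdf` as the percent point function; the function name is hypothetical.

```python
from statistics import NormalDist


def van_der_waerden_score(rank: float, n: int) -> float:
    """Standard normal percent point function evaluated at rank/(n+1)."""
    return NormalDist().inv_cdf(rank / (n + 1))


scores = [van_der_waerden_score(r, 4) for r in range(1, 5)]
```

Because the probabilities R_i/(n+1) are symmetric about 0.5, untied scores are symmetric about 0 and sum to (essentially) zero; the middle rank of an odd-sized sample scores exactly 0.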

4. SAVAGE

This option uses the Savage scores

$$a(R_i) = \sum_{j=1}^{R_i}{\frac{1}{n-j+1}} - 1$$

Savage scores are the expected values of exponential order statistics minus 1 (to center the scores around 0). Savage scores are typically used to test location differences in extreme value distributions and to test scale differences in exponential distributions.
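A sketch of the Savage score formula for integer ranks (how Dataplot evaluates it for averaged, non-integer tied ranks is not stated here, so ties are not handled). The function name is hypothetical.

```python
def savage_score(rank: int, n: int) -> float:
    """Sum of 1/(n-j+1) for j = 1..rank, minus 1 (integer ranks only)."""
    return sum(1.0 / (n - j + 1) for j in range(1, rank + 1)) - 1.0


scores = [savage_score(r, 4) for r in range(1, 5)]
```

Subtracting 1 centers the scores: summed over all ranks 1, ..., n, the double sum equals n, so the n scores add up to 0. For n = 4, the score of rank 1 is 1/4 - 1 = -0.75.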

5. MOOD

This option uses the Mood scores

$$a(R_i) = (R_i - \frac{n+1}{2})^{2}$$

Mood scores are the square of the difference between the observation rank and the average rank.

This score is primarily used to test for equal scales.
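A sketch of the Mood score with a hypothetical function name. Extreme ranks in either direction receive large scores, which is why a shift in scale (rather than location) moves the statistic.

```python
def mood_score(rank: float, n: int) -> float:
    """Squared distance of the rank from the average rank (n+1)/2."""
    return (rank - (n + 1) / 2) ** 2


scores = [mood_score(r, 5) for r in range(1, 6)]  # [4.0, 1.0, 0.0, 1.0, 4.0]
```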

This option uses the Ansari-Bradley scores

$$a(R_i) = \frac{n+1}{2} + |R_i - \frac{n+1}{2}|$$

This score is often given in a different form, but the form given here is useful for computational purposes.

This score is primarily used to test for equal scales.

7. KLOTZ

This option uses the Klotz scores

$$a(R_i) = (\Phi^{-1}(\frac{R_i}{n+1}))^2$$

This score is the square of the Van Der Waerden score. Using this score is essentially a 2-sample Klotz test.

This score is primarily used to test for equal scales.
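Since the Klotz score is the square of the Van Der Waerden score, a sketch only needs to square the standard normal percent point. The function name is hypothetical; `statistics.NormalDist().inv_cdf` stands in for the percent point function.

```python
from statistics import NormalDist


def klotz_score(rank: float, n: int) -> float:
    """Square of the standard normal percent point at rank/(n+1)."""
    return NormalDist().inv_cdf(rank / (n + 1)) ** 2


scores = [klotz_score(r, 9) for r in range(1, 10)]
```

As with the Mood scores, the lowest and highest ranks receive the same large score and the middle rank scores 0.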

8. CONOVER

This option uses the Conover scores

$$a(R_i) = (R(U_i))^{2}$$

where

$$U_{i} = |Y_{i(j)} - \bar{Y}_{j}|$$

That is, the Conover scores are the squared ranks of the absolute deviations from the group mean. Using this score is essentially a 2-sample squared ranks test.

This score is primarily used to test for equal scales.
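The Conover scores differ from the others in that the ranking is applied to absolute deviations from each sample's own mean rather than to the raw data. A minimal sketch with hypothetical function names, using average ranks for ties:

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def conover_scores(y1, y2):
    """Squared ranks of |observation - its own group mean| in the combined sample."""
    u = [abs(v - mean(y1)) for v in y1] + [abs(v - mean(y2)) for v in y2]
    return [r ** 2 for r in average_ranks(u)]


scores = conover_scores([1, 2, 3], [10, 14])
```

For these data the deviations are 1, 0, 1, 2, 2, their average ranks are 2.5, 1, 2.5, 4.5, 4.5, and the scores are 6.25, 1, 6.25, 20.25, 20.25.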

Note:
The following parameters are saved after the two sample linear rank sum test is performed.

- STATVAL - value of the test statistic
- STATCDF - CDF of the test statistic
- PVALUE - p-value of the two tailed test statistic
- PVALUELT - p-value of the lower tailed test statistic
- PVALUEUT - p-value of the upper tailed test statistic
- CUTUPP90 - 90% upper critical value
- CUTUPP95 - 95% upper critical value
- CUTUP975 - 97.5% upper critical value
- CUTUPP99 - 99% upper critical value
- CUTUP995 - 99.5% upper critical value
- CUTUP999 - 99.9% upper critical value
- CUTLOW10 - 10% lower critical value
- CUTLOW05 - 5% lower critical value
- CUTLO025 - 2.5% lower critical value
- CUTLOW01 - 1% lower critical value
- CUTLO005 - 0.5% lower critical value
- CUTLO001 - 0.1% lower critical value
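The critical values printed in the example output (for example ±1.64485 at the 90% two-tailed level) are standard normal percent points. Assuming the saved CUT* parameters are computed the same way under the normal approximation, they can be reproduced with `statistics.NormalDist().inv_cdf`; the probability attached to each name below is inferred from the name, not taken from Dataplot source.

```python
from statistics import NormalDist

# Assumed cumulative probability for each saved critical-value parameter.
levels = {
    "CUTUPP90": 0.90, "CUTUPP95": 0.95, "CUTUP975": 0.975,
    "CUTUPP99": 0.99, "CUTUP995": 0.995, "CUTUP999": 0.999,
    "CUTLOW10": 0.10, "CUTLOW05": 0.05, "CUTLO025": 0.025,
    "CUTLOW01": 0.01, "CUTLO005": 0.005, "CUTLO001": 0.001,
}
# Standard normal percent points, rounded to the 5 decimals shown in the output.
crit = {name: round(NormalDist().inv_cdf(p), 5) for name, p in levels.items()}
```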
Note:
In addition to the TWO SAMPLE LINEAR RANK SUM TEST command, the following commands can also be used:

LET STATVAL = TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
LET STATCDF = TWO SAMPLE LINEAR RANK SUM TEST CDF Y1 Y2
LET PVALUE = TWO SAMPLE LINEAR RANK SUM TEST PVALUE Y1 Y2
LET PVALUE = TWO SAMPLE LINEAR RANK SUM LOWER TAIL TEST
PVALUE Y1 Y2
LET PVALUE = TWO SAMPLE LINEAR RANK SUM UPPER TAIL TEST
PVALUE Y1 Y2

In addition to the above LET commands, built-in statistics are supported for 30+ different commands (enter HELP STATISTICS for details).

Default:
The default score function is WILCOX
Synonyms:
2 SAMPLE is a synonym for TWO SAMPLE
Related Commands:
- RANK SUM TEST = Perform a 2-sample rank sum test for location.
- MEDIAN TEST = Perform a k-sample median test.
- VAN DER WAERDEN TEST = Perform a k-sample Van Der Waerden test.
- SIEGEL TUKEY TEST = Perform a 2-sample Siegel Tukey test.
- SQUARED RANKS TEST = Perform a k-sample squared ranks test for homogeneous variances.
- KLOTZ TEST = Perform a k-sample Klotz test for homogeneous variances.
Applications:
Two Sample Analysis
Implementation Date:
2023/07
Program:

. Step 1:   Read the data
.
skip 25
skip 0
let y x = stack y1 y2
.
. Step 2:   Generate the statistics
.
set linear rank sum test score van der waerden
let statval = linear rank sum test                        y1 y2
let statcdf = linear rank sum test cdf                    y1 y2
let pvalue  = linear rank sum test pvalue                 y1 y2
let pvallt  = linear rank sum test lower tail pvalue      y1 y2
let pvalut  = linear rank sum test upper tail pvalue      y1 y2
let statval = round(statval,2)
let statcdf = round(statcdf,2)
let pvalue  = round(pvalue,2)
let pvallt  = round(pvallt,2)
let pvalut  = round(pvalut,2)
.
print "Van Der Waerden Scores:"
print "Test Statistic:                        ^statval"
print "Test Statistic CDF:                    ^statcdf"
print "Test Statistic P-Value:                ^pvalue"
print "Test Statistic Lower Tailed P-Value:   ^pvallt"
print "Test Statistic Upper Tailed P-Value:   ^pvalut"
.
two sample linear rank sum test                y1 y2
van der waerden test                           y  x
.
set linear rank sum test score wilcox
two sample linear rank sum test                y1 y2
t test                                         y1 y2
.
set linear rank sum test score klotz
two sample linear rank sum test                y1 y2
klotz test                                     y1 y2

The following output is generated:
Van Der Waerden Scores:
Test Statistic:                        1.56
Test Statistic CDF:                    0.94
Test Statistic P-Value:                0.12
Test Statistic Lower Tailed P-Value:   0.94
Test Statistic Upper Tailed P-Value:   0.06

Two Sample Two-Sided Linear Rank Sum Test
(Van Der Waerden Scores)

First Response Variable: Y1
Second Response Variable: Y2

H0: Location1 = Location2
Ha: Location1 not equal Location2

Summary Statistics:
Number of Observations for Sample 1:                 10
Mean for Sample 1:                              6.02100
Median for Sample 1:                            5.53000
Standard Deviation for Sample 1:                1.58184
Number of Observations for Sample 2:                 10
Mean for Sample 2:                              5.01900
Median for Sample 2:                            5.03500
Standard Deviation for Sample 2:                1.10440

Test (Normal Approximation):
Test Statistic Value:                           1.56365
Score Value:                                    3.11351
Expected Value of Test Statistic:               0.00786
Standard Deviation of Test Statistic:           1.98615
CDF Value:                                      0.94105
P-Value (2-tailed test):                        0.11790
P-Value (lower-tailed test):                    0.94105
P-Value (upper-tailed test):                    0.05895

Two-Tailed Test: Normal Approximation

---------------------------------------------------------------------------
                                        Lower          Upper           Null
   Significance           Test       Critical       Critical     Hypothesis
          Level      Statistic      Value (<)      Value (>)     Conclusion
---------------------------------------------------------------------------
          80.0%        1.56365       -1.28155        1.28155         REJECT
          90.0%        1.56365       -1.64485        1.64485         ACCEPT
          95.0%        1.56365       -1.95996        1.95996         ACCEPT
          99.0%        1.56365       -2.57583        2.57583         ACCEPT

THE FORTRAN COMMON CHARACTER VARIABLE LINERANK HAS JUST BEEN SET TO WILC

Two Sample Two-Sided Linear Rank Sum Test
(Wilcoxon Scores)

First Response Variable: Y1
Second Response Variable: Y2

H0: Location1 = Location2
Ha: Location1 not equal Location2

Summary Statistics:
Number of Observations for Sample 1:                 10
Mean for Sample 1:                              6.02100
Median for Sample 1:                            5.53000
Standard Deviation for Sample 1:                1.58184
Number of Observations for Sample 2:                 10
Mean for Sample 2:                              5.01900
Median for Sample 2:                            5.03500
Standard Deviation for Sample 2:                1.10440

Test (Normal Approximation):
Test Statistic Value:                           1.47628
Score Value:                                  124.50000
Expected Value of Test Statistic:             105.00000
Standard Deviation of Test Statistic:          13.20885
CDF Value:                                      0.93007
P-Value (2-tailed test):                        0.13987
P-Value (lower-tailed test):                    0.93007
P-Value (upper-tailed test):                    0.06993

Two-Tailed Test: Normal Approximation

---------------------------------------------------------------------------
                                        Lower          Upper           Null
   Significance           Test       Critical       Critical     Hypothesis
          Level      Statistic      Value (<)      Value (>)     Conclusion
---------------------------------------------------------------------------
          80.0%        1.47628       -1.28155        1.28155         REJECT
          90.0%        1.47628       -1.64485        1.64485         ACCEPT
          95.0%        1.47628       -1.95996        1.95996         ACCEPT
          99.0%        1.47628       -2.57583        2.57583         ACCEPT

THE FORTRAN COMMON CHARACTER VARIABLE LINERANK HAS JUST BEEN SET TO KLOT

Two Sample Two-Sided Linear Rank Sum Test
(Klotz Scores)

First Response Variable: Y1
Second Response Variable: Y2

H0: Scale1 = Scale2
Ha: Scale1 not equal Scale2

Summary Statistics:
Number of Observations for Sample 1:                 10
Mean for Sample 1:                              6.02100
Median for Sample 1:                            5.53000
Standard Deviation for Sample 1:                1.58184
Number of Observations for Sample 2:                 10
Mean for Sample 2:                              5.01900
Median for Sample 2:                            5.03500
Standard Deviation for Sample 2:                1.10440

Test (Normal Approximation):
Test Statistic Value:                           0.26908
Score Value:                                    8.01749
Expected Value of Test Statistic:               7.49513
Standard Deviation of Test Statistic:           1.94130
CDF Value:                                      0.60606
P-Value (2-tailed test):                        0.78787
P-Value (lower-tailed test):                    0.60606
P-Value (upper-tailed test):                    0.39394

Two-Tailed Test: Normal Approximation

---------------------------------------------------------------------------
                                        Lower          Upper           Null
   Significance           Test       Critical       Critical     Hypothesis
          Level      Statistic      Value (<)      Value (>)     Conclusion
---------------------------------------------------------------------------
          80.0%        0.26908       -1.28155        1.28155         ACCEPT
          90.0%        0.26908       -1.64485        1.64485         ACCEPT
          95.0%        0.26908       -1.95996        1.95996         ACCEPT
          99.0%        0.26908       -2.57583        2.57583         ACCEPT
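As an arithmetic check on the output above, the Test Statistic Value is (Score Value - Expected Value of Test Statistic) / Standard Deviation of Test Statistic. Recomputing it from the printed, rounded components reproduces the printed statistic for all three score functions to about five decimals. The dictionary names are illustrative only.

```python
# (Score Value, Expected Value, Standard Deviation) as printed in the output
printed = {
    "Van Der Waerden": (3.11351, 0.00786, 1.98615),
    "Wilcoxon": (124.50000, 105.00000, 13.20885),
    "Klotz": (8.01749, 7.49513, 1.94130),
}
# z = (S - E0(S)) / SD0(S) for each score function
z_values = {name: (s - e0) / sd0 for name, (s, e0, sd0) in printed.items()}
for name, z in z_values.items():
    print(f"{name}: z = {z:.5f}")
```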


Date created: 08/03/2023
Last updated: 08/03/2023