
KRUSKAL WALLIS

Name:
    KRUSKAL WALLIS
Type:
    Analysis Command
Purpose:
    Perform a Kruskal Wallis test that k samples come from identical populations.
Description:
    Analysis of Variance (ANOVA) is a data analysis technique for examining the significance of the factors (= independent variables) in a multi-factor model. The one factor model can be thought of as a generalization of the two sample t-test. That is, the two sample t-test is a test of the hypothesis that two population means are equal. The one factor ANOVA tests the hypothesis that k population means are equal.

    The Kruskal Wallis test can be applied in the one factor ANOVA case. It is a non-parametric test for the situation where the ANOVA normality assumptions may not apply. Although this test is for identical populations, it is designed to be sensitive to unequal means.

    Let ni (i = 1, 2, ..., k) represent the sample sizes for each of the k groups (i.e., samples) in the data, and let n = n1 + n2 + ... + nk be the total number of observations. Next, rank the combined sample. Then compute Ri = the sum of the ranks for group i. Then the Kruskal Wallis test statistic is:

      \( H = \frac{12} {n(n+1)} \sum_{i=1}^{k}{\frac{R_{i}^{2}} {n_i}} - 3(n+1) \)

    If the null hypothesis of identical populations is true, this statistic approximately follows a chi-square distribution with k-1 degrees of freedom. Each of the ni should be at least 5 for the approximation to be valid.
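
    The ranking and rank-sum steps above can be sketched in Python as follows. This is a minimal illustration of the formula for H as written here, not Dataplot's implementation: the function names are hypothetical, tied values are assumed to receive average ranks, and no tie correction is applied, so Dataplot's reported value may differ when ties are present.

```python
from collections import defaultdict


def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over the run of observations tied with order[pos].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0  # mean of the 1-based ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis_h(y, x):
    """Return H from response values y and group ids x (no tie correction)."""
    n = len(y)
    ranks = midranks(y)
    rank_sum = defaultdict(float)  # R_i: sum of ranks in group i
    size = defaultdict(int)        # n_i: number of observations in group i
    for r, g in zip(ranks, x):
        rank_sum[g] += r
        size[g] += 1
    total = sum(rank_sum[g] ** 2 / size[g] for g in rank_sum)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


# Made-up data: three groups of three observations with no overlap.
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
h = kruskal_wallis_h(y, x)
print(round(h, 4))
```

    For the made-up data, the rank sums are 6, 15 and 24, so H = (12/90)(36/3 + 225/3 + 576/3) - 30 = 7.2.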

    We reject the null hypothesis of identical populations if the test statistic H is greater than CHIPPF(1-ALPHA,K-1), where CHIPPF is the chi-square percent point function.

    More formally,

    H0: All of the k population distribution functions are identical
    HA: At least one of the populations tends to yield larger observations than at least one of the other populations
    Test Statistic: \( H = \frac{12} {n(n+1)} \sum_{i=1}^{k}{\frac{R_{i}^{2}} {n_i}} - 3(n+1) \)
    Significance Level: \( \alpha \), typically set to 0.05.
    Critical Region: H > CHIPPF(1 - \( \alpha \),k - 1) where CHIPPF is the chi-square percent point function.
    Conclusion: Reject the null hypothesis if the test statistic is in the critical region.
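
    The decision rule can also be stated in terms of a p-value: rejecting when the upper-tail chi-square probability of H is below the significance level is equivalent to H exceeding the upper critical value. The Python sketch below assumes integer degrees of freedom and uses the standard closed-form recurrence for the chi-square tail; the function names are hypothetical and this is not how Dataplot computes its CDF.

```python
import math


def chisq_upper_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        tail, nu = math.exp(-half), 2               # exact tail for 2 df
    else:
        tail, nu = math.erfc(math.sqrt(half)), 1    # exact tail for 1 df
    while nu < df:
        # Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
        tail += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return tail


def kruskal_wallis_decision(h, k, alpha=0.05):
    """Return (p_value, reject) for statistic h computed from k groups."""
    p_value = chisq_upper_tail(h, k - 1)
    return p_value, p_value < alpha


# Statistic and group count taken from the sample output on this page.
p_value, reject = kruskal_wallis_decision(41.10239, 4, alpha=0.05)
print(f"p-value = {p_value:.3e}, reject H0: {reject}")
```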

Syntax 1:
    KRUSKAL WALLIS <y> <x>             <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response (= dependent) variable;
                <x> is the factor (= independent) variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
    MULTIPLE KRUSKAL WALLIS <y1> ... <yk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> ... <yk> is a list of 1 to 30 response variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax is used for the case when the data for each group is stored in a separate variable. This syntax accepts matrix arguments.
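
    The two data layouts carry the same information. As an illustration only, the Python sketch below stacks one-list-per-group data into a single response list and a group-id list, the layout used by Syntax 1. The function name is hypothetical, and the assignment of group ids 1, 2, ... in argument order is an assumption of the sketch, not a statement about Dataplot's internals.

```python
def stack_groups(*groups):
    """Stack per-group samples into (y, x): responses and group ids."""
    y, x = [], []
    for group_id, sample in enumerate(groups, start=1):
        y.extend(sample)
        x.extend([group_id] * len(sample))
    return y, x


# Made-up samples of unequal size.
y, x = stack_groups([4.1, 3.9], [5.2, 5.0, 4.8], [6.3])
print(y)
print(x)
```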

Examples:
    KRUSKAL WALLIS Y X
    KRUSKAL WALLIS Y X SUBSET X = 1 TO 4
    MULTIPLE KRUSKAL WALLIS Y1 Y2 Y3 Y4
    MULTIPLE KRUSKAL WALLIS Y1 TO Y4
Note:
    Conover lists the following assumptions for the Kruskal Wallis test:

    1. All samples are random samples from their respective populations.

    2. In addition to independence within each sample, there is mutual independence among the various samples.

    3. The measurement scale is at least ordinal (i.e., the data can be ranked).

    4. Either the k population distribution functions are identical or else some of the populations tend to yield larger values than other populations do.
Note:
    If the hypothesis of identical distributions is rejected, you can perform a multiple comparisons procedure to determine which pairs of populations tend to differ.

    The populations i and j seem to be different if the following inequality is satisfied:

      \( \left| \frac{R_{i}}{N_{i}} - \frac{R_{j}}{N_{j}} \right| > \mbox{TPPF}(1 - \alpha/2) \sqrt{\frac{s^2(N-1-T)}{N-k}} \sqrt{\frac{1}{N_i} + \frac{1}{N_j}} \)

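    with TPPF denoting the t percent point function with N - k degrees of freedom, T denoting the Kruskal-Wallis test statistic, N and N_i denoting the total and group sample sizes (n and ni above), and s^2 denoting the sample variance of the ranks of the combined sample.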
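
    The right-hand side of this inequality can be evaluated as in the Python sketch below. It assumes s^2 is the sample variance of the combined ranks, as in Conover's formulation, and it takes the t percent point as an argument supplied by the caller (for example, from a t table), since the Python standard library has no t quantile function. All function names are hypothetical.

```python
import math


def rank_variance(ranks):
    """s^2 = (sum of squared ranks - N(N+1)^2/4) / (N - 1)."""
    n = len(ranks)
    return (sum(r * r for r in ranks) - n * (n + 1) ** 2 / 4.0) / (n - 1)


def comparison_cutoff(t_quantile, s2, t_stat, n_total, k, n_i, n_j):
    """Right-hand side of the multiple comparisons inequality."""
    return (t_quantile
            * math.sqrt(s2 * (n_total - 1 - t_stat) / (n_total - k))
            * math.sqrt(1.0 / n_i + 1.0 / n_j))


def differs(mean_rank_i, mean_rank_j, cutoff):
    """True if populations i and j seem to differ."""
    return abs(mean_rank_i - mean_rank_j) > cutoff


# Made-up example: 9 untied observations in 3 groups of 3, with T = 7.2.
s2 = rank_variance(range(1, 10))
# 2.447 is the tabled t value for 0.975 with N - k = 6 degrees of freedom.
cutoff = comparison_cutoff(2.447, s2, 7.2, n_total=9, k=3, n_i=3, n_j=3)
print(round(cutoff, 3), differs(2.0, 5.0, cutoff))
```

    With no ties, s^2 reduces to N(N+1)/12, which is 7.5 for the nine ranks in the example.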

Note:
    The output was reformatted for the 2011/6 version. The SET WRITE DECIMALS command can now be used to specify the number of digits to include in the output.
Note:
    The following statistics are also supported:

      LET A = KRUSKAL WALLIS TEST Y X
      LET A = KRUSKAL WALLIS TEST CDF Y X
      LET A = KRUSKAL WALLIS TEST PVALUE Y X

    with Y denoting the response variable, X denoting the group-id variable, and ALPHA denoting the significance level for the critical value.

    In addition to the above LET command, built-in statistics are supported for more than 20 different commands (enter HELP STATISTICS for details).

Default:
    None
Synonyms:
    The following are synonyms for KRUSKAL WALLIS:

      KRUSKAL WALLIS TEST
      KRUSKAL TEST
Reference:
    Conover (1999), "Practical Nonparametric Statistics," Third Edition, Wiley, pp. 288-297.

    Walpole and Myers (1978), "Probability and Statistics for Engineers and Scientists," Second Edition, Macmillan.

Applications:
    Analysis of Variance
Implementation Date:
    1999/8
    2004/10: Modified test to use Conover formulation rather than the Walpole and Myers formulation
    2011/06: Reformatted the output, support for SET WRITE DECIMALS
    2011/06: Support for the MULTIPLE option
Program:
     
    SKIP 25
    READ SPLETT2.DAT Y MACHINE
    SET WRITE DECIMALS 5
    KRUSKAL WALLIS Y MACHINE
        
    The following output is generated.
                Kruskal-Wallis One Factor Test
     
    Response Variable: Y
    Group-ID Variable: MACHINE
     
    H0: Samples Come From Identical Populations
    Ha: Samples Do Not Come From Identical Populations
     
    Summary Statistics:
    Total Number of Observations:                                  99
    Number of Groups:                                               4
     
    Kruskal-Wallis Test Statistic Value:                     41.10239
    CDF of Test Statistic:                                    0.99999
    P-Value:                                                  0.00000
     
     
    Percent Points of the Chi-Square Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.366
               75.0    =          4.107
               90.0    =          6.251
               95.0    =          7.815
               97.5    =          9.348
               99.0    =         11.345
               99.9    =         16.265
     
    Conclusions (Upper 1-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            6.251      Reject H0
         5%    95%            7.815      Reject H0
       2.5%  97.5%            9.348      Reject H0
         1%    99%           11.345      Reject H0
     
     
                Multiple Comparisons Table
     
    ------------------------------------------------------------------------
        I    J  |Ri/Ni - Rj/Nj|         90% CV         95% CV         99% CV
    ------------------------------------------------------------------------
        1    2         18.82083       10.54643       12.60485       16.68947
        1    3         47.56083       10.54643       12.60485       16.68947
        1    4          4.98083       10.54643       12.60485       16.68947
        2    3         28.74000       10.43825       12.47556       16.51830
        2    4         13.83999       10.43825       12.47556       16.51830
        3    4         42.58000       10.43825       12.47556       16.51830
        
Date created: 06/05/2001
Last updated: 12/11/2023

Please email comments on this WWW page to alan.heckert@nist.gov.