
KOLMOGOROV SMIRNOV TWO SAMPLE

Name:
    ... KOLMOGOROV SMIRNOV TWO SAMPLE TEST
Type:
    Analysis Command
Purpose:
    Perform a Kolmogorov-Smirnov two sample test of whether two data samples come from the same distribution. Note that the common distribution itself is not specified.
Description:
    The one sample Kolmogorov-Smirnov (K-S) test is based on the empirical cumulative distribution function (ECDF). Given N ordered data points Y1, Y2, ..., YN, the ECDF is defined as

      \( E_{N} = \frac{n_{i}}{N} \)

    where ni is the number of points less than Yi. This is a step function that increases by 1/N at the value of each data point. The empirical distribution function can be plotted together with the cumulative distribution function of a given distribution. The one sample K-S test is based on the maximum distance between these two curves. That is,

      \( D = \max_{1 \le i \le N}|F(Y_{i}) - \frac{i} {N}| \)

    where F is the theoretical cumulative distribution function.
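
    As an illustration, the one sample statistic can be sketched in a few lines of Python. This is a minimal sketch of the formula exactly as written above, not Dataplot's implementation. The function names and the sample data are hypothetical, and the standard normal is used as the theoretical distribution only as an example. Some references also compare F(Yi) against (i-1)/N, which this sketch does not do.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_one_sample(data, cdf):
    """D = max over i of |F(Y_i) - i/N| for the sorted data Y_1 <= ... <= Y_N."""
    ys = sorted(data)
    n = len(ys)
    return max(abs(cdf(y) - i / n) for i, y in enumerate(ys, start=1))

# Hypothetical data, used only to exercise the function.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
d = ks_one_sample(sample, normal_cdf)
print(f"D = {d:.4f}")
```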

    The two sample K-S test is a variation of this. However, instead of comparing an empirical distribution function to a theoretical distribution function, we compare the two empirical distribution functions. That is,

      \( D = \max_{i} |E_1(i) - E_2(i)| \)

    where E1 and E2 are the empirical distribution functions for the two samples and i ranges over the points of both samples. That is, both E1 and E2 are computed at each point in each sample, and D is the largest absolute difference found.
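
    The computation can be sketched in Python as follows. This is a minimal sketch rather than Dataplot's implementation. The function names and data are hypothetical, and the ECDF here counts points less than or equal to x, which is one common convention.

```python
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Fraction of observations less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_two_sample(sample1, sample2):
    """Largest |E1(x) - E2(x)| over every point x in either sample."""
    s1, s2 = sorted(sample1), sorted(sample2)
    return max(abs(ecdf(s1, x) - ecdf(s2, x)) for x in s1 + s2)

# Hypothetical data: the second sample is shifted to the right.
y1 = [1.0, 2.0, 3.0, 4.0]
y2 = [3.5, 4.5, 5.5, 6.5]
print(ks_two_sample(y1, y2))
```

    Identical samples give D = 0, and samples that do not overlap at all give D = 1.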

    More formally, the Kolmogorov-Smirnov two sample test statistic can be defined as follows.

    H0: The two samples come from a common distribution.
    Ha: The two samples do not come from a common distribution.
    Test Statistic: The Kolmogorov-Smirnov two sample test statistic is defined as

      \( D = \max_{i} |E_1(i) - E_2(i)| \)

    where E1 and E2 are the empirical distribution functions for the two samples.

    Significance Level: \( \alpha \)
    Critical Region: The null hypothesis that the two samples come from a common distribution is rejected if the test statistic, D, is greater than the critical value obtained from a table. There are several variations of these tables in the literature that use somewhat different scalings for the K-S test statistic and critical regions. These alternative formulations should be equivalent, but it is necessary to ensure that the test statistic is calculated in a way that is consistent with how the critical values were tabulated.

    Dataplot uses the critical values from Chakravarti, Laha, and Roy (see Reference: below).
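
    The critical values printed in the sample output under Program 1 and Program 2 are consistent with the common large-sample approximation c * sqrt((n1 + n2)/(n1*n2)), with c = 1.22, 1.36, and 1.63 for the 90%, 95%, and 99% levels. The Python sketch below reproduces them under that assumption. The coefficients are inferred from the sample output on this page and are not a documented Dataplot constant, and the function name is hypothetical.

```python
import math

# Assumption: coefficients inferred from the sample output on this page,
# keyed by the confidence level in percent.
COEFF = {90: 1.22, 95: 1.36, 99: 1.63}

def ks2_critical_value(n1, n2, level):
    """Large-sample two sample K-S critical value for sample sizes n1 and n2."""
    return COEFF[level] * math.sqrt((n1 + n2) / (n1 * n2))

# Sample sizes from Program 1 (249 and 79 observations).
for level in (90, 95, 99):
    print(level, round(ks2_critical_value(249, 79, level), 4))
```

    The null hypothesis is rejected at a given level when the test statistic exceeds the corresponding value.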

    The quantile-quantile plot, bihistogram, and Tukey mean-difference plot are graphical alternatives to the two sample K-S test.

Syntax 1:
    KOLMOGOROV SMIRNOV TWO SAMPLE TEST <y1> <y2>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
          <y2> is the second response variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
    KOLMOGOROV SMIRNOV TWO SAMPLE TEST <y1> ... <yk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> ... <yk> is a list of 2 to 30 response variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs all the pairwise two sample Kolmogorov Smirnov tests.
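
    For k response variables this gives k(k-1)/2 tests. The Python sketch below enumerates the pairs for illustration only. The function name is hypothetical, and the ordering shown is simply that of itertools.combinations, which happens to match the order of the output under Program 2.

```python
from itertools import combinations

def pairwise_tests(names):
    """All unordered pairs of variable names, in list order."""
    return list(combinations(names, 2))

pairs = pairwise_tests(["Y1", "Y2", "Y3"])
print(pairs)
print(len(pairwise_tests([f"Y{i}" for i in range(1, 31)])))  # pairs for 30 variables
```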

Examples:
    KOLMOGOROV-SMIRNOV TWO SAMPLE TEST Y1 Y2
    KOLMOGOROV-SMIRNOV TWO SAMPLE TEST Y1 Y2 SUBSET Y2 > 0
Note:
    The KOLMOGOROV-SMIRNOV TWO SAMPLE TEST command automatically saves the following parameters.

      STATVAL - value of the K-S two sample statistic
      CUTUPP90 - 90% critical value (alpha = 0.10) for the K-S two sample test statistic
      CUTUPP95 - 95% critical value (alpha = 0.05) for the K-S two sample test statistic
      CUTUPP99 - 99% critical value (alpha = 0.01) for the K-S two sample test statistic

    These parameters can be used in subsequent analysis.

Note:
    The KOLMOGOROV SMIRNOV TWO SAMPLE TEST command was updated (2016/06) to support the following command

      SET TWO SAMPLE TEST NUMBER OF PERCENTILES <value>

    By default, the Kolmogorov-Smirnov test is computed using all the points. When the number of points is large, the command can take a very long time. Computing the test from a specified number of percentiles of the data lets the command run quickly without sacrificing too much information.
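
    The idea can be sketched in Python as follows. How Dataplot selects and uses the percentiles is not described here, so the scheme below (evenly spaced order statistics of the pooled data) is an assumption made for illustration, and all names are hypothetical.

```python
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Fraction of observations less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_two_sample_on_percentiles(sample1, sample2, n_percentiles):
    """Approximate D on n_percentiles evenly spaced order statistics of the
    pooled data instead of on every point (assumed scheme, not Dataplot's)."""
    if n_percentiles < 2:
        raise ValueError("n_percentiles must be at least 2")
    s1, s2 = sorted(sample1), sorted(sample2)
    pooled = sorted(s1 + s2)
    last = len(pooled) - 1
    grid = [pooled[round(k * last / (n_percentiles - 1))]
            for k in range(n_percentiles)]
    return max(abs(ecdf(s1, x) - ecdf(s2, x)) for x in grid)

# Hypothetical data: the second sample is the first shifted by 30.
y1 = list(range(100))
y2 = [v + 30 for v in range(100)]
d_coarse = ks_two_sample_on_percentiles(y1, y2, 11)    # coarse grid
d_full = ks_two_sample_on_percentiles(y1, y2, 200)     # every pooled point
print(d_coarse, d_full)
```

    Because the grid is a subset of the pooled points, the value computed this way can never exceed the statistic computed from all the points.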

Default:
    None
Synonyms:
    KS is a synonym for KOLMOGOROV SMIRNOV.
    The word TEST in the command is optional.
    TWO can be entered as 2.

    Some examples:

      KOLMOGOROV SMIRNOV 2 SAMPLE Y1 Y2
      KS 2 SAMPLE Y1 Y2
      KS TWO SAMPLE TEST Y1 Y2
Related Commands:
Reference:
    Chakravarti, Laha, and Roy (1967), "Handbook of Methods of Applied Statistics, Volume I," John Wiley, pp. 392-394.

    Press, Teukolsky, Vetterling, and Flannery (1992), "Numerical Recipes in Fortran: The Art of Scientific Computing," Second Edition, Cambridge University Press, pp. 614-622.

Applications:
    Distributional Analysis
Implementation Date:
    1998/12
    2011/03: If more than two variables given, perform all pairwise tests
    2016/06: Added support for SET TWO SAMPLE TEST NUMBER OF PERCENTILES
    2016/06: Added KS as synonym for KOLMOGOROV SMIRNOV
Program 1:
     
    SKIP 25
    READ AUTO83B.DAT Y1 Y2
    .
    DELETE Y2 SUBSET Y2 < 0
    SET WRITE DECIMALS 4
    KOLMOGOROV-SMIRNOV TWO SAMPLE TEST Y1 Y2
        
    The following output is generated.
                 Kolmogorov-Smirnov Two Sample Test
      
     First Response Variable:  Y1
     Second Response Variable: Y2
      
     H0: The Two Samples Come From the
         Same (Unspecified) Distribution
     Ha: The Two Samples Come From
         Different Distributions
      
     Sample One Summary Statistics:
     Number of Observations:                  249
     Sample Mean:                             20.1446
     Sample Standard Deviation:               6.4147
     Sample Minimum:                          9.0000
     Sample Maximum:                          39.0000
      
     Sample Two Summary Statistics:
     Number of Observations:                  79
     Sample Mean:                             30.4810
     Sample Standard Deviation:               6.1077
     Sample Minimum:                          18.0000
     Sample Maximum:                          47.0000
      
     Test Statistic Value:                    0.6003
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------------------
                                                                         Null
             Null   Significance           Test       Critical     Hypothesis
       Hypothesis          Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------------------
            Same           90.0%         0.6003         0.1575         REJECT
            Same           95.0%         0.6003         0.1756         REJECT
            Same           99.0%         0.6003         0.2105         REJECT
        
Program 2:
     
    let y1 = norm rand numb for i = 1 1 50
    let y2 = norm rand numb for i = 1 1 62
    let y3 = norm rand numb for i = 1 1 45
    .
    let y2 = 1.7*y2
    let y3 = 0.7*y3
    .
    set write decimals 5
    .
    two sample kolmogorov smirnov test  y1 y2 y3
        
    The following output is generated.
                 Kolmogorov-Smirnov Two Sample Test
      
     First Response Variable:  Y1
     Second Response Variable: Y2
      
     H0: The Two Samples Come From the
         Same (Unspecified) Distribution
     Ha: The Two Samples Come From
         Different Distributions
      
     Sample One Summary Statistics:
     Number of Observations:                  50
     Sample Mean:                             -0.00822
     Sample Standard Deviation:               0.71196
     Sample Minimum:                          -2.01524
     Sample Maximum:                          1.58788
      
     Sample Two Summary Statistics:
     Number of Observations:                  62
     Sample Mean:                             -0.29060
     Sample Standard Deviation:               1.94815
     Sample Minimum:                          -5.87855
     Sample Maximum:                          3.41010
      
     Test Statistic Value:                    0.28645
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------------------
                                                                         Null
             Null   Significance           Test       Critical     Hypothesis
       Hypothesis          Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------------------
            Same           90.0%        0.28645        0.23189         REJECT
            Same           95.0%        0.28645        0.25850         REJECT
            Same           99.0%        0.28645        0.30982         ACCEPT
      
      
                 Kolmogorov-Smirnov Two Sample Test
      
     First Response Variable:  Y1
     Second Response Variable: Y3
      
     H0: The Two Samples Come From the
         Same (Unspecified) Distribution
     Ha: The Two Samples Come From
         Different Distributions
      
     Sample One Summary Statistics:
     Number of Observations:                  50
     Sample Mean:                             -0.00822
     Sample Standard Deviation:               0.71196
     Sample Minimum:                          -2.01524
     Sample Maximum:                          1.58788
      
     Sample Two Summary Statistics:
     Number of Observations:                  45
     Sample Mean:                             -0.11118
     Sample Standard Deviation:               0.70195
     Sample Minimum:                          -2.21551
     Sample Maximum:                          1.29633
      
     Test Statistic Value:                    0.12222
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------------------
                                                                         Null
             Null   Significance           Test       Critical     Hypothesis
       Hypothesis          Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------------------
            Same           90.0%        0.12222        0.25069         ACCEPT
            Same           95.0%        0.12222        0.27945         ACCEPT
            Same           99.0%        0.12222        0.33493         ACCEPT
      
      
                 Kolmogorov-Smirnov Two Sample Test
      
     First Response Variable:  Y2
     Second Response Variable: Y3
      
     H0: The Two Samples Come From the
         Same (Unspecified) Distribution
     Ha: The Two Samples Come From
         Different Distributions
      
     Sample One Summary Statistics:
     Number of Observations:                  62
     Sample Mean:                             -0.29060
     Sample Standard Deviation:               1.94815
     Sample Minimum:                          -5.87855
     Sample Maximum:                          3.41010
      
     Sample Two Summary Statistics:
     Number of Observations:                  45
     Sample Mean:                             -0.11118
     Sample Standard Deviation:               0.70195
     Sample Minimum:                          -2.21551
     Sample Maximum:                          1.29633
      
     Test Statistic Value:                    0.24373
      
      
                 Conclusions (Upper 1-Tailed Test)
      
     ------------------------------------------------------------------------
                                                                         Null
             Null   Significance           Test       Critical     Hypothesis
       Hypothesis          Level      Statistic    Region (>=)     Conclusion
     ------------------------------------------------------------------------
            Same           90.0%        0.24373        0.23892         REJECT
            Same           95.0%        0.24373        0.26634         ACCEPT
            Same           99.0%        0.24373        0.31921         ACCEPT
      
        
    .
    let stat  = two sample kolm smir test y1 y2
    let cv95  = two sample kolm smir test critical value y1 y2
    let alpha = 0.9
    let cv90  = two sample kolm smir test critical value y1 y2
    let alpha = 0.99
    let cv99  = two sample kolm smir test critical value y1 y2
        
    The following output is generated.
     PARAMETERS AND CONSTANTS--
    
        STAT    --        0.28645
        CV95    --        0.25850
        CV90    --        0.23189
        CV99    --        0.30982
        
Date created: 06/05/2001
Last updated: 12/11/2023

Please email comments on this WWW page to alan.heckert@nist.gov.