
KENDALL TAU DISSIMILARITY

Name:
    KENDALL TAU DISSIMILARITY (LET)
    KENDALL TAU SIMILARITY (LET)
Type:
    Let Subcommand
Purpose:
    Compute Kendall's tau correlation coefficient between two variables, transformed to a dissimilarity (or similarity) measure.
Description:
    Kendall's tau coefficient is a measure of concordance between two paired variables. Given the pairs (Xi,Yi) and (Xj,Yj), then

      \( \frac{Y_j - Y_i}{X_j - X_i} \) > 0 - pair is concordant

      \( \frac{Y_j - Y_i}{X_j - X_i} \) < 0 - pair is discordant

      \( \frac{Y_j - Y_i}{X_j - X_i} \) = 0 - pair is considered a tie

      Xi = Xj - pair is not compared

    Kendall's tau is computed as

      \( \tau = \frac{N_c - N_d}{N_c + N_d} \)

    with Nc and Nd denoting the number of concordant pairs and the number of discordant pairs, respectively, in the sample. Ties add 0.5 to both the concordant and discordant counts. There are \( \left( \begin{array}{c} n \\ 2 \end{array} \right) \) possible pairs in the bivariate sample.
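
    The pair rules and the formula above can be sketched in a few lines of Python. This is only an illustration of the definitions as stated on this page, not Dataplot's internal algorithm; the function name kendall_tau is hypothetical, and the treatment of a sample with no comparable pairs (raising an error) is an assumption.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau using the pair rules described above (illustrative sketch)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    n_c = 0.0  # concordant count
    n_d = 0.0  # discordant count
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # Xi = Xj: pair is not compared
        # The product has the same sign as (Yj - Yi)/(Xj - Xi) when Xi != Xj.
        sign = (yj - yi) * (xj - xi)
        if sign > 0:
            n_c += 1.0
        elif sign < 0:
            n_d += 1.0
        else:
            # Tie in Y: add 0.5 to both counts.
            n_c += 0.5
            n_d += 0.5
    if n_c + n_d == 0:
        # Assumption: treat a sample with no comparable pairs as an error.
        raise ValueError("no comparable pairs")
    return (n_c - n_d) / (n_c + n_d)


# 5 concordant pairs and 1 discordant pair: tau = (5 - 1)/(5 + 1)
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
# A monotone but nonlinear relationship still has every pair concordant.
print(kendall_tau([1, 2, 3, 4], [1, 4, 9, 16]))
```

    The loop visits each of the n(n-1)/2 pairs once, so the sketch is quadratic in the sample size; that is adequate for illustration but slow for large samples.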

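    Kendall's tau is an alternative to Spearman's rho rank correlation.

    A perfect monotone increasing relationship (every pair concordant) yields a coefficient of +1, a perfect monotone decreasing relationship (every pair discordant) yields -1, and equal numbers of concordant and discordant pairs yield 0. Because only the ordering of the values matters, the relationship does not need to be linear.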

    In some applications, such as clustering, it can be useful to transform Kendall's tau coefficient to a dissimilarity measure. The transformation used here is

      \( d = \frac{1 - \tau}{2} \)

    This converts Kendall's tau coefficient with values between -1 and 1 to a score between 0 and 1. High positive correlation (i.e., very similar) results in a dissimilarity near 0 and high negative correlation (i.e., very dissimilar) results in a dissimilarity near 1.

    If a similarity score is preferred, you can use

      \( s = 1 - d \)

    where d is defined as above.
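
    As a minimal sketch of the two transformations above, the following Python functions map a tau value to the dissimilarity d and the similarity s. The function names are hypothetical, and the range check is an added assumption rather than documented Dataplot behavior.

```python
def tau_to_dissimilarity(tau):
    """Map Kendall's tau in [-1, 1] to a dissimilarity d = (1 - tau)/2 in [0, 1]."""
    if not -1.0 <= tau <= 1.0:
        # Assumption: reject values outside the valid range of tau.
        raise ValueError("tau must lie between -1 and 1")
    return (1.0 - tau) / 2.0


def tau_to_similarity(tau):
    """Map Kendall's tau to a similarity s = 1 - d, which equals (1 + tau)/2."""
    return 1.0 - tau_to_dissimilarity(tau)


for tau in (-1.0, 0.0, 0.5, 1.0):
    print(tau, tau_to_dissimilarity(tau), tau_to_similarity(tau))
```

    Note that d + s = 1 for every tau, so either score determines the other.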

Syntax 1:
    LET <par> = KENDALL TAU DISSIMILARITY <y1> <y2>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <par> is a parameter where the computed Kendall's tau dissimilarity is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
    LET <par> = KENDALL TAU SIMILARITY <y1> <y2>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <par> is a parameter where the computed Kendall's tau similarity is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
    LET A = KENDALL TAU DISSIMILARITY Y1 Y2
    LET A = KENDALL TAU DISSIMILARITY Y1 Y2 SUBSET TAG > 2
    LET A = KENDALL TAU SIMILARITY Y1 Y2
Note:
    The two variables must have the same number of elements.
Default:
    None
Synonyms:
    KENDALL TAU DISTANCE is a synonym for KENDALL TAU DISSIMILARITY
Related Commands:
Reference:
    Kaufman and Rousseeuw (1990), "Finding Groups in Data: An Introduction To Cluster Analysis", Wiley.
Applications:
    Clustering
Implementation Date:
    2017/08
    2018/10: Added KENDALL TAU SIMILARITY
    2018/10: Added KENDALL TAU DISTANCE as a synonym for KENDALL TAU DISSIMILARITY
Program 1:
     
    SKIP 25
    READ BERGER1.DAT Y X
    LET CORR = KENDALL TAU Y X
    LET D    = KENDALL TAU DISSIMILARITY Y X
    SET WRITE DECIMALS 3
    PRINT CORR D
        
    The following output is generated
        
Program 2:
     
    SKIP 25
    READ IRIS.DAT Y1 Y2 Y3 Y4
    SET WRITE DECIMALS 3
    .
    LET M = GENERATE MATRIX KENDALL TAU DISSIMILARITY Y1 Y2 Y3 Y4
    PRINT M
        
    The following output is generated
        
Program 3:
     
    SKIP 25
    READ IRIS.DAT Y1 Y2 Y3 Y4 TAG
    .
    TITLE CASE ASIS
    TITLE OFFSET 2
    CASE ASIS
    TIC MARK OFFSET UNITS DATA
    YLIMITS 0 1
    MAJOR YTIC MARK NUMBER 6
    MINOR YTIC MARK NUMBER 1
    Y1TIC MARK LABEL DECIMAL 1
    XLIMITS 1 3
    MAJOR XTIC MARK NUMBER 3
    MINOR XTIC MARK NUMBER 0
    XTIC MARK OFFSET 0.3 0.3
    CHARACTER X BLANK
    LINES BLANK SOLID
    .
    MULTIPLOT CORNER COORDINATES 5 5 95 95
    MULTIPLOT SCALE FACTOR 2
    MULTIPLOT 2 3
    .
    TITLE Sepal Length vs Sepal Width
    KENDALL TAU DISSIMILARITY PLOT Y1 Y2 TAG
    .
    TITLE Sepal Length vs Petal Length
    KENDALL TAU DISSIMILARITY PLOT Y1 Y3 TAG
    .
    TITLE Sepal Length vs Petal Width
    KENDALL TAU DISSIMILARITY PLOT Y1 Y4 TAG
    .
    TITLE Sepal Width vs Petal Length
    KENDALL TAU DISSIMILARITY PLOT Y2 Y3 TAG
    .
    TITLE Sepal Width vs Petal Width
    KENDALL TAU DISSIMILARITY PLOT Y2 Y4 TAG
    .
    TITLE Petal Length vs Petal Width
    KENDALL TAU DISSIMILARITY PLOT Y3 Y4 TAG
    .
    END OF MULTIPLOT
    X1LABEL Species
    .
    JUSTIFICATION CENTER
    MOVE 50 5
    TEXT Species
    DIRECTION VERTICAL
    MOVE 5 50
    TEXT Kendall Tau Dissimilarity Coefficient
    DIRECTION HORIZONTAL
        
    plot generated by sample program

Date created: 09/20/2017
Last updated: 09/20/2017

Please email comments on this WWW page to alan.heckert@nist.gov.