
# KENDALL TAU DISSIMILARITY

Name:
KENDALL TAU DISSIMILARITY (LET)
Type:
Let Subcommand
Purpose:
Compute Kendall's tau correlation coefficient between two variables, transformed to a dissimilarity measure.
Description:
Kendall's tau coefficient is a measure of concordance between two paired variables. Given the pairs (Xi,Yi) and (Xj,Yj), then

$$\frac{Y_j - Y_i}{X_j - X_i} > 0$$ - pair is concordant

$$\frac{Y_j - Y_i}{X_j - X_i} < 0$$ - pair is discordant

$$\frac{Y_j - Y_i}{X_j - X_i} = 0$$ - pair is considered a tie

$$X_i = X_j$$ - pair is not compared

Kendall's tau is computed as

$$\tau = \frac{N_c - N_d}{N_c + N_d}$$

with Nc and Nd denoting the number of concordant pairs and the number of discordant pairs, respectively, in the sample. Ties add 0.5 to both the concordant and discordant counts. There are $$\left( \begin{array}{c} n \\ 2 \end{array} \right)$$ possible pairs in the bivariate sample.
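The pair-counting rule above can be sketched in Python. This is only an illustration of the definition as stated on this page, not Dataplot's own implementation; the function name `kendall_tau` is hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau from the slope-sign rule described above (illustrative)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    n_c = 0.0  # concordant count
    n_d = 0.0  # discordant count
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # Xi = Xj: pair is not compared
        slope = (yj - yi) / (xj - xi)
        if slope > 0:
            n_c += 1.0
        elif slope < 0:
            n_d += 1.0
        else:
            # A tie adds 0.5 to both counts
            n_c += 0.5
            n_d += 0.5
    if n_c + n_d == 0:
        raise ValueError("no comparable pairs")
    return (n_c - n_d) / (n_c + n_d)


# Two concordant pairs and one discordant pair: tau = (2 - 1) / (2 + 1) = 1/3
tau = kendall_tau([1, 2, 3], [1, 3, 2])
```

Because a tie adds 0.5 to both counts, it leaves the numerator unchanged and increases only the denominator, which pulls tau toward 0.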

Kendall's tau is an alternative to Spearman's rho rank correlation.

Because Kendall's tau depends only on the ordering of the values, a perfectly monotonic increasing relationship yields a coefficient of +1 (or -1 for a perfectly monotonic decreasing relationship), and a value near 0 indicates no monotonic association.

In some applications, such as clustering, it can be useful to transform Kendall's tau coefficient to a dissimilarity measure. The transformation used here is

$$d = \frac{1 - \tau}{2}$$

This converts Kendall's tau coefficient with values between -1 and 1 to a score between 0 and 1. High positive correlation (i.e., very similar) results in a dissimilarity near 0 and high negative correlation (i.e., very dissimilar) results in a dissimilarity near 1.
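As a minimal sketch of this transformation, the Python function below maps a coefficient in [-1, 1] to a dissimilarity in [0, 1]. The name `tau_to_dissimilarity` is hypothetical, and the argument `tau` stands for a Kendall's tau coefficient computed elsewhere.

```python
def tau_to_dissimilarity(tau):
    """Map Kendall's tau in [-1, 1] to a dissimilarity in [0, 1] (illustrative)."""
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie between -1 and 1")
    return (1.0 - tau) / 2.0


d_similar = tau_to_dissimilarity(1.0)      # perfect positive association: 0.0
d_unrelated = tau_to_dissimilarity(0.0)    # no association: 0.5
d_dissimilar = tau_to_dissimilarity(-1.0)  # perfect negative association: 1.0
```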

If a similarity score is preferred, you can use

$$s = 1 - d$$

where d is defined as above.
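Writing $$\tau$$ for Kendall's tau coefficient, substituting the definition of d gives the similarity directly in terms of the coefficient:

$$s = 1 - d = 1 - \frac{1 - \tau}{2} = \frac{1 + \tau}{2}$$

For example, a coefficient of 0.6 gives d = 0.2 and s = 0.8.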

Syntax:
LET <par> = KENDALL TAU DISSIMILARITY <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed Kendall's tau dissimilarity is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
LET A = KENDALL TAU DISSIMILARITY Y1 Y2
LET A = KENDALL TAU DISSIMILARITY Y1 Y2 SUBSET TAG > 2
Note:
The two variables must have the same number of elements.
Default:
None
Synonyms:
None
Related Commands:
KENDALLS TAU = Compute the Kendall's tau correlation of two variables.
CORRELATION = Compute the Pearson correlation of two variables.
PEARSON DISSIMILARITY = Compute the dissimilarity of two variables based on Pearson correlation.
SPEARMAN DISSIMILARITY = Compute the dissimilarity of two variables based on Spearman's rank correlation.
COSINE DISTANCE = Compute the cosine distance.
MANHATTAN DISTANCE = Compute the Manhattan distance.
EUCLIDEAN DISTANCE = Compute the Euclidean distance.
MATRIX DISTANCE = Compute various distance metrics for a matrix.
GENERATE MATRIX = Compute a matrix of pairwise statistic values.
Reference:
Kaufman and Rousseeuw (1990), "Finding Groups in Data: An Introduction To Cluster Analysis", Wiley.
Applications:
Clustering
Implementation Date:
2017/08
Program 1:

SKIP 25
LET CORR = KENDALL TAU Y X
LET D    = KENDALL TAU DISSIMILARITY Y X
SET WRITE DECIMALS 3
PRINT CORR D

The following output is generated

Program 2:

SKIP 25
READ IRIS.DAT Y1 Y2 Y3 Y4
SET WRITE DECIMALS 3
.
LET M = GENERATE MATRIX KENDALL TAU DISSIMILARITY Y1 Y2 Y3 Y4
PRINT M

The following output is generated
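As a rough illustration of what a pairwise dissimilarity matrix of this kind contains, the Python sketch below builds one for three small made-up columns. It is not Dataplot's implementation; the function names and the data are hypothetical. The pair-counting rule follows the Description section (slope sign, a tie adds 0.5 to both counts, pairs with equal X values are skipped).

```python
from itertools import combinations


def kendall_tau_dissimilarity(x, y):
    """Return (1 - tau) / 2 using the slope-sign pair counting rule (illustrative)."""
    n_c = 0.0
    n_d = 0.0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # pair is not compared
        slope = (yj - yi) / (xj - xi)
        if slope > 0:
            n_c += 1.0
        elif slope < 0:
            n_d += 1.0
        else:
            n_c += 0.5
            n_d += 0.5
    if n_c + n_d == 0:
        raise ValueError("no comparable pairs")
    tau = (n_c - n_d) / (n_c + n_d)
    return (1.0 - tau) / 2.0


def dissimilarity_matrix(columns):
    """Evaluate the dissimilarity for every ordered pair of columns."""
    return [[kendall_tau_dissimilarity(a, b) for b in columns] for a in columns]


# Hypothetical data: three short columns of equal length
columns = [[1, 2, 3, 4], [1, 3, 2, 4], [4, 3, 2, 1]]
matrix = dissimilarity_matrix(columns)
```

Every ordered pair is evaluated separately because, under this counting rule, ties in the first variable are skipped while ties in the second are counted, so the two orderings can differ when the data contain ties.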

Program 3:

SKIP 25
READ IRIS.DAT Y1 Y2 Y3 Y4 TAG
.
TITLE CASE ASIS
TITLE OFFSET 2
CASE ASIS
TIC MARK OFFSET UNITS DATA
YLIMITS 0 1
MAJOR YTIC MARK NUMBER 6
MINOR YTIC MARK NUMBER 1
Y1TIC MARK LABEL DECIMAL 1
XLIMITS 1 3
MAJOR XTIC MARK NUMBER 3
MINOR XTIC MARK NUMBER 0
XTIC MARK OFFSET 0.3 0.3
CHARACTER X BLANK
LINES BLANK SOLID
.
MULTIPLOT CORNER COORDINATES 5 5 95 95
MULTIPLOT SCALE FACTOR 2
MULTIPLOT 2 3
.
TITLE Sepal Length vs Sepal Width
KENDALL TAU DISSIMILARITY PLOT Y1 Y2 TAG
.
TITLE Sepal Length vs Petal Length
KENDALL TAU DISSIMILARITY PLOT Y1 Y3 TAG
.
TITLE Sepal Length vs Petal Width
KENDALL TAU DISSIMILARITY PLOT Y1 Y4 TAG
.
TITLE Sepal Width vs Petal Length
KENDALL TAU DISSIMILARITY PLOT Y2 Y3 TAG
.
TITLE Sepal Width vs Petal Width
KENDALL TAU DISSIMILARITY PLOT Y2 Y4 TAG
.
TITLE Petal Length vs Petal Width
KENDALL TAU DISSIMILARITY PLOT Y3 Y4 TAG
.
END OF MULTIPLOT
X1LABEL Species
.
JUSTIFICATION CENTER
MOVE 50 5
TEXT Species
DIRECTION VERTICAL
MOVE 5 50
TEXT Kendall Tau Dissimilarity Coefficient
DIRECTION HORIZONTAL


NIST is an agency of the U.S. Commerce Department.

Date created: 09/20/2017
Last updated: 09/20/2017