
# GENERALIZED JACCARD COEFFICIENT, GENERALIZED JACCARD DISTANCE

Name:
GENERALIZED JACCARD COEFFICIENT (LET)
GENERALIZED JACCARD DISTANCE (LET)
Type:
Let Subcommand
Purpose:
Compute the generalized Jaccard coefficient or the generalized Jaccard distance between two variables.
Description:
The generalized Jaccard coefficient between two variables X and Y is

$$J(X,Y) = \frac{\sum_{i=1}^{n}{\min(X_{i},Y_{i})}} {\sum_{i=1}^{n}{\max(X_{i},Y_{i})}}$$

The generalized Jaccard distance is then defined as 1 - J(X,Y).
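
As a cross-check outside Dataplot, the definition above can be sketched in a few lines of Python. This is a minimal illustration of the formula, not Dataplot's implementation; the function names and the sample data are hypothetical, and the behavior for a zero denominator (raising an error) is an assumption, since the source does not say how that case is handled.

```python
from typing import Sequence


def generalized_jaccard_coefficient(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    if denominator == 0:
        # Assumption: treat the 0/0 case as undefined rather than pick a value.
        raise ValueError("sum of maxima is zero; coefficient is undefined")
    return numerator / denominator


def generalized_jaccard_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance is defined as 1 - J(X, Y)."""
    return 1.0 - generalized_jaccard_coefficient(x, y)


# Illustrative data (not from IRIS.DAT):
# minima are 1, 2, 1 (sum 4); maxima are 2, 2, 3 (sum 7).
x = [1.0, 2.0, 3.0]
y = [2.0, 2.0, 1.0]
print(generalized_jaccard_coefficient(x, y))  # 4/7
print(generalized_jaccard_distance(x, y))     # 3/7
```

For non-negative data the coefficient lies between 0 and 1 and equals 1 only when the two variables are identical. The third example in the Examples section restricts both variables to non-negative values.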

Syntax 1:
LET <par> = GENERALIZED JACCARD COEFFICIENT <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed generalized Jaccard coefficient is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
LET <par> = GENERALIZED JACCARD DISTANCE <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed generalized Jaccard distance is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
LET A = GENERALIZED JACCARD COEFFICIENT Y1 Y2
LET A = GENERALIZED JACCARD DISTANCE Y1 Y2
LET A = GENERALIZED JACCARD COEFFICIENT Y1 Y2 ...
SUBSET Y1 >= 0 SUBSET Y2 >= 0
Note:
Dataplot statistics can be used in a number of commands. For details, enter

Default:
None
Synonyms:
None
Related Commands:
BINARY JACCARD DISSIMILARITY = Compute the Jaccard dissimilarity coefficient for two binary variables.
COSINE DISTANCE = Compute the cosine distance.
MANHATTAN DISTANCE = Compute the Manhattan distance.
EUCLIDEAN DISTANCE = Compute the Euclidean distance.
MATRIX DISTANCE = Compute various distance metrics for a matrix.
GENERATE MATRIX = Compute a matrix of pairwise statistic values.
Applications:
Multivariate Analysis
Implementation Date:
2017/08
Program 1:

SKIP 25
READ IRIS.DAT Y1 TO Y4 X
.
LET DIST  = GENERALIZED JACCARD DISTANCE Y1 Y2
SET WRITE DECIMALS 4
TABULATE GENERALIZED JACCARD DISTANCE Y1 Y2 X

Cross Tabulate GENERALIZED JACCARD DISTANCE

(Response Variables: Y1       Y2      )
---------------------------------------------
X          |   GENERALIZED JAC
---------------------------------------------
1.0000   |            0.3152
2.0000   |            0.5334
3.0000   |            0.5486

.
XTIC OFFSET 0.2 0.2
X1LABEL GROUP ID
LET NDIST = UNIQUE X
XLIMITS 1 NDIST
MAJOR X1TIC MARK NUMBER NDIST
MINOR X1TIC MARK NUMBER 0
CHAR X
LINE BLANK
LABEL CASE ASIS
CASE ASIS
TITLE CASE ASIS
TITLE OFFSET 2
.
TITLE Generalized Jaccard Distance (IRIS.DAT)
Y1LABEL Generalized Jaccard Distance
GENERALIZED JACCARD DISTANCE PLOT Y1 Y2 X
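
The per-group values reported by TABULATE in Program 1 can be mimicked with a short Python sketch, assuming the statistic is computed separately on the rows belonging to each distinct value of the group variable. The helper names and the data below are hypothetical and are not taken from IRIS.DAT.

```python
from collections import defaultdict


def generalized_jaccard_distance(x, y):
    """1 - sum(min(x_i, y_i)) / sum(max(x_i, y_i))."""
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    return 1.0 - numerator / denominator


def distance_by_group(y1, y2, group):
    """Split the paired responses by group label, then compute one distance per group."""
    buckets = defaultdict(lambda: ([], []))
    for a, b, g in zip(y1, y2, group):
        buckets[g][0].append(a)
        buckets[g][1].append(b)
    return {g: generalized_jaccard_distance(xs, ys)
            for g, (xs, ys) in sorted(buckets.items())}


# Illustrative data: two groups of two observations each.
y1 = [4.0, 5.0, 6.0, 7.0]
y2 = [2.0, 3.0, 3.0, 3.0]
group = [1, 1, 2, 2]

for label, dist in distance_by_group(y1, y2, group).items():
    print(f"{label}  {dist:.4f}")
```

Group 1 gives 1 - 5/9 and group 2 gives 1 - 6/13, one value per group, matching the shape of the TABULATE table.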

Program 2:

set write decimals 3
dimension 100 columns
.
skip 25
read iris.dat y1 y2 y3 y4
skip 0
.
let z = generate matrix generalized jaccard coefficient y1 y2 y3 y4
print z

The following output is generated

MATRIX Z       --            4 ROWS
--            4 COLUMNS

VARIABLES--Z1             Z2             Z3             Z4

1.000          0.523          0.586          0.148
0.523          1.000          0.469          0.283
0.586          0.469          1.000          0.252
0.148          0.283          0.252          1.000
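
The structure of this matrix (symmetric, with ones on the diagonal) is what results from applying the coefficient to every pair of variables. The Python sketch below illustrates that reading of GENERATE MATRIX. It is an assumption drawn from the output shown, not a description of Dataplot's internals, and the function names and data are hypothetical.

```python
def generalized_jaccard_coefficient(x, y):
    """sum(min(x_i, y_i)) / sum(max(x_i, y_i))."""
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    return numerator / denominator


def coefficient_matrix(columns):
    """Entry (i, j) is the coefficient between column i and column j."""
    k = len(columns)
    return [[generalized_jaccard_coefficient(columns[i], columns[j])
             for j in range(k)]
            for i in range(k)]


# Illustrative data: three variables with three observations each.
columns = [
    [1.0, 2.0, 3.0],
    [2.0, 2.0, 1.0],
    [0.5, 1.0, 1.5],
]

for row in coefficient_matrix(columns):
    print("  ".join(f"{value:.3f}" for value in row))
```

Because min and max are symmetric in their arguments, entry (i, j) always equals entry (j, i), and each variable compared with itself gives exactly 1.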


NIST is an agency of the U.S. Commerce Department.

Date created: 08/31/2017
Last updated: 08/31/2017