
# COSINE DISTANCE, COSINE SIMILARITY, ANGULAR COSINE DISTANCE, ANGULAR COSINE SIMILARITY

Name:
COSINE DISTANCE (LET)
COSINE SIMILARITY (LET)
ANGULAR COSINE DISTANCE (LET)
ANGULAR COSINE SIMILARITY (LET)
Type:
Let Subcommand
Purpose:
Compute the cosine distance (or cosine similarity, angular cosine distance, angular cosine similarity) between two variables.
Description:
The cosine similarity is defined as

$$\mbox{Cosine Similarity} = \frac{\sum_{i=1}^{n}{x_{i} y_{i}}} {\sqrt{\sum_{i=1}^{n}{x_{i}^{2}}} \sqrt{\sum_{i=1}^{n}{y_{i}^{2}}}}$$

The cosine distance is then defined as

$$\mbox{Cosine Distance} = 1 - \mbox{Cosine Similarity}$$
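
As a small worked illustration (the vectors are made up for this example and do not come from any data set), take x = (1, 2, 2) and y = (2, 1, 2):

$$\mbox{Cosine Similarity} = \frac{(1)(2) + (2)(1) + (2)(2)} {\sqrt{1^{2} + 2^{2} + 2^{2}} \sqrt{2^{2} + 1^{2} + 2^{2}}} = \frac{8}{3 \cdot 3} \approx 0.8889$$

$$\mbox{Cosine Distance} = 1 - \frac{8}{9} = \frac{1}{9} \approx 0.1111$$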

The cosine distance above is defined for positive values only. It is also not a proper distance in that the triangle inequality does not hold. However, the following angular definitions are proper distances:

$$\mbox{angular cosine distance} = \frac{\cos^{-1}(\mbox{cosine similarity})} {\pi}$$

$$\mbox{angular cosine similarity} = 1 - \mbox{angular cosine distance}$$

If negative values are encountered in the input, the cosine distances will not be computed. However, the cosine similarities will be computed.
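
The four measures can be sketched outside Dataplot as follows. This is a minimal Python illustration, not Dataplot's implementation: the function names are hypothetical, the angular distance is taken as the arccosine of the cosine similarity divided by pi, and raising an error for negative input is only one assumed way to mirror the rule that distances are not computed in that case. The three small vectors are made up to show that the cosine distance can violate the triangle inequality while the angular form does not.

```python
import math


def cosine_similarity(x, y):
    """Sum of products divided by the product of the two Euclidean norms."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero input")
    # Clamp so rounding error cannot push the ratio outside [-1, 1].
    return max(-1.0, min(1.0, dot / (norm_x * norm_y)))


def _require_non_negative(x, y):
    # Assumption: the documented "distances will not be computed" rule
    # is signalled here by raising an error.
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("distances are not computed for negative input values")


def cosine_distance(x, y):
    _require_non_negative(x, y)
    return 1.0 - cosine_similarity(x, y)


def angular_cosine_distance(x, y):
    _require_non_negative(x, y)
    return math.acos(cosine_similarity(x, y)) / math.pi


def angular_cosine_similarity(x, y):
    # Assumption: like the cosine similarity, this is still computed
    # when negative values are present.
    return 1.0 - math.acos(cosine_similarity(x, y)) / math.pi


# Three made-up vectors: a and c are orthogonal, b lies halfway between them.
a, b, c = [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]

# Cosine distance: the direct path a-c (1.0) exceeds the detour through b.
print(cosine_distance(a, c), cosine_distance(a, b) + cosine_distance(b, c))

# Angular cosine distance: the direct path does not exceed the detour.
print(angular_cosine_distance(a, c),
      angular_cosine_distance(a, b) + angular_cosine_distance(b, c))
```

For these vectors the cosine distance from a to c is 1, while the two legs through b sum to roughly 0.586, so the triangle inequality fails. The angular distances are about 0.5 for the direct path and 0.25 for each leg, so the inequality holds with equality.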

Syntax 1:
LET <par> = COSINE DISTANCE <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed cosine distance is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
LET <par> = COSINE SIMILARITY <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed cosine similarity is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 3:
LET <par> = ANGULAR COSINE DISTANCE <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed angular cosine distance is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 4:
LET <par> = ANGULAR COSINE SIMILARITY <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed angular cosine similarity is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
LET A = COSINE DISTANCE Y1 Y2
LET A = COSINE SIMILARITY Y1 Y2
LET A = ANGULAR COSINE DISTANCE Y1 Y2
LET A = ANGULAR COSINE SIMILARITY Y1 Y2
LET A = COSINE DISTANCE Y1 Y2 SUBSET TAG > 2

LET A = COSINE DISTANCE Y1 Y2 SUBSET Y1 > 0 SUBSET Y2 > 0

Note:
Dataplot statistics can be used in a number of commands. For details, enter

Default:
None
Synonyms:
None
Related Commands:
EUCLIDEAN DISTANCE = Compute the Euclidean distance.
MANHATTAN DISTANCE = Compute the Manhattan distance.
MATRIX DISTANCE = Compute various distance metrics for a matrix.
CORRELATION = Compute the correlation between two variables.
Reference:
John Foreman (2014), "Data Smart", Wiley.
Applications:
Robust Clustering
Implementation Date:
2017/06
Program:

SKIP 25
READ IRIS.DAT Y1 TO Y4 X
.
LET COSDIST  = COSINE DISTANCE Y1 Y2
LET COSADIST = ANGULAR COSINE DISTANCE Y1 Y2
LET COSSIMI  = COSINE SIMILARITY Y1 Y2
LET COSASIMI = ANGULAR COSINE SIMILARITY Y1 Y2
SET WRITE DECIMALS 4
TABULATE COSINE DISTANCE Y1 Y2 X

Cross Tabulate COSINE DISTANCE

(Response Variables: Y1       Y2      )
---------------------------------------------
X          |   COSINE DISTANCE
---------------------------------------------
1.0000   |            0.0027
2.0000   |            0.0049
3.0000   |            0.0056

.
XTIC OFFSET 0.2 0.2
X1LABEL GROUP ID
LET NDIST = UNIQUE X
XLIMITS 1 NDIST
MAJOR X1TIC MARK NUMBER NDIST
MINOR X1TIC MARK NUMBER 0
CHAR X
LINE BLANK
LABEL CASE ASIS
CASE ASIS
TITLE CASE ASIS
TITLE OFFSET 2
.
MULTIPLOT CORNER COORDINATES 5 5 95 95
MULTIPLOT SCALE FACTOR 2
MULTIPLOT 2 2
.
Y1LABEL Cosine Distance
TITLE Cosine Distance (Sepal Length and Sepal Width)
COSINE DISTANCE PLOT Y1 Y2 X
.
Y1LABEL Cosine Similarity
TITLE Cosine Similarity (Sepal Length and Sepal Width)
COSINE SIMILARITY PLOT Y1 Y2 X
.
Y1LABEL Angular Cosine Distance
TITLE Angular Cosine Distance (Sepal Length and Sepal Width)
ANGULAR COSINE DISTANCE PLOT Y1 Y2 X
.
Y1LABEL Angular Cosine Similarity
TITLE Angular Cosine Similarity (Sepal Length and Sepal Width)
ANGULAR COSINE SIMILARITY PLOT Y1 Y2 X
.
END OF MULTIPLOT
JUSTIFICATION CENTER
MOVE 50 98
TEXT Distance/Similarity Measures (IRIS.DAT)

.
BOOTSTRAP SAMPLES 1000
CHAR X ALL
LINE BLANK ALL
BOOTSTRAP COSINE DISTANCES PLOT Y1 Y2 X
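
The idea behind bootstrapping the cosine distance can be sketched in Python. This is only an illustration of the general technique, under the assumption that (Y1, Y2) pairs are resampled together with replacement and the statistic is recomputed on each resample. It does not reproduce Dataplot's own resampling, grouping by X, or plotting, and the function names and data values are hypothetical.

```python
import math
import random


def cosine_distance(x, y):
    """1 minus the cosine similarity of two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norms


def bootstrap_cosine_distance(x, y, n_samples=1000, seed=0):
    """Resample (x, y) pairs with replacement and recompute the statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(x)
    results = []
    for _ in range(n_samples):
        # Draw n row indices with replacement; pairs stay together.
        idx = [rng.randrange(n) for _ in range(n)]
        results.append(cosine_distance([x[i] for i in idx],
                                       [y[i] for i in idx]))
    return results


# Made-up positive measurements for illustration only.
x = [2.0, 3.5, 1.2, 4.8, 2.9, 3.3]
y = [1.1, 2.0, 0.9, 2.2, 1.8, 1.5]

dists = sorted(bootstrap_cosine_distance(x, y))
# Rough 95% percentile interval from the sorted bootstrap values.
lower = dists[int(0.025 * len(dists))]
upper = dists[int(0.975 * len(dists)) - 1]
print(lower, upper)
```

Running the same resampling separately within each group would give one set of bootstrap values per group, which is the kind of per-group spread a bootstrap plot is meant to convey.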

NIST is an agency of the U.S. Commerce Department.

Date created: 07/03/2017
Last updated: 07/03/2017