
COSINE DISTANCE
COSINE SIMILARITY
ANGULAR COSINE DISTANCE
ANGULAR COSINE SIMILARITY

Name:
    COSINE DISTANCE (LET)
    COSINE SIMILARITY (LET)
    ANGULAR COSINE DISTANCE (LET)
    ANGULAR COSINE SIMILARITY (LET)
Type:
    Let Subcommand
Purpose:
    Compute the cosine distance (or cosine similarity, angular cosine distance, angular cosine similarity) between two variables.
Description:
    The cosine similarity is defined as

      \( \mbox{Cosine Similarity} = \frac{\sum_{i=1}^{n}{x_{i} y_{i}}} {\sqrt{\sum_{i=1}^{n}{x_{i}^{2}}} \sqrt{\sum_{i=1}^{n}{y_{i}^{2}}}} \)

    The cosine distance is then defined as

      \( \mbox{Cosine Distance} = 1 - \mbox{Cosine Similarity} \)
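
    As a cross-check outside Dataplot, the two formulas above can be sketched in a few lines of Python. The function names cosine_similarity and cosine_distance are illustrative only (they are not Dataplot commands), and the sketch assumes two equal-length variables, neither of which is all zeros.

```python
import math

def cosine_similarity(x, y):
    """Sum of cross products divided by the product of the Euclidean norms."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for an all-zero variable")
    return dot / (norm_x * norm_y)

def cosine_distance(x, y):
    """Cosine distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity(x, y)

x = [3.0, 4.0]
y = [4.0, 3.0]
sim = cosine_similarity(x, y)   # (12 + 12) / (5 * 5) = 0.96
dist = cosine_distance(x, y)    # 1 - 0.96 = 0.04
```

    Because both measures depend only on the angle between the two variables, rescaling either variable by a positive constant leaves the result unchanged.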

    The cosine distance above is defined for positive values only. It is also not a proper distance in that the triangle inequality does not hold. However, the following angular definitions are proper distances:

      \( \mbox{angular cosine distance} = \frac{\arccos(\mbox{cosine similarity})} {\pi} \)

      \( \mbox{angular cosine similarity} = 1 - \mbox{angular cosine distance} \)

    If negative values are encountered in the input, the cosine distances will not be computed. However, the cosine similarities will be computed.

    NOTE: The 2018/08 version of Dataplot updated the definition for the angular cosine distance to

      \( \mbox{angular cosine distance} = \frac{\mbox{c} \arccos(\mbox{cosine similarity})} {\pi} \)

    with \( \arccos \) designating the arccosine function and where c = 2 if there are no negative values and c = 1 if there are negative values.
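
    The 2018/08 definition can be sketched in Python as a rough illustration; this is not Dataplot's implementation. The function names are hypothetical, and the sketch assumes that "no negative values" means no negative value in either variable. The three test vectors are made up to show why the plain cosine distance is not a proper distance (it can violate the triangle inequality) while the angular version is.

```python
import math

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def cosine_distance(x, y):
    return 1.0 - cosine_similarity(x, y)

def angular_cosine_distance(x, y):
    """c * arccos(similarity) / pi, with c = 2 if no value is negative, else c = 1."""
    # Clamp to [-1, 1] so floating-point round-off cannot push acos out of its domain.
    sim = max(-1.0, min(1.0, cosine_similarity(x, y)))
    # Assumption: the check for negative values covers both variables.
    has_negative = any(v < 0 for v in list(x) + list(y))
    c = 1.0 if has_negative else 2.0
    return c * math.acos(sim) / math.pi

def angular_cosine_similarity(x, y):
    return 1.0 - angular_cosine_distance(x, y)

# Three nonnegative vectors at 0, 45 and 90 degrees (made-up data).
u, v, w = [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]

# Plain cosine distance: the detour through v is shorter than the direct
# u-to-w distance, so the triangle inequality fails.
cos_direct = cosine_distance(u, w)
cos_detour = cosine_distance(u, v) + cosine_distance(v, w)

# Angular cosine distance: the detour is never shorter than the direct path.
ang_direct = angular_cosine_distance(u, w)
ang_detour = angular_cosine_distance(u, v) + angular_cosine_distance(v, w)
```

    With c = 2 the angular distance of nonnegative data spans the interval 0 to 1, since the angle between two nonnegative vectors cannot exceed 90 degrees. With c = 1 the same interval covers angles up to 180 degrees.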

Syntax 1:
    LET <par> = COSINE DISTANCE <y1> <y2>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <par> is a parameter where the computed cosine distance is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
    LET <par> = COSINE SIMILARITY <y1> <y2>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <par> is a parameter where the computed cosine similarity is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 3:
    LET <par> = ANGULAR COSINE DISTANCE <y1> <y2>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <par> is a parameter where the computed angular cosine distance is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 4:
    LET <par> = ANGULAR COSINE SIMILARITY <y1> <y2>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <par> is a parameter where the computed angular cosine similarity is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
    LET A = COSINE DISTANCE Y1 Y2
    LET A = COSINE SIMILARITY Y1 Y2
    LET A = ANGULAR COSINE DISTANCE Y1 Y2
    LET A = ANGULAR COSINE SIMILARITY Y1 Y2
    LET A = COSINE DISTANCE Y1 Y2 SUBSET TAG > 2

    LET A = COSINE DISTANCE Y1 Y2 SUBSET Y1 > 0 SUBSET Y2 > 0
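
    The last example restricts the computation to rows where both variables are positive. A rough Python sketch of that filtering step follows. It assumes that the two SUBSET clauses combine as a logical AND, so a row is kept only when both conditions hold. The data values are made up for illustration.

```python
import math

def cosine_distance(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

# Made-up values; the second, fourth and fifth rows fail one of the conditions.
y1 = [5.1, -1.0, 4.7, 0.0, 5.0]
y2 = [3.5, 3.0, 3.2, 3.1, -2.0]

# Keep a row only if Y1 > 0 and Y2 > 0 (assumed AND of the two SUBSET clauses).
kept = [(a, b) for a, b in zip(y1, y2) if a > 0 and b > 0]
y1_kept = [a for a, _ in kept]
y2_kept = [b for _, b in kept]

dist = cosine_distance(y1_kept, y2_kept)
```

    Filtering row by row keeps the two variables paired, which matters because the numerator of the cosine similarity is a sum of products of matching elements.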

Note:
    Dataplot statistics can be used in a number of commands. For details, enter

Default:
    None
Synonyms:
    None
Related Commands:
Reference:
    John Foreman (2014), "Data Smart", Wiley.
Applications:
    Robust Clustering
Implementation Date:
    2017/06
    2018/08: Modified formula for angular cosine distance
Program:
     
    SKIP 25
    READ IRIS.DAT Y1 TO Y4 X
    .
    LET COSDIST  = COSINE DISTANCE Y1 Y2
    LET COSADIST = ANGULAR COSINE DISTANCE Y1 Y2
    LET COSSIMI  = COSINE SIMILARITY Y1 Y2
    LET COSASIMI = ANGULAR COSINE SIMILARITY Y1 Y2
    SET WRITE DECIMALS 4
    TABULATE COSINE DISTANCE Y1 Y2 X
    
    
                Cross Tabulate COSINE DISTANCE
     
    (Response Variables: Y1       Y2      )
    ---------------------------------------------
           X          |   COSINE DISTANCE
    ---------------------------------------------
             1.0000   |            0.0027
             2.0000   |            0.0049
             3.0000   |            0.0056
    
    .
    XTIC OFFSET 0.2 0.2
    X1LABEL GROUP ID
    LET NDIST = UNIQUE X
    XLIMITS 1 NDIST
    MAJOR X1TIC MARK NUMBER NDIST
    MINOR X1TIC MARK NUMBER 0
    CHAR X
    LINE BLANK
    LABEL CASE ASIS
    CASE ASIS
    TITLE CASE ASIS
    TITLE OFFSET 2
    .
    MULTIPLOT CORNER COORDINATES 5 5 95 95
    MULTIPLOT SCALE FACTOR 2
    MULTIPLOT 2 2
    .
    Y1LABEL Cosine Distance
    TITLE Cosine Distance (Sepal Length and Sepal Width)
    COSINE DISTANCE PLOT Y1 Y2 X
    .
    Y1LABEL Cosine Similarity
    TITLE Cosine Similarity (Sepal Length and Sepal Width)
    COSINE SIMILARITY PLOT Y1 Y2 X
    .
    Y1LABEL Angular Cosine Distance
    TITLE Angular Cosine Distance (Sepal Length and Sepal Width)
    ANGULAR COSINE DISTANCE PLOT Y1 Y2 X
    .
    Y1LABEL Angular Cosine Similarity
    TITLE Angular Cosine Similarity (Sepal Length and Sepal Width)
    ANGULAR COSINE SIMILARITY PLOT Y1 Y2 X
    .
    END OF MULTIPLOT
    JUSTIFICATION CENTER
    MOVE 50 98
    TEXT Distance/Similarity Measures (IRIS.DAT)

    plot generated by sample program

    .
    BOOTSTRAP SAMPLES 1000
    CHAR X ALL
    LINE BLANK ALL
    BOOTSTRAP COSINE DISTANCES PLOT Y1 Y2 X

     
        plot generated by sample program
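
    The TABULATE command in the program above reports one cosine distance per level of the group variable X. The same grouping idea can be sketched in Python as a rough illustration; this is not how Dataplot implements it. The helper name cosine_distance_by_group is hypothetical, and the data are made up rather than taken from IRIS.DAT.

```python
import math
from collections import defaultdict

def cosine_distance(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def cosine_distance_by_group(y1, y2, group):
    """Return {group value: cosine distance of the y1/y2 rows in that group}."""
    rows = defaultdict(lambda: ([], []))
    for a, b, g in zip(y1, y2, group):
        rows[g][0].append(a)
        rows[g][1].append(b)
    return {g: cosine_distance(xs, ys) for g, (xs, ys) in sorted(rows.items())}

# Made-up data: group 1 rows are proportional, group 2 rows are orthogonal.
y1 = [1.0, 2.0, 3.0, 1.0, 0.0]
y2 = [2.0, 4.0, 6.0, 0.0, 1.0]
x = [1, 1, 1, 2, 2]

table = cosine_distance_by_group(y1, y2, x)
for g, d in table.items():
    print(f"{g:10.4f}   |   {d:15.4f}")
```

    Each group is treated as its own pair of variables, so the distances in different rows of the table are computed independently of one another.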
        


Date created: 07/03/2017
Last updated: 07/03/2017

Please email comments on this WWW page to alan.heckert@nist.gov.