
# WEIGHTED CORRELATION, WEIGHTED COVARIANCE, WEIGHTED COSINE DISTANCE, WEIGHTED COSINE SIMILARITY

Name:
WEIGHTED CORRELATION (LET)
WEIGHTED COVARIANCE (LET)
WEIGHTED COSINE DISTANCE (LET)
WEIGHTED COSINE SIMILARITY (LET)
Type:
Let Subcommand
Purpose:
Compute the weighted correlation coefficient, weighted covariance, weighted cosine distance, or weighted cosine similarity between two variables.
Description:
Given paired response variables x and y of length n and a weights variable w, the weighted covariance is computed with the formula

$$cov(x,y;w) = \frac {\sum_{i=1}^{n}{w_{i} (x_{i} - m(x;w))(y_{i} - m(y;w))}} {\sum_{i=1}^{n}{w_{i}}}$$

where $$m$$ denotes the weighted mean

$$m(x;w) = \frac{\sum_{i=1}^{n}{w_{i} x_{i}}} {\sum_{i=1}^{n}{w_{i}}}$$
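As a hand check of the two formulas above, the following is a minimal Python sketch (not Dataplot code and not Dataplot's implementation). The function names `weighted_mean` and `weighted_covariance` and the three data points are assumptions chosen only to keep the arithmetic easy to follow.

```python
def weighted_mean(x, w):
    """Weighted mean m(x;w) = sum(w_i * x_i) / sum(w_i)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_covariance(x, y, w):
    """Weighted covariance cov(x,y;w), normalized by the sum of the weights."""
    mx = weighted_mean(x, w)
    my = weighted_mean(y, w)
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return num / sum(w)


# Hypothetical data (an assumption for illustration only).
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
w = [1.0, 1.0, 2.0]

print(weighted_mean(x, w))           # (1*1 + 1*2 + 2*3) / 4 = 2.25
print(weighted_covariance(x, y, w))  # 7 / 4 = 1.75
```

Note that, with the formula as written, setting every weight to 1 gives a divisor of n rather than n - 1.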

The weighted correlation coefficient is computed with the formula

$$\begin{array}{lcl} r & = & \frac{S_{xy}} {\sqrt{S_{xx} S_{yy}}} \\ & = & \frac{cov(x,y;w)} {\sqrt{cov(x,x;w) cov(y,y;w)}} \end{array}$$

where

$$S_{xx} = \sum_{i=1}^{n}{w_{i} (x_{i} - m(x;w))^{2}}$$

$$S_{yy} = \sum_{i=1}^{n}{w_{i} (y_{i} - m(y;w))^{2}}$$

$$S_{xy} = \sum_{i=1}^{n}{w_{i} (x_{i} - m(x;w)) (y_{i} - m(y;w))}$$
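The two forms of r agree because each weighted covariance is the corresponding weighted sum divided by the sum of the weights, and that factor cancels in the ratio. The Python sketch below (an illustration, not Dataplot's implementation) evaluates both forms on hypothetical data. The helper names `weighted_sums` and `weighted_correlation` are assumptions.

```python
import math


def weighted_sums(x, y, w):
    """Return (S_xx, S_yy, S_xy) about the weighted means."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return sxx, syy, sxy


def weighted_correlation(x, y, w):
    """Weighted correlation r = S_xy / sqrt(S_xx * S_yy)."""
    sxx, syy, sxy = weighted_sums(x, y, w)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (an assumption for illustration only).
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
w = [1.0, 1.0, 2.0]

r = weighted_correlation(x, y, w)

# Same quantity written with weighted covariances: the sum of the
# weights appears in numerator and denominator and cancels.
sw = sum(w)
sxx, syy, sxy = weighted_sums(x, y, w)
r_cov = (sxy / sw) / math.sqrt((sxx / sw) * (syy / sw))

print(r, r_cov)
```

A consequence of the cancellation is that multiplying all weights by a positive constant leaves r unchanged.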

The cosine similarity, which is equivalent to the reflective correlation coefficient, is defined as

$$\mbox{Cosine Similarity} = \frac{\sum_{i=1}^{n}{x_{i} y_{i}}} {\sqrt{\sum_{i=1}^{n}{x_{i}^{2}}} \sqrt{\sum_{i=1}^{n}{y_{i}^{2}}}}$$

The cosine distance is then defined as

$$\mbox{Cosine Distance} = 1 - \mbox{Cosine Similarity}$$
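The unweighted definitions can be sketched in Python as follows. This is an illustration rather than Dataplot code, and the function names and the two-element vectors are assumptions. Unlike the correlation coefficient, no mean is subtracted from x or y.

```python
import math


def cosine_similarity(x, y):
    """sum(x_i * y_i) / (sqrt(sum(x_i^2)) * sqrt(sum(y_i^2)))."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)


def cosine_distance(x, y):
    """Cosine distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity(x, y)


# Hypothetical vectors (an assumption for illustration only).
x = [3.0, 4.0]
y = [4.0, 3.0]

print(cosine_similarity(x, y))  # 24 / (5 * 5), approximately 0.96
print(cosine_distance(x, y))    # approximately 0.04
```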

The weighted cosine similarity is defined as

$$\mbox{Weighted Cosine Similarity} = \frac{\sum_{i=1}^{n}{w_{i} x_{i} y_{i}}} {\sqrt{\sum_{i=1}^{n}{w_{i} x_{i}^{2}}} \sqrt{\sum_{i=1}^{n}{w_{i} y_{i}^{2}}}}$$

The weighted cosine distance is then defined as

$$\mbox{Weighted Cosine Distance} = 1 - \mbox{Weighted Cosine Similarity}$$
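The weighted versions differ from the unweighted ones only by the factor w_i inside each sum. A minimal Python sketch, using assumed function names and hypothetical data (not Dataplot's implementation), is:

```python
import math


def weighted_cosine_similarity(x, y, w):
    """sum(w*x*y) / (sqrt(sum(w*x^2)) * sqrt(sum(w*y^2)))."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den_x = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    den_y = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    return num / (den_x * den_y)


def weighted_cosine_distance(x, y, w):
    """Weighted cosine distance = 1 - weighted cosine similarity."""
    return 1.0 - weighted_cosine_similarity(x, y, w)


# Hypothetical data (an assumption for illustration only).
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
w = [1.0, 1.0, 2.0]

sim = weighted_cosine_similarity(x, y, w)   # 52 / sqrt(23 * 118)
dist = weighted_cosine_distance(x, y, w)
print(sim, dist)

# With all weights equal to 1 the unweighted cosine similarity is recovered.
unit_sim = weighted_cosine_similarity(x, y, [1.0] * len(x))  # 31 / sqrt(14 * 69)
print(unit_sim)
```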

A weighted linear regression is sometimes used when the error variances are not homogeneous (e.g., variances are often higher in one or both tails). In these cases, you may also want to obtain a weighted correlation coefficient using the same weights as the linear fit.

The Alaska pipeline case study in the NIST/SEMATECH e-Handbook of Statistical Methods gives an example of how weights can be determined. Although this is done in the context of a regression analysis, the same approach applies to weighted correlation and weighted covariance. See

If you have grouped data (i.e., a bivariate frequency table), use the GROUPED CORRELATION command. Grouped correlation is similar to weighted correlation, but a different computational formula is used.

Syntax 1:
LET <par> = WEIGHTED CORRELATION <y1> <y2> <weights>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<weights> is the weights variable;
<par> is a parameter where the computed weighted correlation is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
LET <par> = WEIGHTED COVARIANCE <y1> <y2> <weights>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<weights> is the weights variable;
<par> is a parameter where the computed weighted covariance is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 3:
LET <par> = WEIGHTED COSINE DISTANCE <y1> <y2> <weights>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<weights> is the weights variable;
<par> is a parameter where the computed weighted cosine distance is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 4:
LET <par> = WEIGHTED COSINE SIMILARITY <y1> <y2> <weights>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<weights> is the weights variable;
<par> is a parameter where the computed weighted cosine similarity is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
LET A = WEIGHTED CORRELATION Y1 Y2 WEIGHTS
LET A = WEIGHTED COVARIANCE Y1 Y2 WEIGHTS
LET A = WEIGHTED COSINE DISTANCE Y1 Y2 WEIGHTS
LET A = WEIGHTED COSINE SIMILARITY Y1 Y2 WEIGHTS
Note:
Dataplot statistics can be used in a number of commands. For details, enter

Default:
None
Synonyms:
None
Related Commands:
 GROUPED CORRELATION = Compute the correlation coefficient based on bivariate frequency data.
 WEIGHTED COSINE DISTANCE = Compute the weighted cosine distance.
 CORRELATION = Compute the correlation of two variables.
 COVARIANCE = Compute the covariance of two variables.
Reference:
Applications:
Linear Regression
Implementation Date:
2018/10
Program:

. Step 1:   Read the data
.
skip 25
.
. Step 2:   Compute both the unweighted and weighted correlations
.
.           Weights from e-Handbook case study of Alaska pipeline data
.
let wt = 1/(x**(1.5))
let corr = correlation y x
let wtcorr = weighted correlation y x wt
let cov = covariance y x
let wtcov = weighted covariance y x wt
.
set write decimals 3
print "Unweighted correlation:  ^corr"
print "Weighted correlation:    ^wtcorr"
print "Unweighted covariance:   ^cov"
print "Weighted covariance:     ^wtcov"

The following output is returned

Unweighted correlation:  0.945581893216
Weighted correlation:    0.983560277814
Unweighted covariance:   423.101490037
Weighted covariance:     500.3662749985
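
For readers who want to reproduce the comparison outside Dataplot, the following Python sketch mirrors the structure of the program above. It is an illustration under stated assumptions: the (x, y) pairs are hypothetical and are not the Alaska pipeline data, the function name `weighted_correlation` is an assumption, and the results will not match the output shown above. Only the form of the weights, 1/x**1.5, is taken from the program.

```python
import math


def weighted_correlation(x, y, w):
    """Weighted correlation r = S_xy / sqrt(S_xx * S_yy)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, NOT the Alaska pipeline data used by the program.
x = [5.0, 10.0, 20.0, 40.0, 80.0]
y = [6.0, 9.0, 23.0, 35.0, 90.0]

# Same form of weights as "let wt = 1/(x**(1.5))" in the program.
wt = [1.0 / xi ** 1.5 for xi in x]

# Unit weights reduce the weighted formula to the ordinary correlation.
unit = [1.0] * len(x)

corr = weighted_correlation(y, x, unit)
wtcorr = weighted_correlation(y, x, wt)

print(f"Unweighted correlation:  {corr:.3f}")
print(f"Weighted correlation:    {wtcorr:.3f}")
```

Because the weights decrease as x grows, the observations with large x, where the variance is assumed to be higher, contribute less to the weighted result.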



Date created: 11/08/2018
Last updated: 08/10/2020