MATRIX DISTANCE
Name:
Type:
Purpose:
Compute the distance matrix of a matrix.
Description:
Dataplot can compute the distances relative to either rows
or columns.
Given an n x p data matrix X, we compute a distance matrix D.
For row distances, the D(ij) element of the distance matrix
is the distance between row i and row j, which results in an
n x n matrix D. For column distances, the D(ij) element of the
distance matrix is the distance between column i and column j,
which results in a p x p matrix D.
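The difference between the row and column forms can be sketched in
Python (not Dataplot). This is a minimal illustration only: the
function names and the 3 x 2 data matrix are hypothetical, and the
Euclidean metric is used simply as a stand-in for any of the metrics.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def row_distance(x, metric):
    """n x n matrix: entry [i][j] is the distance between rows i and j."""
    return [[metric(ri, rj) for rj in x] for ri in x]

def column_distance(x, metric):
    """p x p matrix: entry [i][j] is the distance between columns i and j."""
    columns = [list(c) for c in zip(*x)]  # transpose so columns become rows
    return row_distance(columns, metric)

# Hypothetical 3 x 2 data matrix (n = 3 rows, p = 2 columns).
X = [[0.0, 0.0],
     [3.0, 4.0],
     [6.0, 8.0]]

D_rows = row_distance(X, euclidean)     # 3 x 3, one entry per pair of rows
D_cols = column_distance(X, euclidean)  # 2 x 2, one entry per pair of columns
```

In both cases the result is symmetric with zeros on the diagonal,
since the distance from a row (or column) to itself is zero.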
Five distance metrics are available.
- The Euclidean row distance is defined as
D(ij) = SQRT(SUM((X(ik) - X(jk))**2))
where the summation is over k from column 1 to column p.
The Euclidean column distance is defined as
D(ij) = SQRT(SUM((X(ki) - X(kj))**2))
where the summation is over k from row 1 to row n.
The Euclidean distance is simply the square root of the
sum of the squared differences between corresponding
elements of the rows (or columns). This is probably the
most commonly used distance metric.
- The Mahalanobis distance is defined as
D(ij) = SQRT((X(i) - X(j))' * SIGMA**(-1) * (X(i) - X(j)))
where SIGMA**(-1) is the inverse of the variance-covariance
matrix of X and the prime denotes the transpose.
The row distances are obtained by letting X(i) and X(j)
represent the ith and jth rows while the column distances
are obtained by letting X(i) and X(j) represent the ith
and jth columns.
The Mahalanobis distance is effectively a weighted
Euclidean distance where the weighting is determined
by the sample variance-covariance matrix.
- The Minkowsky row distance is defined as
D(ij) = (SUM(ABS(X(ik) - X(jk))**P))**(1/P)
The sum is from k = 1 to the number of columns. The
column distance is similar, but the summation is over
the number of rows rather than the number of columns.
The Minkowsky distance is the pth root of the sum of
the absolute differences to the pth power between
corresponding elements of the rows (or columns).
The Euclidean distance is the special case of P=2.
- The block row distance is defined as
D(ij) = SUM(ABS(X(ik) - X(jk)))
The sum is from k = 1 to the number of columns. The
column distance is similar, but the summation is over
the number of rows rather than the number of columns.
The block distance is the sum of the absolute differences
between corresponding elements of the rows (or columns).
Note that this is a special case of the Minkowsky
distance with P=1.
The block distance is also known as the city block or
Manhattan distance.
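- The Chebychev row distance is defined as
D(ij) = MAX(ABS(X(ik) - X(jk)))
The maximum is taken over k = 1 to the number of columns.
The column distance is similar, but the maximum is taken over
the number of rows rather than the number of columns.
The Chebychev distance is the largest absolute difference
between corresponding elements of the rows (or columns).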
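The five metrics above can be sketched in Python (not Dataplot) for a
single pair of rows. This is an illustration of the formulas only, not
Dataplot's implementation. All function names are hypothetical, and
the Mahalanobis sketch assumes row distances with the sample
variance-covariance matrix of the columns (divisor n - 1).

```python
import math

def minkowski(u, v, p):
    """pth root of the sum of absolute differences raised to the pth power."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def euclidean(u, v):
    """Special case of the Minkowski distance with p = 2."""
    return minkowski(u, v, 2)

def block(u, v):
    """Sum of absolute differences (Minkowski distance with p = 1)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def chebychev(u, v):
    """Largest absolute difference between corresponding elements."""
    return max(abs(a - b) for a, b in zip(u, v))

def covariance(x):
    """Sample variance-covariance matrix of the columns of x (assumed n - 1)."""
    n = len(x)
    cols = list(zip(*x))
    means = [sum(c) / n for c in cols]
    return [[sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / (n - 1)
             for cj, mj in zip(cols, means)]
            for ci, mi in zip(cols, means)]

def solve(a, b):
    """Solve a*z = b by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if a is singular.
    """
    m = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(m):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

def mahalanobis(u, v, cov):
    """SQRT(d' * cov**(-1) * d) with d = u - v, without forming the inverse."""
    d = [a - b for a, b in zip(u, v)]
    z = solve(cov, d)  # z = cov**(-1) * d
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))

# Hypothetical pair of rows; the differences are 3 and 4.
u, v = [1.0, 2.0], [4.0, 6.0]
d_block = block(u, v)           # 3 + 4 = 7
d_eucl = euclidean(u, v)        # sqrt(9 + 16) = 5
d_cheb = chebychev(u, v)        # max(3, 4) = 4
d_mink = minkowski(u, v, 1.5)   # lies between the Euclidean and block values
```

With an identity covariance matrix the Mahalanobis sketch reduces to
the Euclidean distance, which is one way to see it as a weighted
Euclidean distance.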
Many multivariate techniques are based on distance matrices.
Syntax 1:
Syntax 2:
Examples:
LET D = EUCLIDEAN ROW DISTANCE M
LET D = EUCLIDEAN COLUMN DISTANCE M
LET D = BLOCK ROW DISTANCE M
LET D = BLOCK COLUMN DISTANCE M
LET D = MAHALANOBIS ROW DISTANCE M
LET D = MAHALANOBIS COLUMN DISTANCE M
LET P = 1.5
LET D = MINKOWSKY ROW DISTANCE M
LET D = MINKOWSKY COLUMN DISTANCE M
Note:
Matrices are created with either the READ MATRIX command or the
MATRIX DEFINITION command. Enter HELP MATRIX DEFINITION and HELP
READ MATRIX for details.
Note:
For the Minkowsky distance, you need to specify the value of
P. This is done by entering the following command before
entering the MINKOWSKY DISTANCE command:
LET P = <value>
Note:
It is often desirable to scale the matrix before computing
the distances. Dataplot provides several scaling options.
Enter HELP MATRIX SCALE for details.
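The effect of scaling can be sketched in Python (not Dataplot). The
z-score scaling used here is only one common choice and is an
assumption; it is not meant to reproduce any particular Dataplot
scaling option. The function names and data values are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def zscore_columns(x):
    """Subtract each column's mean and divide by its sample standard deviation."""
    n = len(x)
    scaled_cols = []
    for c in zip(*x):
        mean = sum(c) / n
        sd = math.sqrt(sum((a - mean) ** 2 for a in c) / (n - 1))
        scaled_cols.append([(a - mean) / sd for a in c])
    return [list(r) for r in zip(*scaled_cols)]  # back to row-major layout

# Hypothetical data: column 2 is on a much larger scale than column 1.
X = [[1.0, 1000.0],
     [2.0, 3000.0],
     [3.0, 2000.0]]

raw = euclidean(X[0], X[1])     # dominated almost entirely by column 2
Z = zscore_columns(X)
scaled = euclidean(Z[0], Z[1])  # both columns now contribute comparably
```

Without scaling, the variable with the largest units tends to dominate
the Euclidean, block, Minkowsky, and Chebychev distances.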
Note:
The correlation matrix and covariance matrix can be
considered distance matrices as well.
Default:
Synonyms:
Related Commands:
READ MATRIX                = Read a matrix.
MATRIX COLUMN DIMENSION    = Dimension maximum number of
                             columns for Dataplot matrices.
CORRELATION MATRIX         = Compute the correlation matrix.
VARIANCE-COVARIANCE MATRIX = Compute the variance-covariance
                             matrix.
DISTANCE FROM MEAN         = Compute the distance from the mean
                             for a matrix.
Reference:
"Graphical Exploratory Data Analysis", Du Toit, Steyn, and
Stumpf, Springer-Verlag, 1986, pp. 74-77.
"Applied Multivariate Statistical Analysis", Third Edition,
Johnson and Wichern, Prentice-Hall, 1992.
Applications:
Implementation Date:
Program:
DIMENSION 200 COLUMNS
SKIP 25
READ IRIS.DAT SEPLENG SEPWIDTH PETLENG PETWIDTH TAG
SKIP 0
LET NTOT = SIZE SEPLENG
LET X = MATRIX DEFINITION SEPLENG NTOT 4
LET D = EUCLIDEAN ROW DISTANCE MATRIX X
Date created: 6/5/2001
Last updated: 4/4/2003
Please email comments on this WWW page to
alan.heckert@nist.gov.