
MATRIX DISTANCE

Name:
    MATRIX DISTANCE (LET)
Type:
    Let Subcommand
Purpose:
    Compute the distance matrix of a matrix.
Description:
    Dataplot can compute the distances relative to either rows or columns.

    Given an nxp data matrix X, we compute a distance matrix D. For row distances, the D(ij) element of the distance matrix is the distance between row i and row j, which results in an nxn matrix D. For column distances, the D(ij) element of the distance matrix is the distance between column i and column j, which results in a pxp matrix D.

    Five distance metrics are available.

    1. The Euclidean row distance is defined as

        D(ij) = SQRT(SUM((X(ik) - X(jk))**2))

      where the summation is relative to k over columns 1 to p.

      The Euclidean column distance is defined as

        D(ij) = SQRT(SUM((X(ki) - X(kj))**2))

      where the summation is relative to k over rows 1 to n.

      The Euclidean distance is simply the square root of the sum of the squared differences between corresponding elements of the rows (or columns). This is probably the most commonly used distance metric.

    2. The Mahalanobis distance is defined as

        D(ij) = SQRT[(X(i) - X(j))'SINV(X(i) - X(j))]

      where SINV is the inverse of the variance-covariance matrix of X. The row distances are obtained by letting X(i) and X(j) represent the ith and jth row while the column distances are obtained by letting X(i) and X(j) represent the ith and jth columns.

      The Mahalanobis distance is effectively a weighted Euclidean distance where the weighting is determined by the sample variance-covariance matrix.

    3. The Minkowsky row distance is defined as

        D(ij) = (SUM(ABS(X(ik) - X(jk))**P))**(1/P)

      The sum is from k = 1 to the number of columns. The column distance is similar, but the summation is over the number of rows rather than the number of columns.

      The Minkowsky distance is the pth root of the sum of the absolute differences to the pth power between corresponding elements of the rows (or columns). The Euclidean distance is the special case of P=2.

    4. The block row distance is defined as

        D(ij) = SUM(ABS(X(ik) - X(jk)))

      The sum is from k = 1 to the number of columns. The column distance is similar, but the summation is over the number of rows rather than the number of columns.

      The block distance is the sum of the absolute differences between corresponding elements of the rows (or columns). Note that this is a special case of the Minkowsky distance with P=1.

      The block distance is also known as the city block or Manhattan distance.

    5. The Chebychev row distance is defined as

        D(ij) = MAX(ABS(X(ik) - X(jk)))

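      The maximum is taken over k = 1 to the number of columns. The column distance is similar, but the maximum is taken over the number of rows rather than the number of columns.

      The Chebychev distance is the largest absolute difference between corresponding elements of the rows (or columns).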
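
    The Euclidean, Minkowsky, block, and Chebychev definitions above can be illustrated outside Dataplot. The following Python sketch is not Dataplot's implementation; the function names and the small 3x2 matrix are assumptions chosen for illustration. It shows that row distances give an nxn matrix, that column distances give a pxp matrix, and that the Euclidean and block distances are the Minkowsky distance with P=2 and P=1.

```python
import math

def minkowsky(u, v, p):
    """(SUM(ABS(u(k) - v(k))**p))**(1/p) for two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def euclidean(u, v):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def block(u, v):
    """Sum of absolute differences (city block / Manhattan)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def chebychev(u, v):
    """Largest absolute difference."""
    return max(abs(a - b) for a, b in zip(u, v))

def row_distance(x, metric):
    """n x n matrix of distances between the rows of x."""
    return [[metric(ri, rj) for rj in x] for ri in x]

def column_distance(x, metric):
    """p x p matrix of distances between the columns of x."""
    columns = [list(c) for c in zip(*x)]  # transpose, then reuse row logic
    return row_distance(columns, metric)

# Illustrative 3 x 2 data matrix (n = 3 rows, p = 2 columns).
X = [[0.0, 0.0],
     [3.0, 4.0],
     [6.0, 8.0]]

D_row = row_distance(X, euclidean)     # 3 x 3
D_col = column_distance(X, euclidean)  # 2 x 2

print(D_row[0][1])             # 5.0
print(block(X[0], X[1]))       # 7.0
print(chebychev(X[0], X[1]))   # 4.0
```

    Every distance matrix produced this way is symmetric with zeros on the diagonal, so only the upper or lower triangle carries information.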

    Many multivariate techniques are based on distance matrices.
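
    The Mahalanobis row distance described above needs the inverse of the variance-covariance matrix, so it is worth seeing the steps in order. The Python sketch below is an illustration only, not Dataplot's algorithm. It assumes the sample covariance uses the n-1 divisor, inverts it with a plain Gauss-Jordan elimination, and covers row distances only; the function names and the 4x2 matrix are hypothetical.

```python
import math

def covariance(x):
    """Sample variance-covariance matrix of the columns of x (divisor n - 1)."""
    n, p = len(x), len(x[0])
    means = [sum(row[k] for row in x) / n for k in range(p)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in x) / (n - 1)
             for b in range(p)]
            for a in range(p)]

def inverse(s):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(s)
    # Augment s with the identity matrix: [s | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(m)]
           for i, row in enumerate(s)]
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("variance-covariance matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(m):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

def mahalanobis_row_distance(x):
    """n x n matrix with D(ij) = SQRT[(X(i) - X(j))' SINV (X(i) - X(j))]."""
    sinv = inverse(covariance(x))
    p = len(x[0])

    def dist(u, v):
        d = [a - b for a, b in zip(u, v)]
        q = sum(d[a] * sinv[a][b] * d[b] for a in range(p) for b in range(p))
        return math.sqrt(max(q, 0.0))  # guard against tiny negative round-off

    return [[dist(u, v) for v in x] for u in x]

# Illustrative 4 x 2 matrix: both columns have variance 4/3 and zero covariance.
X = [[0.0, 0.0],
     [2.0, 0.0],
     [0.0, 2.0],
     [2.0, 2.0]]

D = mahalanobis_row_distance(X)
```

    For this matrix the covariance is diagonal, so the result is the Euclidean distance rescaled by the common standard deviation: D[0][1] is SQRT(3) rather than 2. When the variance-covariance matrix is the identity, the Mahalanobis and Euclidean distances coincide.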

Syntax 1:
    LET <mat2> = <type> ROW DISTANCE <mat1>
    where <mat1> is a matrix for which the distance matrix is to be computed;
          <type> is EUCLIDEAN, MAHALANOBIS, MINKOWSKY, BLOCK, or CHEBYCHEV and defines the type of distance to compute;
    and where <mat2> is a matrix where the resulting distance matrix is saved.

    This syntax computes row distances.

Syntax 2:
    LET <mat2> = <type> COLUMN DISTANCE <mat1>
    where <mat1> is a matrix for which the distance matrix is to be computed;
          <type> is EUCLIDEAN, MAHALANOBIS, MINKOWSKY, BLOCK, or CHEBYCHEV and defines the type of distance to compute;
    and where <mat2> is a matrix where the resulting distance matrix is saved.

    This syntax computes column distances.

Examples:
    LET D = EUCLIDEAN ROW DISTANCE M
    LET D = EUCLIDEAN COLUMN DISTANCE M

    LET D = BLOCK ROW DISTANCE M
    LET D = BLOCK COLUMN DISTANCE M

    LET D = MAHALANOBIS ROW DISTANCE M
    LET D = MAHALANOBIS COLUMN DISTANCE M

    LET P = 1.5
    LET D = MINKOWSKY ROW DISTANCE M
    LET D = MINKOWSKY COLUMN DISTANCE M

Note:
    Matrices are created with either the READ MATRIX command or the MATRIX DEFINITION command. Enter HELP MATRIX DEFINITION and HELP READ MATRIX for details.
Note:
    For the Minkowsky distance, you need to specify the value of P. This is done by entering the following command before entering the MINKOWSKY DISTANCE command:

      LET P = <value>
Note:
    It is often desirable to scale the matrix before computing the distances. Dataplot provides several scaling options. Enter HELP MATRIX SCALE for details.
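
    A short illustration shows why scaling matters. The Python sketch below is not a Dataplot command sequence, and z-score scaling is used here only as an assumed example; it is not necessarily one of the options offered by MATRIX SCALE. The data are hypothetical, with the second column recorded in much larger units than the first.

```python
import math

def zscore_columns(x):
    """Scale each column to mean 0 and sample standard deviation 1."""
    n, p = len(x), len(x[0])
    means = [sum(row[k] for row in x) / n for k in range(p)]
    sds = [math.sqrt(sum((row[k] - means[k]) ** 2 for row in x) / (n - 1))
           for k in range(p)]
    return [[(row[k] - means[k]) / sds[k] for k in range(p)] for row in x]

def euclidean_row_distance(x):
    """n x n matrix of Euclidean distances between the rows of x."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))) for v in x]
            for u in x]

# Hypothetical data: column 2 is on a scale about 1000 times larger than column 1.
X = [[1.0, 1000.0],
     [2.0, 1000.0],
     [1.0, 3000.0]]

raw = euclidean_row_distance(X)
scaled = euclidean_row_distance(zscore_columns(X))

print(raw[0][1], raw[0][2])  # 1.0 2000.0
```

    In the raw distances, row 3 appears 2000 times farther from row 1 than row 2 does, purely because of the units of the second column. After scaling, both rows are the same distance from row 1, since each differs from it in exactly one column by the same number of standard deviations.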
Note:
    The correlation matrix and covariance matrix can be considered distance matrices as well.
Default:
    None
Synonyms:
    None
Related Commands:
    READ MATRIX = Read a matrix.
    MATRIX COLUMN DIMENSION = Dimension maximum number of columns for Dataplot matrices.
    CORRELATION MATRIX = Compute the correlation matrix.
    VARIANCE-COVARIANCE MATRIX = Compute the variance-covariance matrix.
    DISTANCE FROM MEAN = Compute the distance from the mean for a matrix.
Reference:
    "Graphical Exploratory Data Analysis", Du Toit, Steyn, and Stumpf, Springer-Verlag, 1986, pp. 74-77.

    "Applied Multivariate Statistical Analysis", Third Edition, Johnson and Wichern, Prentice-Hall, 1992.

Applications:
    Multivariate Analysis
Implementation Date:
    1998/8
Program:
    DIMENSION 200 COLUMNS
    SKIP 25
    READ IRIS.DAT SEPLENG SEPWIDTH PETLENG PETWIDTH TAG
    SKIP 0
    LET NTOT = SIZE SEPLENG
    LET X = MATRIX DEFINITION SEPLENG NTOT 4
    LET D = EUCLIDEAN ROW DISTANCE X

Date created: 6/5/2001
Last updated: 4/4/2003
Please email comments on this WWW page to alan.heckert@nist.gov.