Dataplot Vol 2 Auxiliary Chapter

STANDARDIZE

Name:
STANDARDIZE (LET)
Type:
Let Subcommand
Purpose:
Standardize a variable, i.e., subtract the mean and divide by the standard deviation.
Description:
In many applications, it is desirable to standardize the data values.

This command provides additional flexibility in that one or two group id variables can also be specified. If one group id variable is given, the mean and standard deviation are computed for each group and the data values are standardized by the corresponding group mean and standard deviation. Likewise, if two group id variables are specified, a mean and standard deviation are computed for each cell of the cross tabulation and the data values are standardized by the corresponding cell mean and standard deviation.
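The group-wise computation can be illustrated with a short Python sketch. This is not Dataplot code; the function name standardize_by_group is hypothetical, and the use of the sample standard deviation (n-1 divisor) is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev

def standardize_by_group(y, group):
    """Standardize y within each group: (value - group mean) / group SD."""
    # Collect the response values belonging to each group id.
    members = defaultdict(list)
    for value, g in zip(y, group):
        members[g].append(value)
    # One (mean, SD) pair per group; stdev() uses the n-1 divisor (assumed)
    # and needs at least two values in every group.
    stats = {g: (mean(vals), stdev(vals)) for g, vals in members.items()}
    return [(value - stats[g][0]) / stats[g][1] for value, g in zip(y, group)]

y = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
x = [1, 1, 1, 2, 2, 2]
result = standardize_by_group(y, x)
```

Although the two groups differ in scale by a factor of ten, both map to the same standardized values because each group is centered and scaled by its own statistics.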

You can specify several alternative measures to the mean for the location statistic and several alternative measures to the standard deviation for the scale statistic. See the Note below for details. In addition, you can choose to standardize only by location (i.e., subtract the mean but do not divide by the standard deviation) or only by scale (i.e., divide by the standard deviation but do not subtract the mean).
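As a rough illustration of the difference between the location-only and scale-only variants, consider the following Python sketch. It is not Dataplot code; the function names are hypothetical, and the default statistics (mean and sample standard deviation) are assumed.

```python
from statistics import mean, stdev

def location_standardize(y):
    """Subtract the location statistic only (here, the mean)."""
    m = mean(y)
    return [v - m for v in y]

def scale_standardize(y):
    """Divide by the scale statistic only (here, the sample SD)."""
    s = stdev(y)
    return [v / s for v in y]

y = [2.0, 4.0, 6.0]
centered = location_standardize(y)  # mean removed, spread unchanged
scaled = scale_standardize(y)       # spread rescaled, mean not removed
```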

You can also explicitly request a z-score or u-score. A z-score subtracts the mean and divides by the standard deviation (i.e., it rescales the data to have mean 0 and standard deviation 1, the location and scale of a standard normal distribution). Similarly, the u-score subtracts the minimum and divides by the range, so that the data are scaled to the interval from 0 to 1 (the range of a standard uniform distribution). If a z-score or u-score is explicitly requested, the settings for the SET LOCATION STATISTIC and SET SCALE STATISTIC commands (see Note below) are ignored.
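The two scores can be sketched in Python as follows. This is not Dataplot code; the function names zscore and uscore are chosen here only to mirror the command names, and the sample standard deviation (n-1 divisor) is an assumption.

```python
from statistics import mean, stdev

def zscore(y):
    """(value - mean) / standard deviation."""
    m, s = mean(y), stdev(y)
    return [(v - m) / s for v in y]

def uscore(y):
    """(value - minimum) / range, giving values between 0 and 1."""
    lo, hi = min(y), max(y)
    # A constant variable has zero range, so the division below would fail.
    return [(v - lo) / (hi - lo) for v in y]

y = [2.0, 4.0, 6.0, 8.0]
z = zscore(y)  # has mean 0 and standard deviation 1
u = uscore(y)  # smallest value maps to 0, largest to 1
```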

Syntax 1:
LET <var> = STANDARDIZE <y>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<var> is a variable where the standardized values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax standardizes (with respect to both location and scale) the variable with no groups.

Syntax 2:
LET <var> = LOCATION STANDARDIZE <y>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<var> is a variable where the standardized values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax standardizes the variable (with respect to location only) with no groups.

Syntax 3:
LET <var> = SCALE STANDARDIZE <y>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<var> is a variable where the standardized values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax standardizes the variable (with respect to scale only) with no groups.

Syntax 4:
LET <var> = ZSCORE <y>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<var> is a variable where the z-score values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax specifically computes a z-score.

Syntax 5:
LET <var> = USCORE <y>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<var> is a variable where the u-score values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax specifically computes a u-score.

Syntax 6:
LET <var> = STANDARDIZE <y> <x1>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> is a group id variable;
<var> is a variable where the standardized values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax standardizes (with respect to both location and scale) the variable with one group variable.

Syntax 7:
LET <var> = LOCATION STANDARDIZE <y> <x1>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> is a group id variable;
<var> is a variable where the standardized values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax standardizes the variable (with respect to location only) with one group variable.

Syntax 8:
LET <var> = SCALE STANDARDIZE <y> <x1>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> is a group id variable;
<var> is a variable where the standardized values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax standardizes the variable (with respect to scale only) with one group variable.

Syntax 9:
LET <var> = ZSCORE <y> <x1>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> is a group id variable;
<var> is a variable where the z-score values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax computes a z-score with one group variable.

Syntax 10:
LET <var> = USCORE <y> <x1>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> is a group id variable;
<var> is a variable where the u-score values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax computes a u-score with one group variable.

Syntax 11:
LET <var> = STANDARDIZE <y> <x1> <x2>             <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> is the first group id variable;
<x2> is the second group id variable;
<var> is a variable where the standardized values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax standardizes (with respect to both location and scale) the variable with two group variables.

Syntax 12:
LET <var> = LOCATION STANDARDIZE <y> <x1> <x2>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> is the first group id variable;
<x2> is the second group id variable;
<var> is a variable where the standardized values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax standardizes the variable (with respect to location only) with two group variables.

Syntax 13:
LET <var> = SCALE STANDARDIZE <y> <x1> <x2>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> is the first group id variable;
<x2> is the second group id variable;
<var> is a variable where the standardized values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax standardizes the variable (with respect to scale only) with two group variables.

Syntax 14:
LET <var> = ZSCORE <y> <x1> <x2>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> is the first group id variable;
<x2> is the second group id variable;
<var> is a variable where the z-score values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax computes a z-score with two group variables.

Syntax 15:
LET <var> = USCORE <y> <x1> <x2>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> is the first group id variable;
<x2> is the second group id variable;
<var> is a variable where the u-score values are stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax computes a u-score with two group variables.
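For the two-group syntaxes, each distinct pair of group ids defines one cell of the cross tabulation. A minimal Python sketch of this cell-wise standardization is given below. It is not Dataplot code; the function name standardize_by_cell is hypothetical, and the mean and sample standard deviation (n-1 divisor) are assumed as the statistics.

```python
from collections import defaultdict
from statistics import mean, stdev

def standardize_by_cell(y, x1, x2):
    """Standardize y within each (x1, x2) cell of the cross tabulation."""
    # Key each response value by its pair of group ids.
    cells = defaultdict(list)
    for value, a, b in zip(y, x1, x2):
        cells[(a, b)].append(value)
    # One (mean, SD) pair per cell; every cell needs at least two values.
    stats = {key: (mean(vals), stdev(vals)) for key, vals in cells.items()}
    return [
        (value - stats[(a, b)][0]) / stats[(a, b)][1]
        for value, a, b in zip(y, x1, x2)
    ]

y = [1.0, 3.0, 10.0, 30.0, 5.0, 7.0, 100.0, 300.0]
x1 = [1, 1, 1, 1, 2, 2, 2, 2]
x2 = [1, 1, 2, 2, 1, 1, 2, 2]
cell_result = standardize_by_cell(y, x1, x2)
```

In this example every cell holds exactly two values, so each standardized value is plus or minus 1 divided by the square root of 2, regardless of the cell's original scale.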

Examples:
LET Y2 = STANDARDIZE Y1
LET Y2 = LOCATION STANDARDIZE Y1
LET Y2 = LOCATION STANDARDIZE Y1 X1
LET Y2 = LOCATION STANDARDIZE Y1 X1 X2

SET LOCATION STATISTIC MEDIAN
LET Y2 = STANDARDIZE Y1 X1 X2

Note:
The most common application of this command is to standardize using the mean as the location measure and the standard deviation as the scale measure. Several alternative measures are allowed.

To set the location measure, enter the command

SET LOCATION STATISTIC <MEAN/MEDIAN/MIDMEAN/TRIMMED MEAN/WINSORIZED MEAN/MIDRANGE/HARMONIC MEAN/GEOMETRIC MEAN>

To set the scale measure, enter the command

Here, SD is the standard deviation, MAD is the median absolute deviation, and AAD is the average absolute deviation.
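To illustrate why an alternative location and scale measure may be preferred, the following Python sketch standardizes with the median and the MAD. It is not Dataplot code; the function names are hypothetical, and the MAD is assumed here to be the unscaled median of the absolute deviations from the median, which may differ from Dataplot's exact definition.

```python
from statistics import median

def mad(y):
    """Median of the absolute deviations from the median (assumed definition)."""
    med = median(y)
    return median([abs(v - med) for v in y])

def robust_standardize(y):
    """(value - median) / MAD, a robust analogue of the z-score."""
    med = median(y)
    scale = mad(y)
    return [(v - med) / scale for v in y]

y = [1.0, 2.0, 3.0, 4.0, 100.0]
r = robust_standardize(y)
```

The single outlier (100.0) does not change the median or the MAD, so the other four values are standardized as if the outlier were absent.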

Note that using the ZSCORE or USCORE syntax overrides the settings specified by these SET commands. That is, ZSCORE always uses the mean and standard deviation and USCORE always uses the minimum and the range.

Default:
The default location statistic is the mean and the default scale statistic is the standard deviation.
Synonyms:
IQ RANGE is a synonym for INTERQUARTILE RANGE.
Related Commands:
MEAN PLOT = Generate a mean vs. subset plot.
SD PLOT = Generate a standard deviation vs. subset plot.
TABULATE = Compute group statistics (one group variable).
CROSS TABULATE = Compute group statistics (two group variables).
MEDIAN = Compute the median.
MIDMEAN = Compute the midmean.
TRIMMED MEAN = Compute the trimmed mean.
SD = Compute the standard deviation.
AAD = Compute the average absolute deviation.
MAD = Compute the median absolute deviation.
Applications:
Data Analysis
Implementation Date:
2001/3: Initial Implementation

1. Additional location statistics added: MINIMUM, HARMONIC MEAN, GEOMETRIC MEAN, WINSORIZED MEAN, MIDRANGE
Program 1:
SKIP 25
LET Y2 = STANDARDIZE Y X
Program 2:
SKIP 25
LET Y2 = LOCATION STANDARDIZE Y X
Program 3:
SKIP 25