
# TWO WAY PLOT

Name:
TWO WAY PLOT
Type:
Graphics Command
Purpose:
Given a response variable and associated variables containing laboratory IDs and material IDs, generate a plot of each laboratory against the column average. In addition, perform a row linear (or column linear) analysis of variance.
Description:
This plot was developed in the context of an interlaboratory analysis as defined by the ASTM E691 standard:

"Standard Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method", ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, USA.

This standard addresses the situation where there are two factors (material and laboratory) and there is a full factorial balanced design (i.e., each combination of material and laboratory is run with an equal number of replications). The E691 INTERLAB command generates the tables described in the standard.

John Mandel proposed that a "phase 3" analysis, which examines the underlying mathematical model, may sometimes be useful (see "Analyzing Interlaboratory Data According to ASTM Standard E691" in the References section).

The standard two-way additive ANOVA model is

$$Y_{ij} = M + (R_{i} - M) + (C_{j} - M) + d^{*}_{ij}$$

where

 $$M$$ = overall mean
 $$R_{i}$$ = average of all elements of the i-th laboratory
 $$C_{j}$$ = average of all elements of the j-th material
 $$d^{*}_{ij}$$ = the error term including both random error and "interaction" effects

The main row effect is $$R_{i} - M$$ and the main column effect is $$C_{j} - M$$.
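
As an illustration of this decomposition outside Dataplot, the following Python sketch computes $$M$$, $$R_{i}$$, $$C_{j}$$ and $$d^{*}_{ij}$$ for a small table. The table values and all variable names are hypothetical and chosen only for the example; this is not how Dataplot itself is implemented.

```python
# Hypothetical 3-laboratory x 4-material table of cell means (made-up numbers).
y = [
    [10.0, 20.0, 30.0, 40.0],   # laboratory 1
    [12.0, 21.0, 33.0, 46.0],   # laboratory 2
    [9.0, 19.0, 27.0, 35.0],    # laboratory 3
]
n_rows, n_cols = len(y), len(y[0])

# Row (laboratory) averages, column (material) averages, and overall mean.
R = [sum(row) / n_cols for row in y]
C = [sum(row[j] for row in y) / n_rows for j in range(n_cols)]
M = sum(R) / n_rows

# d*_ij = Y_ij - M - (R_i - M) - (C_j - M) = Y_ij - R_i - C_j + M
d_star = [[y[i][j] - R[i] - C[j] + M for j in range(n_cols)] for i in range(n_rows)]

for i in range(n_rows):
    terms = ", ".join(f"{v:+.3f}" for v in d_star[i])
    print(f"lab {i + 1}: row effect {R[i] - M:+.3f}, d* = [{terms}]")
```

In a balanced table the $$d^{*}_{ij}$$ sum to zero across every row and every column, which is a quick check that the averages were formed correctly.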

For the case where there is significant (in the sense of being much larger than the random error) interaction, Mandel introduced the "row-linear" model

$$Y_{ij} = R_{i} + B_{i}(C_{j} - M) + d_{ij}$$

That is, you essentially generate a linear fit for a specific laboratory across the various materials. This model effectively partitions the $$d^{*}_{ij}$$ into a "systematic" and a "random" component

$$d^{*}_{ij} = (B_{i} - 1)(C_{j} - M) + d_{ij}$$

where $$d_{ij}$$ is the random component and the rest is the systematic component. If the row-linear model is appropriate, then the systematic component should be much larger than the random component. Essentially, the systematic component is fitting a linear function of each laboratory against the average of all laboratories. The $$B_{i}$$ in the above equation are the slopes of the linear fits.
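
The partition into systematic and random components can be sketched in Python as follows. The slopes $$B_{i}$$ are obtained here by ordinary least squares of each laboratory's values on the column averages, which is an assumption about the fitting method; the table values and variable names are hypothetical.

```python
# Hypothetical table of cell means: rows are laboratories, columns are materials.
y = [
    [10.0, 20.0, 30.0, 40.0],
    [12.0, 21.0, 33.0, 46.0],
    [9.0, 19.0, 27.0, 35.0],
]
n_rows, n_cols = len(y), len(y[0])

R = [sum(row) / n_cols for row in y]                              # row averages
C = [sum(row[j] for row in y) / n_rows for j in range(n_cols)]    # column averages
M = sum(C) / n_cols                                               # overall mean
s_cc = sum((c - M) ** 2 for c in C)

# Least-squares slope of each laboratory's values against the column averages.
B = [sum((C[j] - M) * y[i][j] for j in range(n_cols)) / s_cc for i in range(n_rows)]

# Random component d_ij and systematic component (B_i - 1)(C_j - M).
d = [[y[i][j] - R[i] - B[i] * (C[j] - M) for j in range(n_cols)] for i in range(n_rows)]
systematic = [[(B[i] - 1.0) * (C[j] - M) for j in range(n_cols)] for i in range(n_rows)]

for i in range(n_rows):
    print(f"lab {i + 1}: height {R[i]:.3f}, slope {B[i]:.4f}")
```

With least-squares slopes, the $$B_{i}$$ average to one over the laboratories, so slopes near one for every laboratory indicate that the additive model is adequate.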

This command generates a plot of the linear fits for each row. Specifically, for each lab i plot

$$Y_{ij}$$ versus $$C_{j}$$, for j = 1 to the number of materials

That is, you plot a given laboratory's value against the average of all laboratories for each material.

Alternatively you can plot (see Note section below)

$$Y_{ij} - C_{j}$$ versus $$C_{j}$$, for j = 1 to the number of materials

That is, plot the deviations from the column average versus the column average.

In either case, the fitted lines are overlaid on the data points. The fundamental linearity is the same in either version of the plot.
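
The relationship between the two versions of the plot can be sketched in Python. Assuming the fitted lines are ordinary least-squares lines, the deviation version has the same intercept and a slope that is smaller by exactly one, so a slope near one on the raw plot corresponds to a slope near zero on the deviation plot. The data values and the helper `least_squares` are hypothetical.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


col_avg = [10.0, 20.0, 30.0, 40.0]   # hypothetical column averages C_j
lab = [12.0, 21.0, 33.0, 46.0]       # hypothetical values Y_ij for one laboratory

# Raw version: Y_ij versus C_j.  Deviation version: Y_ij - C_j versus C_j.
deviations = [v - c for v, c in zip(lab, col_avg)]
a_raw, b_raw = least_squares(col_avg, lab)
a_dev, b_dev = least_squares(col_avg, deviations)

print(f"raw:       intercept {a_raw:.4f}, slope {b_raw:.4f}")
print(f"deviation: intercept {a_dev:.4f}, slope {b_dev:.4f}")
```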

If the row-linear model is appropriate, the points for each laboratory should be approximately linear. If the slopes are all approximately equal to one, this implies that the row-linear model reduces to the additive model.

In addition, this command generates the following tables:

1. The first table contains the following columns:

 Column 1: Lab-ID
 Column 2: the height of the fit (the height is the predicted value at $$\bar{x}$$, where $$\bar{x}$$ is the mean of the material values for a given laboratory)
 Column 3: the slope of the fit
 Column 4: the residual standard deviation of the fit
 Column 5: the standard error of the slope
 Column 6: the correlation coefficient

The number of rows in the table is equal to the number of laboratories.

2. The second table contains the average over all laboratories for each value of the material.

3. The third table contains an analysis of variance table for the row-linear model. Specifically, it partitions the error sum of squares into a residuals component and a slopes component (i.e., the sum of squares accounted for by the row-linear structure).
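
As a rough check on how the error sum of squares is partitioned, the Python sketch below recomputes the slopes component from the slopes and column averages printed in the sample output for Program 1 on this page. It assumes the slopes sum of squares is the sum of $$(B_{i} - 1)^{2}$$ over laboratories times the sum of $$(C_{j} - M)^{2}$$ over materials, and that the degrees of freedom are $$(r-1)(c-1)$$ for error and $$r-1$$ for slopes. Both assumptions agree with the printed table to within rounding, but they are not taken from the Dataplot source.

```python
# Slopes and column averages copied from the sample output on this page.
slopes = [0.8305, 0.9721, 0.8240, 1.0012, 0.9885,
          0.9960, 1.0239, 1.0486, 1.2107, 1.1045]
col_avg = [7567.5, 6772.5, 5252.5, 4209.5, 3258.5]
ss_error_printed = 0.2093738e07      # "Error" row of the printed ANOVA table

n_rows, n_cols = len(slopes), len(col_avg)
M = sum(col_avg) / n_cols
s_cc = sum((c - M) ** 2 for c in col_avg)

# Assumed formula: systematic part of the error sum of squares.
ss_slopes = sum((b - 1.0) ** 2 for b in slopes) * s_cc
ss_residuals = ss_error_printed - ss_slopes

df_error = (n_rows - 1) * (n_cols - 1)
df_slopes = n_rows - 1
df_residuals = df_error - df_slopes

print(f"Error      df={df_error:3d}  SS={ss_error_printed:.7e}")
print(f"Residuals  df={df_residuals:3d}  SS={ss_residuals:.7e}")
print(f"Slopes     df={df_slopes:3d}  SS={ss_slopes:.7e}")
```

Because the printed slopes are rounded to four decimals, the recomputed sums of squares match the printed ones only approximately.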

Similarly, you can create a "column linear" model

$$Y_{ij} = C_{j} + B_{j} (R_{i} - M) + d_{ij}$$

In the context of the E691 standard, the row linear model, where the rows denote laboratories, is typically of more interest. The remaining discussion will be in terms of the row linear model, but it applies equally to the column linear model (just interchange the roles of the rows and columns).

Mandel goes on further to discuss "concurrent" models. In some cases, the slopes will exhibit a systematic pattern. If we generate a plot of the slopes versus the heights and this plot indicates a linear pattern, this is evidence of a concurrent model. When the concurrent model is appropriate, the fitted lines generated by the TWO WAY ROW PLOT command tend towards a common point. Call the y-coordinate of the common point $$y_{0}$$. Then the concurrent model is

$$Y_{ij} = y_{0} + (R_{i} - y_{0}) (C_{j} - y_{0})/(M - y_{0}) + d_{ij}$$

If $$y_{0}$$ is zero, then the concurrent model reduces to the standard multiplicative model

$$Y_{ij} = R_{i} C_{j}/M + d_{ij}$$

Concurrent models can be useful when you have both row and column linearity. Note that the ANOVA table generated by this command further partitions the slopes sum of squares into "Concurrence" and "Non-Concurrence" parts.
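
Comparing the concurrent model with the row-linear model shows that concurrence corresponds to slopes of the form $$B_{i} = (R_{i} - y_{0})/(M - y_{0})$$, that is, slopes that are linear in the heights and equal to zero at a height of $$y_{0}$$. The Python sketch below fits the printed slopes against the printed heights from the sample output for Program 1 and reads off a rough estimate of $$y_{0}$$ as the height at which the fitted slope is zero. This estimate and the comparison of the squared correlation with the Concurrence share of the Slopes sum of squares are assumptions made for illustration, not a description of Dataplot's computations.

```python
# Heights and slopes copied from the sample output on this page.
heights = [4900.0, 5814.0, 4967.0, 5485.0, 5433.0,
           5454.0, 5312.0, 5360.0, 5967.0, 5429.0]
slopes = [0.8305, 0.9721, 0.8240, 1.0012, 0.9885,
          0.9960, 1.0239, 1.0486, 1.2107, 1.1045]

n = len(heights)
h_bar = sum(heights) / n
b_bar = sum(slopes) / n
s_hh = sum((h - h_bar) ** 2 for h in heights)
s_bb = sum((b - b_bar) ** 2 for b in slopes)
s_hb = sum((h - h_bar) * (b - b_bar) for h, b in zip(heights, slopes))

# Least-squares line: slope_i ~ a0 + a1 * height_i
a1 = s_hb / s_hh
a0 = b_bar - a1 * h_bar
r_squared = s_hb ** 2 / (s_hh * s_bb)

# Under the concurrent model the slope is zero when the height equals y0.
y0 = -a0 / a1

print(f"a0 = {a0:.6f}, a1 = {a1:.6e}")
print(f"squared correlation of slope vs. height = {r_squared:.4f}")
print(f"rough estimate of y0 = {y0:.1f}")
```

For these values the squared correlation is close to the ratio of the printed Concurrence and Slopes sums of squares, which is consistent with reading the Concurrence line as the part of the slope variation explained by a linear dependence on height.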

Although motivated by the E691 analysis, this plot can be used for any two factor data set from a full factorial design (i.e., all combinations of levels from the two factors are included). If there is replication within a cell, the mean of the replicates will be used. If there are any missing cells, an error will be reported and no plots or tables will be generated.
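
The data preparation described above (averaging replicates within a cell and rejecting designs with missing cells) can be sketched in Python as follows. The function name `cell_means` and the example values are hypothetical, and the sketch only mirrors the behavior described in the text.

```python
from collections import defaultdict


def cell_means(y, labid, matid):
    """Average replicates per (lab, material) cell; fail if any cell is empty."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, lab, mat in zip(y, labid, matid):
        sums[(lab, mat)] += value
        counts[(lab, mat)] += 1

    labs = sorted(set(labid))
    mats = sorted(set(matid))
    missing = [(lab, mat) for lab in labs for mat in mats if (lab, mat) not in counts]
    if missing:
        raise ValueError(f"missing cells: {missing}")

    table = [[sums[(lab, mat)] / counts[(lab, mat)] for mat in mats] for lab in labs]
    return labs, mats, table


# Two labs, two materials; cell (1, 1) has two replicates.
labs, mats, table = cell_means(
    [1.0, 3.0, 5.0, 7.0, 9.0],
    [1, 1, 1, 2, 2],
    [1, 1, 2, 1, 2],
)
print(labs, mats, table)
```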

The above is only a brief outline of row-linear models. For more detailed discussion and derivations, consult Mandel's publications in the References section below.

Syntax 1:
TWO WAY ROW PLOT <y> <labid> <matid>
<SUBSET/EXCEPT/FOR qualification>
where <y> is a response variable;
<labid> is a variable that specifies the lab-id;
<matid> is a variable that specifies the material-id;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax plots and fits the row-linear model.

Syntax 2:
TWO WAY COLUMN PLOT <y> <labid> <matid>
<SUBSET/EXCEPT/FOR qualification>
where <y> is a response variable;
<labid> is a variable that specifies the lab-id;
<matid> is a variable that specifies the material-id;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax plots and fits the column-linear model.

Examples:
TWO WAY ROW PLOT Y LABID MATID
TWO WAY COLUMN PLOT Y LABID MATID
TWO WAY ROW PLOT Y LABID MATID SUBSET MATID > 2
Note:
To plot the deviations from the column average versus the column average (rather than the raw data), enter the command

SET TWO WAY PLOT Y AXIS DEVIATION

To reset the default, enter

SET TWO WAY PLOT Y AXIS RAW
Note:
If the fitted lines are basically parallel and the slopes are all approximately equal to one, then the row-linear model reduces to the standard additive model. That is, the error will be predominately the "random" component and the "systematic" error is minimal.

Next, compare the "standard deviation of the slopes" to the standard deviations of the individual fits (i.e., the RESSD column in Table 1). If the standard deviation of the slopes is significantly larger than the RESSD values, this is evidence that there is a systematic effect for the laboratories and that the additive model is not applicable. The square of the correlation coefficients of the fits gives an indication of how much of the variability is accounted for by the linear fit (i.e., the systematic component) and how much is random.

If there is evidence of a systematic effect, then plot the fit slopes versus the fit heights. If this plot shows approximate linearity, then this is evidence for concurrence in the data.

This is only a brief overview of the interpretation of row linear models. Consult the Mandel publications listed in the References section for a more complete discussion.

Note:
The TWO WAY PLOT command is typically preceded by an E691 INTERLAB command. The documentation for the E691 INTERLAB command discusses a number of plots that are typically generated for an E691 analysis. Specifically, plots of the h- and k-consistency statistics are highly recommended.

The following additional plots are recommended for the analysis of row linear models.

• Plot the residuals from the row linear fit. Mandel recommends plotting the standardized residuals rather than the raw residuals. The standardized residuals are the raw residuals divided by the residual standard deviation of the fit.

Enter HELP TWO FACTOR PLOT for an example of plotting these residuals.

• Plot the slopes of the fits versus the heights of the fits. An approximately linear relationship for this plot is evidence of "concurrence".
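
The standardized residuals for a single laboratory can be sketched in Python as follows. The sketch assumes the residual standard deviation of each fit uses n - 2 degrees of freedom (two fitted parameters per line), which is consistent with the sample output on this page; the data values and variable names are hypothetical.

```python
import math

col_avg = [10.0, 20.0, 30.0, 40.0]   # hypothetical column averages C_j
lab = [12.0, 21.0, 33.0, 46.0]       # hypothetical values Y_ij for one laboratory

n = len(col_avg)
x_bar, y_bar = sum(col_avg) / n, sum(lab) / n
s_xx = sum((x - x_bar) ** 2 for x in col_avg)
slope = sum((x - x_bar) * (v - y_bar) for x, v in zip(col_avg, lab)) / s_xx

# Predicted values and raw residuals of the row linear fit.
predicted = [y_bar + slope * (x - x_bar) for x in col_avg]
raw_resid = [v - p for v, p in zip(lab, predicted)]

# Residual standard deviation (assumed n - 2 degrees of freedom).
ressd = math.sqrt(sum(r ** 2 for r in raw_resid) / (n - 2))

# Standardized residuals: raw residuals divided by the residual standard deviation.
std_resid = [r / ressd for r in raw_resid]

for x, r, s in zip(col_avg, raw_resid, std_resid):
    print(f"C_j = {x:6.1f}  raw = {r:+.4f}  standardized = {s:+.4f}")
```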
Note:
The columns printed in table 1 are also written to the file dpst1f.dat. The columns printed in table 2 are also written to dpst2f.dat.

The following are written to dpst3f.dat:

 Column 1: row-id
 Column 2: column-id
 Column 3: $$Y_{ij}$$ (this is after replications have been averaged)
 Column 4: predicted values for the row linear fits
 Column 5: raw residuals for the row linear fits
 Column 6: standardized residuals for the row linear fits

This information is written to files to make it easier to generate some of the complementary plots useful in analyzing row linear models.

Note:
By default, the values for the factor variable (column 1 in tables 1 and 2) are coded. That is, the minimum value is set to 1, the next smallest value is set to 2, and so on. If you want the actual values to be printed in these tables, enter the command

SET TWO WAY PLOT FACTOR LABEL VALUE

To reset the default, enter the command

SET TWO WAY PLOT FACTOR LABEL CODED
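
The coding rule described in this note (smallest value becomes 1, next smallest becomes 2, and so on) can be sketched in Python. The function name `code_levels` is hypothetical, and the sketch only illustrates the rule as stated.

```python
def code_levels(values):
    """Map each distinct value to its rank (1 = smallest) among the distinct values."""
    levels = sorted(set(values))
    code = {level: k for k, level in enumerate(levels, start=1)}
    return [code[v] for v in values]


# Temperature levels as in the sample output, with two repeated entries.
temperature = [-20, 0, 20, 40, 60, -20, 0]
print(code_levels(temperature))
```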
Note:
The SET WRITE DECIMALS command can be used to control how many digits to the right of the decimal point are used in the tables generated by this command. A positive integer prints the numbers in decimal format. For example, SET WRITE DECIMALS 2 will generate numbers of the form 32.46. A negative integer prints the numbers in exponential format, where the absolute value of the given number specifies the number of significant digits. For example, SET WRITE DECIMALS -7 will generate numbers of the form 0.2093738E+07. SET WRITE DECIMALS 0 specifies that the numbers will be written as integers.

The factor variable (i.e., column 1 in tables 1 and 2) is often an integer value. The following command allows you to specify a different number of digits for this column

SET TWO WAY PLOT FACTOR DECIMAL <value>

This command follows the same rules as the SET WRITE DECIMALS command and is typically set to 0. If you want this column to use the value given by the SET WRITE DECIMALS command (the default), enter

SET TWO WAY PLOT FACTOR DECIMAL -99
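
The formatting rules described in this note can be mimicked in Python to preview how a value would look. The function `format_value` is a hypothetical emulation written for illustration; it is not Dataplot code, and details such as rounding of ties may differ from Dataplot's output.

```python
def format_value(x, decimals):
    """Emulate the described rules: >0 fixed decimals, 0 integer, <0 exponential."""
    if decimals > 0:
        return f"{x:.{decimals}f}"        # fixed-point with `decimals` places
    if decimals == 0:
        return f"{x:.0f}"                 # written as an integer
    sig = -decimals                       # number of significant digits
    if x == 0:
        return "0." + "0" * sig + "E+00"
    mantissa, exponent = f"{abs(x):.{sig - 1}e}".split("e")
    digits = mantissa.replace(".", "")    # e.g. "2.093738" -> "2093738"
    sign = "-" if x < 0 else ""
    # Shift the decimal point one place left, so the exponent grows by one.
    return f"{sign}0.{digits}E{int(exponent) + 1:+03d}"


print(format_value(32.456, 2))
print(format_value(2093738.0, -7))
print(format_value(32.456, 0))
```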
Note:
If you want to suppress the fit table (table 1), enter the command

SET TWO WAY PLOT FIT TABLE OFF

To restore the default of printing the fit table, enter

SET TWO WAY PLOT FIT TABLE ON
Note:
If you want to suppress the column averages table (table 2), enter the command

SET TWO WAY PLOT AVERAGES TABLE OFF

To restore the default of printing the column averages table, enter

SET TWO WAY PLOT AVERAGES TABLE ON
Note:
If you want to suppress the ANOVA table (table 3), enter the command

SET TWO WAY PLOT ANOVA TABLE OFF

To restore the default of printing the ANOVA table, enter

SET TWO WAY PLOT ANOVA TABLE ON

The sum of squares and mean sum of squares columns in the ANOVA table can be quite large. For this reason, you may want these to be written in exponential format. To specify the number of decimals for these columns, enter the command

SET TWO WAY PLOT ANOVA TABLE DECIMAL <value>

This command follows the same rules as the SET WRITE DECIMALS command and is often set to -7. If you want these columns to use the value given by the SET WRITE DECIMALS command (the default), enter

SET TWO WAY PLOT ANOVA TABLE DECIMAL -99
Note:
In a chapter of his book "Evaluation and Control of Measurements", Mandel discusses the relationship between the biplot and the row linear model. The biplot is a graphical method introduced by Gabriel that can be used as a diagnostic to determine whether row linear or column linear relationships exist.
Related Commands:
 E691 INTERLAB = Perform an interlaboratory analysis based on the E691 standard.
 H CONSISTENCY PLOT = Generate h-consistency statistic plots.
 K CONSISTENCY PLOT = Generate k-consistency statistic plots.
 TWO FACTOR PLOT = Generate a run sequence plot with two factor variables.
References:
"Standard Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method", ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, USA.

Mandel (1961), "Non-Additivity in Two-Way Analysis of Variance", Journal of the American Statistical Association, Vol. 56, pp. 878-888.

Mandel (1995), "Structure and Outliers in Interlaboratory Studies", Journal of Testing and Evaluation, Vol. 23, No. 5, pp. 364-369.

Mandel (1994), "Models and Interactions", Journal of Testing and Evaluation, Vol. 19, No. 5, pp. 398-402.

Mandel (1994), "Analyzing Interlaboratory Data According to ASTM Standard E691", Quality and Statistics: Total Quality Management, ASTM STP 1209, Kowalewski, Ed., American Society for Testing and Materials, Philadelphia, PA 1994, pp. 59-70.

Mandel (1994), "Analysis of Two-Way Layouts", Chapman & Hall, New York.

Mandel (1993), "Outliers in Interlaboratory Testing", Journal of Testing and Evaluation, Vol. 21, No. 2, pp. 132-135.

Mandel (1991), "Evaluation and Control of Measurements", Marcel Dekker, Inc.

Bradu and Gabriel (1978), "The Biplot as a Diagnostic Tool for Models of Two-Way Tables", Technometrics, Vol. 20, No. 1, pp. 47-68.

Applications:
Interlaboratory Studies
Implementation Date:
2015/6
Program 1:

. Step 1:   Read the data
.
dimension 40 columns
skip 25
.
variable label y Compressive Strength
variable label x1 Lab-ID
variable label x2 Temperature
.
. Step 2:   Define some default plot control settings
.
case asis
title case asis
title offset 2
label case asis
tic mark offset units screen
tic mark offset 3 3
.
. Step 3:   Generate the plot
.
x1label Column Average
character blank all
line dash all
loop for k = 1 1 10
let kindex = (k-1)*2 + 1
let plot character kindex = ^k
let plot line      kindex = blank
end of loop
.
set two way plot factor label value
set two way plot factor decimal 0
set two way plot anova table decimals -7
set write decimals 4
title Compressive Strength of Rubber
y1label Data by Rows
.
two way row plot y x1 x2
.
. Step 4:   Generate the slope versus height plot
.
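. Read the heights and slopes back from dpst1f.dat (assumes one header
. line followed by the table 1 columns: lab-id, height, slope, ...)
skip 1
read dpst1f.dat labid height slope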
.
fit slope height
let htmin = minimum height
let htmax = maximum height
let function f = a0 + a1*x
.
character circle
character hw 1 0.75
character fill on
line blank dash
y1label Slope
x1label Height
title
.
xlimits 4500 6000
.
plot slope height and
plot f for x = htmin 0.1 htmax

The following output is generated:

Parameters of Row-Linear Fit for Compressive Strength
-------------------------------------------------------------------------------------
                                                    Standard Error    Correlation
Lab-ID         Height          Slope          RESSD       of Slope    Coefficient
-------------------------------------------------------------------------------------
     1      4900.0000         0.8305       150.6740         0.0424         0.9961
     2      5814.0000         0.9721       106.3275         0.0299         0.9986
     3      4967.0000         0.8240        53.0537         0.0149         0.9995
     4      5485.0000         1.0012       182.6123         0.0514         0.9961
     5      5433.0000         0.9885       188.1032         0.0530         0.9957
     6      5454.0000         0.9960        84.8406         0.0239         0.9991
     7      5312.0000         1.0239        92.6250         0.0261         0.9990
     8      5360.0000         1.0486        50.7361         0.0143         0.9997
     9      5967.0000         1.2107       210.1819         0.0592         0.9964
    10      5429.0000         1.1045       173.7049         0.0489         0.9971

Standard Deviation of Slopes:                 0.1149
Pooled Standard Deviation of Fit:           148.4192

Column Averages
---------------------------
                    Column
Temperature        Average
---------------------------
        -20      7567.5000
          0      6772.5000
         20      5252.5000
         40      4209.5000
         60      3258.5000

Mean of Column Means:            5412.1000

ANOVA Table for Row-Linear Fit
-----------------------------------------------------------------
                         Degrees of         Sum of           Mean
Source                      Freedom        Squares         Square
-----------------------------------------------------------------
Total                            49  0.1329069E+09  0.2712385E+07
Rows                              9  0.4751624E+07  0.5279583E+06
Column                            4  0.1260615E+09  0.3151537E+08
Error                            36  0.2093738E+07  0.5815939E+05
  Residuals                      27  0.5947629E+06  0.2202825E+05
  Slopes                          9  0.1498975E+07  0.1665528E+06
    Concurrence                   1  0.9485005E+06  0.9485005E+06
    Non-Concurrence               8  0.5504746E+06  0.6880933E+05



NIST is an agency of the U.S. Commerce Department.

Date created: 07/08/2015
Last updated: 07/08/2015