
# PREDICTION BOUNDS

Name:
PREDICTION BOUNDS
Type:
Analysis Command
Purpose:
Generates prediction bounds for all m new observations given a previous sample.
Description:
Given a sample of n observations with mean $$\bar{x}$$ and standard deviation s, the two-sided prediction interval to contain all of m new independent, identically distributed observations is

$$\bar{x} \pm r_{(1 - \alpha;m,n)} s$$

A conservative approximation for $$r_{(1 - \alpha;m,n)}$$ is

$$\sqrt{1 + \frac{1}{n}} t_{(1 - \alpha/(2m);n-1)}$$

with t denoting the t percent point function. Dataplot uses the tabulated values given in Table A.13 of Hahn and Meeker when n and m are both less than or equal to 10. Otherwise, the approximation above is used.
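
As a rough illustration of the conservative approximation, the Python sketch below evaluates the two-sided factor and the resulting bounds. It is not Dataplot's implementation: the function names are hypothetical, the t percent point function is obtained by numerically integrating the density and bisecting (a library routine would normally be used instead), and the inputs are the rounded summary statistics printed in the Program 1 output on this page, where n = 195 exceeds 10 so the approximation applies.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_ppf(p, df):
    """Percent point function for 0.5 <= p < 1, found by bisection on t_cdf.

    Adequate for moderate quantiles; not tuned for extreme tails with tiny df.
    """
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_sided_factor(conf, m, n):
    """Conservative approximation: sqrt(1 + 1/n) * t(1 - alpha/(2m); n - 1)."""
    alpha = 1 - conf
    return math.sqrt(1 + 1 / n) * t_ppf(1 - alpha / (2 * m), n - 1)

# Rounded summary statistics taken from the Program 1 output (n = 195, NNEW = 5).
mean_y, sd_y, n, m = 9.26146, 0.02278, 195, 5
r = two_sided_factor(0.95, m, n)
lower, upper = mean_y - r * sd_y, mean_y + r * sd_y
print(f"factor = {r:.4f}, lower = {lower:.5f}, upper = {upper:.5f}")
```

With these inputs the limits should agree with the 95.0 row of the two-sided table in Program 1 (9.20202 and 9.32089) to about four decimal places. The remaining difference comes from using the rounded standard deviation.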

The corresponding one-sided interval is

$$\mbox{lower limit} = \bar{x} - r'_{(1 - \alpha;m,n)} s$$

$$\mbox{upper limit} = \bar{x} + r'_{(1 - \alpha;m,n)} s$$

A conservative approximation for $$r'_{(1 - \alpha;m,n)}$$ is

$$\sqrt{1 + \frac{1}{n}} t_{(1 - \alpha/m;n-1)}$$

with t denoting the t percent point function. Dataplot uses the tabulated values given in Table A.14 of Hahn and Meeker when n and m are both less than or equal to 10. Otherwise, the approximation above is used.
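
One consequence of the two approximations, noted here as an observation rather than as a documented property of the command, is that the one-sided factor at confidence level 1 - $$\alpha$$ equals the two-sided factor at confidence level 1 - 2$$\alpha$$:

$$\sqrt{1 + \frac{1}{n}} t_{(1 - \alpha/m;n-1)} = \sqrt{1 + \frac{1}{n}} t_{(1 - 2\alpha/(2m);n-1)}$$

The Program 1 output on this page (n = 195, so the approximation is used) is consistent with this: the one-sided 90.0 lower limit, 9.21422, matches the two-sided 80.0 lower limit, and the one-sided 95.0 lower limit, 9.20786, matches the two-sided 90.0 lower limit. The identity need not hold when the tabulated values are used instead.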

The formulas above depend on the new observations only through their number, m. The bounds can therefore be computed before the new data are actually collected. The number of new observations is entered with the command

LET NNEW = <value>

If NNEW is not defined, then a value of 1 is used.

The difference between the PREDICTION BOUNDS command and the PREDICTION LIMITS command is that the PREDICTION LIMITS command generates a prediction interval for the mean of m new observations while the PREDICTION BOUNDS command generates a prediction interval to contain all of the new observations.
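
For comparison, the usual normal-theory two-sided prediction interval for the mean of m new observations is shown below. It is quoted from standard references such as Hahn and Meeker rather than from this page, and it is an assumption that PREDICTION LIMITS uses exactly this form.

$$\bar{x} \pm t_{(1 - \alpha/2;n-1)} s \sqrt{\frac{1}{m} + \frac{1}{n}}$$

This interval narrows as m grows, whereas the factor needed to cover all m observations grows with m. For m = 1 the two intervals coincide, since the conservative approximation reduces to $$\sqrt{1 + \frac{1}{n}} t_{(1 - \alpha/2;n-1)}$$.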

This prediction interval is based on the assumption that the underlying data is approximately normally distributed. Due to the central limit theorem, prediction limits for the mean are fairly robust against non-normality. However, the central limit theorem does not apply to prediction intervals that must cover all of the new observations. So the PREDICTION BOUNDS command is much more sensitive to non-normality than is the PREDICTION LIMITS command.

Syntax 1:
<LOWER/UPPER> <LOGNORMAL/BOXCOX> PREDICTION BOUNDS <y>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

If LOWER is specified, a one-sided lower prediction limit is returned. If UPPER is specified, a one-sided upper prediction limit is returned. If neither is specified, a two-sided limit is returned.

If the keyword LOGNORMAL is present, the log of the data will be taken, then the normal prediction bounds will be computed, and then the computed normal lower and upper limits will be exponentiated to obtain the lognormal prediction bounds.

Similarly, if the keyword BOXCOX is present, a Box-Cox transformation to normality will be applied to the data before computing the normal prediction bounds. The computed lower and upper limits will then be transformed back to the original scale.
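
The transform, compute, back-transform sequence described for LOGNORMAL and BOXCOX can be sketched in Python as follows. This is an illustration only: the function names, the sample values, and the factor r = 3.0 are made up, and the Box-Cox parameter is supplied by the caller because this page does not say how Dataplot chooses it. Setting the parameter to 0 gives the log transform used for the LOGNORMAL case.

```python
import math
import statistics

def boxcox(y, lam):
    """Box-Cox transform of a positive value; lam == 0 is the log transform."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam

def inv_boxcox(z, lam):
    """Inverse transform; for lam != 0 it requires lam * z + 1 > 0."""
    return math.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)

def transformed_bounds(data, r, lam=0.0):
    """Two-sided bounds: transform, apply center +/- r * sd, transform back.

    r is the normal-theory factor (tabulated or approximated elsewhere).
    """
    z = [boxcox(v, lam) for v in data]  # data must be strictly positive
    center = statistics.mean(z)
    spread = statistics.stdev(z)        # sample standard deviation (n - 1 divisor)
    return inv_boxcox(center - r * spread, lam), inv_boxcox(center + r * spread, lam)

sample = [1.2, 0.9, 1.5, 1.1, 2.3, 0.8, 1.7]            # made-up positive values
log_low, log_high = transformed_bounds(sample, r=3.0)   # lognormal case (lam = 0)
bc_low, bc_high = transformed_bounds(sample, r=3.0, lam=0.5)
print(log_low, log_high)
print(bc_low, bc_high)
```

Because the back-transform is nonlinear, the resulting bounds are not symmetric about the sample mean on the original scale.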

This syntax supports matrix arguments for the response variable.

Syntax 2:
MULTIPLE <LOWER/UPPER> <LOGNORMAL/BOXCOX>
PREDICTION BOUNDS <y1> ... <yk>
<SUBSET/EXCEPT/FOR qualification>
where <y1> ... <yk> is a list of 1 to 30 response variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax will generate a prediction interval for each of the response variables.

If LOWER is specified, a one-sided lower prediction limit is returned. If UPPER is specified, a one-sided upper prediction limit is returned. If neither is specified, a two-sided limit is returned.

If the keyword LOGNORMAL is present, the log of the data will be taken, then the normal prediction bounds will be computed, and then the computed normal lower and upper limits will be exponentiated to obtain the lognormal prediction bounds.

Similarly, if the keyword BOXCOX is present, a Box-Cox transformation to normality will be applied to the data before computing the normal prediction bounds. The computed lower and upper limits will then be transformed back to the original scale.

This syntax supports matrix arguments for the response variables.

Syntax 3:
REPLICATED <LOWER/UPPER> <LOGNORMAL/BOXCOX>
PREDICTION BOUNDS <y> <x1> ... <xk>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> ... <xk> is a list of 1 to 6 group-id variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax performs a cross-tabulation of the <x1> ... <xk> and generates a prediction interval for each unique combination of the cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, six prediction intervals will be generated.
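
The cross-tabulation step can be pictured with the following Python sketch, which groups a response by each distinct combination of factor levels and computes the summary statistics that the bounds are built from. The function name and the data are hypothetical, and this is not how Dataplot implements the command internally.

```python
from collections import defaultdict
import statistics

def group_summaries(y, *factors):
    """Return {level combination: (n, mean, sd)} for each combination that occurs."""
    cells = defaultdict(list)
    for value, combo in zip(y, zip(*factors)):
        cells[combo].append(value)
    return {combo: (len(v), statistics.mean(v), statistics.stdev(v))
            for combo, v in sorted(cells.items())}

# Made-up data: X1 has 3 levels, X2 has 2 levels, two observations per cell.
x1 = [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3]
x2 = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
y = [9.8, 10.1, 10.4, 10.0, 9.7, 9.9, 10.2, 10.6, 9.5, 9.9, 10.3, 10.1]

summaries = group_summaries(y, x1, x2)
for combo, (n, mean, sd) in summaries.items():
    print(combo, n, round(mean, 3), round(sd, 3))
```

With 3 levels of X1 and 2 levels of X2 the dictionary holds six entries, one per prediction interval. Each cell needs at least two observations for the standard deviation to be defined.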

If LOWER is specified, a one-sided lower prediction limit is returned. If UPPER is specified, a one-sided upper prediction limit is returned. If neither is specified, a two-sided limit is returned.

If the keyword LOGNORMAL is present, the log of the data will be taken, then the normal prediction bounds will be computed, and then the computed normal lower and upper limits will be exponentiated to obtain the lognormal prediction bounds.

Similarly, if the keyword BOXCOX is present, a Box-Cox transformation to normality will be applied to the data before computing the normal prediction bounds. The computed lower and upper limits will then be transformed back to the original scale.

This syntax does not support matrix arguments.

Examples:
PREDICTION BOUNDS Y1
PREDICTION BOUNDS Y1 SUBSET TAG > 2
MULTIPLE PREDICTION BOUNDS Y1 TO Y5
REPLICATED PREDICTION BOUNDS Y X
Note:
A table of prediction limits is printed for confidence levels of 50.0, 80.0, 90.0, 95.0, 99.0, and 99.9 percent.
Note:
In addition to the PREDICTION BOUNDS command, the following commands can also be used:

LET ALPHA = 0.05
LET NNEW = <value>

LET A = LOWER PREDICTION BOUNDS Y
LET A = UPPER PREDICTION BOUNDS Y
LET A = ONE SIDED LOWER PREDICTION BOUNDS Y
LET A = ONE SIDED UPPER PREDICTION BOUNDS Y

LET A = SUMMARY LOWER PREDICTION BOUNDS YMEAN YSD N
LET A = SUMMARY UPPER PREDICTION BOUNDS YMEAN YSD N
LET A = SUMMARY ONE SIDED LOWER PREDICTION BOUNDS YMEAN YSD N
LET A = SUMMARY ONE SIDED UPPER PREDICTION BOUNDS YMEAN YSD N

The first two commands specify the significance level and the number of new observations. The next four commands are used when you have raw data. The last four commands are used when only summary data (mean, standard deviation, sample size) is available.

In addition to the above LET commands, built-in statistics are supported for about 20 different commands (enter HELP STATISTICS for details).

Default:
None
Synonyms:
None
Related Commands:
 PREDICTION LIMITS = Generate prediction intervals for the mean of new observations.
 SD PREDICTION BOUNDS = Generate prediction limits for the standard deviation.
 CONFIDENCE LIMITS = Generate a confidence limit.
 TOLERANCE LIMITS = Generate a tolerance limit.
Reference:
Hahn and Meeker (1991), "Statistical Intervals: A Guide for Practitioners," Wiley, pp. 62-63.
Applications:
Confirmatory Data Analysis
Implementation Date:
2013/04
Program 1:

SKIP 25
SET WRITE DECIMALS 5
LET NNEW = 5
.
PREDICTION BOUNDS Y
LOWER PREDICTION BOUNDS Y
UPPER PREDICTION BOUNDS Y

The following output is generated
            Two-Sided Prediction Bounds for All Observations

Response Variable: Y

Summary Statistics:
Number of Observations:                             195
Sample Mean:                                    9.26146
Sample Standard Deviation:                      0.02278
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
50.0        9.22370        9.29922
80.0        9.21422        9.30870
90.0        9.20786        9.31505
95.0        9.20202        9.32089
99.0        9.18988        9.33303
99.9        9.17483        9.34808

One-Sided Lower Prediction Bounds for All Observations

Response Variable: Y

Summary Statistics:
Number of Observations:                             195
Sample Mean:                                    9.26146
Sample Standard Deviation:                      0.02278
Number of New Observations:                           5

One-Sided Lower Prediction Bounds for All Observations
---------------------------
Confidence          Lower
Value (%)          Limit
---------------------------
50.0        9.23208
80.0        9.22125
90.0        9.21422
95.0        9.20786
99.0        9.19490
99.9        9.17914

One-Sided Upper Prediction Bounds for All Observations

Response Variable: Y

Summary Statistics:
Number of Observations:                             195
Sample Mean:                                    9.26146
Sample Standard Deviation:                      0.02278
Number of New Observations:                           5

One-Sided Upper Prediction Bounds for All Observations
---------------------------
Confidence          Upper
Value (%)          Limit
---------------------------
50.0        9.29084
80.0        9.30166
90.0        9.30870
95.0        9.31505
99.0        9.32801
99.9        9.34377

Program 2:

SKIP 25
SET WRITE DECIMALS 5
LET NNEW = 5
.
REPLICATED PREDICTION BOUNDS Y X


The following output is generated
            Two-Sided Prediction Bounds for All Observations

Response Variable: Y
Factor Variable 1: X                            1.00000

Summary Statistics:
Number of Observations:                              10
Sample Mean:                                    0.99800
Sample Standard Deviation:                      0.00434
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
90.0        0.98560        1.01039
95.0        0.98356        1.01243
99.0        0.97869        1.01730

Two-Sided Prediction Bounds for All Observations

Response Variable: Y
Factor Variable 1: X                            2.00000

Summary Statistics:
Number of Observations:                              10
Sample Mean:                                    0.99910
Sample Standard Deviation:                      0.00521
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
90.0        0.98422        1.01397
95.0        0.98177        1.01642
99.0        0.97592        1.02227

Two-Sided Prediction Bounds for All Observations

Response Variable: Y
Factor Variable 1: X                            3.00000

Summary Statistics:
Number of Observations:                              10
Sample Mean:                                    0.99540
Sample Standard Deviation:                      0.00397
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
90.0        0.98405        1.00674
95.0        0.98219        1.00860
99.0        0.97773        1.01306

Two-Sided Prediction Bounds for All Observations

Response Variable: Y
Factor Variable 1: X                            4.00000

Summary Statistics:
Number of Observations:                              10
Sample Mean:                                    0.99820
Sample Standard Deviation:                      0.00385
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
90.0        0.98721        1.00918
95.0        0.98540        1.01099
99.0        0.98108        1.01531

Two-Sided Prediction Bounds for All Observations

Response Variable: Y
Factor Variable 1: X                            5.00000

Summary Statistics:
Number of Observations:                              10
Sample Mean:                                    0.99190
Sample Standard Deviation:                      0.00757
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
90.0        0.97029        1.01350
95.0        0.96673        1.01706
99.0        0.95823        1.02556

Two-Sided Prediction Bounds for All Observations

Response Variable: Y
Factor Variable 1: X                            6.00000

Summary Statistics:
Number of Observations:                              10
Sample Mean:                                    0.99879
Sample Standard Deviation:                      0.00988
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
90.0        0.97061        1.02698
95.0        0.96596        1.03163
99.0        0.95488        1.04271

Two-Sided Prediction Bounds for All Observations

Response Variable: Y
Factor Variable 1: X                            7.00000

Summary Statistics:
Number of Observations:                              10
Sample Mean:                                    1.00150
Sample Standard Deviation:                      0.00787
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
90.0        0.97904        1.02395
95.0        0.97533        1.02766
99.0        0.96650        1.03649

Two-Sided Prediction Bounds for All Observations

Response Variable: Y
Factor Variable 1: X                            8.00000

Summary Statistics:
Number of Observations:                              10
Sample Mean:                                    1.00039
Sample Standard Deviation:                      0.00362
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
90.0        0.99005        1.01074
95.0        0.98835        1.01244
99.0        0.98428        1.01651

Two-Sided Prediction Bounds for All Observations

Response Variable: Y
Factor Variable 1: X                            9.00000

Summary Statistics:
Number of Observations:                              10
Sample Mean:                                    0.99829
Sample Standard Deviation:                      0.00413
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
90.0        0.98650        1.01009
95.0        0.98455        1.01204
99.0        0.97991        1.01668

Two-Sided Prediction Bounds for All Observations

Response Variable: Y
Factor Variable 1: X                           10.00000

Summary Statistics:
Number of Observations:                              10
Sample Mean:                                    0.99479
Sample Standard Deviation:                      0.00532
Number of New Observations:                           5

Two-Sided Prediction Bounds for All Observations
------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
90.0        0.97960        1.00999
95.0        0.97710        1.01249
99.0        0.97112        1.01847

Program 3:

.  Following example from Hahn and Meeker's book.
.
let ymean = 50.10
let ysd   = 1.31
let n1    = 5
let nnew  = 3
let alpha = 0.05
.
set write decimals 5
let slow1 = summary lower prediction bounds ymean ysd n1
let supp1 = summary upper prediction bounds ymean ysd n1
let slow2 = summary one sided lower prediction bounds ymean ysd n1
let supp2 = summary one sided upper prediction bounds ymean ysd n1
print slow1 supp1 slow2 supp2

The following output is generated
 PARAMETERS AND CONSTANTS--

SLOW1   --       44.74603
SUPP1   --       55.45397
SLOW2   --       45.75080
SUPP2   --       54.44920
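
As a quick consistency check on this output, the factors it implies can be recovered by dividing the half-widths by the standard deviation:

$$r = \frac{55.45397 - 50.10}{1.31} = \frac{5.35397}{1.31} = 4.087$$

$$r' = \frac{54.44920 - 50.10}{1.31} = \frac{4.34920}{1.31} = 3.32$$

Since n = 5 and m = 3 are both at most 10, the rule given in the Description suggests these factors come from the Hahn and Meeker tables rather than from the conservative approximations.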

Date created: 04/15/2013
Last updated: 12/11/2023