
BEST CP

Name: BEST CP
There can be some variations in the above approaches. For example,
The choice of criterion is complicated by the fact that adding variables never decreases the R^{2} of the fit. However, including too many variables increases multicollinearity, which results in numerically unstable models (i.e., you are essentially fitting noise). In addition, the model becomes more complex than it needs to be. A number of criteria have been proposed that attempt to balance maximizing the fit against protecting against overfitting.

All subsets regression is the preferred algorithm in that it examines all models. However, it can be computationally impractical when the number of independent variables becomes large. The primary disadvantage of forward/backward stepwise regression is that it may miss good candidate models. Also, stepwise methods pick a single model rather than a list of good candidate models that can be examined more closely. Dataplot addresses this issue with the BEST CP command. This is based on the following:
It should be emphasized that the BEST CP command is intended simply to identify good candidate models. The BEST CP command uses a computationally fast algorithm that is not as accurate as the algorithm used by the FIT command, so the FIT command should be applied to the identified models that are of interest. Standard regression diagnostics should also be examined for those candidate models.
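The idea of ranking every subset by C(p) can be sketched outside Dataplot. The following is a minimal Python illustration, not the algorithm BEST CP uses. It assumes the usual definition C(p) = RSS_p/s^2 - n + 2p, where s^2 is the residual variance of the full model and p counts the intercept, and it assumes the data values shown (the Hald cement data as commonly published) match HALD647.DAT. The helper names solve, center and rss are hypothetical.

```python
from itertools import combinations
from math import log

# Hald cement data as commonly published (13 observations); assumed here to
# match the contents of HALD647.DAT.
Y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9,
     83.8, 113.3, 109.4]
COLUMNS = {
    1: [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    2: [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    3: [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    4: [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
}


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def center(values):
    """Subtract the mean; fitting centered data absorbs the intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def rss(variables):
    """Residual sum of squares for y regressed on an intercept plus variables."""
    yc = center(Y)
    cols = [center(COLUMNS[v]) for v in variables]
    # Normal equations on the centered data.
    xtx = [[sum(u * w for u, w in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * w for u, w in zip(ci, yc)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(yc))]
    return sum((obs - fit) ** 2 for obs, fit in zip(yc, fitted))


n = len(Y)
all_vars = (1, 2, 3, 4)
sigma2 = rss(all_vars) / (n - len(all_vars) - 1)  # full-model residual variance

results = []
for size in range(1, len(all_vars) + 1):
    for subset in combinations(all_vars, size):
        p = size + 1                      # parameters, counting the intercept
        r = rss(subset)
        cp = r / sigma2 - n + 2 * p       # Mallows' C(p)
        bic = n * log(r / n) + p * log(n)
        results.append((cp, bic, subset))

results.sort()                            # smallest C(p) first
for cp, bic, subset in results[:5]:
    print(f"C(p) = {cp:7.3f}  BIC = {bic:7.3f}  variables = {subset}")
```

If the assumed data match HALD647.DAT, the top-ranked subset should be X1 and X2, in agreement with the C(p) = 2.678, BIC = 27.115 entry in the sample output for this command. With k variables the loop fits 2^k - 1 models, which is why exhaustive enumeration becomes impractical as k grows.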
where <y> is the response (dependent) variable; <x1> .... <xk> is a list of one or more independent variables; and where the <SUBSET/EXCEPT/FOR qualification> is optional.
BEST CP Y X1 X2 X3 X4 X5 X6 X7 SUBSET TAG > 1
To change the number of candidate models chosen, enter the command
where <value> identifies the number of candidate models. Note that increasing <value> will result in greater time to generate the best candidate models. In most cases, the default value of 10 is adequate.
Dataplot writes the results of the CP analysis to files. The example program below shows how to generate a CP plot using these files. Specifically,
4 2 1 3 12 14 34 23 24 124 123 134 234 1234
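Schwarz introduced an alternative information criterion called the Bayesian Information Criterion (BIC). The BIC penalizes additional parameters more heavily than the AIC does. For large n, the BIC can be approximated by

\[ \mbox{BIC} = -2 \ln(\hat{L}) + k \ln(n) \]

where \(\hat{L}\) is the maximized value of the likelihood function, k is the number of estimated parameters, and n is the number of observations. In the context of regression, the BIC can be computed as

\[ \mbox{BIC} = n \ln(\mbox{RSS}/n) + k \ln(n) \]

where RSS is the residual sum of squares of the fitted model.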
The 2013/10 version of Dataplot added the BIC value for the selected models to the output. Note that the models are selected on the basis of Mallows' C(p), not BIC. The BIC is provided as an additional comparison.
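As a rough check on how the BIC relates to the AIC in the regression setting, the sketch below uses the common least squares forms n ln(RSS/n) + k ln(n) for the BIC and n ln(RSS/n) + 2k for the AIC (both up to an additive constant). The function names are hypothetical, and the values n = 13 and RSS = 57.9045 for the X1, X2 fit to the Hald data are assumptions that do not appear on this page.

```python
from math import log


def aic_regression(n, rss, k):
    """AIC for a least squares fit, up to an additive constant."""
    return n * log(rss / n) + 2 * k


def bic_regression(n, rss, k):
    """BIC for a least squares fit, up to an additive constant."""
    return n * log(rss / n) + k * log(n)


n = 13               # assumed number of observations in HALD647.DAT
rss_x1_x2 = 57.9045  # assumed RSS for the X1, X2 fit (not given on this page)
k = 3                # X1, X2 and the intercept

bic = bic_regression(n, rss_x1_x2, k)
aic = aic_regression(n, rss_x1_x2, k)
print(f"BIC = {bic:.3f}")
print(f"AIC = {aic:.3f}")
print(f"extra BIC penalty per parameter = {log(n) - 2:.3f}")
```

Under these assumptions the BIC agrees with the 27.115 reported for the X1, X2 model when k counts the intercept. The per-parameter penalty is ln(n) for the BIC versus 2 for the AIC, so the BIC is the stricter of the two whenever n is 8 or more.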
C. L. Mallows (1966), "Choosing a Subset Regression," Joint Statistical Meetings, Los Angeles, CA.

Sally Peavy, Shirley Bremer, Ruth Varner, and David Hogben (1986), "OMNITAB 80: An Interpretive System for Statistical and Numerical Data Analysis," NIST Special Publication 701.

Thomas Ryan (1997), "Modern Regression Methods," John Wiley, pp. 223-228.

Schwarz (1978), "Estimating the dimension of a model," Annals of Statistics, Vol. 6, No. 2, pp. 461–464.

Boisbunon, Canu, Fourdrinier, Strawderman, and Wells (2013), "AIC and Cp as estimators of loss for spherically symmetric distributions," arXiv:1308.2766.
2013/10: Reformatted output
2013/10: Added BIC values to output

    skip 25
    read hald647.dat y x1 x2 x3 x4
    .
    echo on
    capture junk.dat
    best cp y x1 x2 x3 x4
    end of capture
    .
    skip 0
    read dpst1f.dat p cp
    read row labels dpst2f.dat
    title case asis
    label case asis
    character rowlabels
    line blank
    tic offset units data
    xtic offset 0.3 0.3
    ytic offset 10 0
    let maxp = maximum p
    major xtic mark number maxp
    xlimits 1 maxp
    title Best CP Plot (HALD647.DAT Example)
    x1label P
    y1label C(p)
    plot cp p
    line solid
    draw data 1 1 maxp maxp

The following output is generated for the BEST CP command.

    Regression with One Variable

     C(p) Statistic        BIC      Variables
         138.73082      59.98154    4
         142.48641      60.30789    2
         202.54876      64.64937    1
         315.15428      70.19729    3

    Regressions with 2 Variables

    C(p) = 2.678, BIC = 27.115
     Variable   Coefficient   F Ratio
        X1        1.46831     146.522
        X2        0.66225     208.581

    C(p) = 5.496, BIC = 30.437
     Variable   Coefficient   F Ratio
        X1        1.43995     108.224
        X4        0.61395     159.294

    C(p) = 22.373, BIC = 41.547
     Variable   Coefficient   F Ratio
        X3        1.19985      40.295
        X4        0.72460     100.356

    C(p) = 62.438, BIC = 52.732
     Variable   Coefficient   F Ratio
        X2        0.73133      36.682
        X3        1.00838      11.816

    C(p) = 138.226, BIC = 62.324
     Variable   Coefficient   F Ratio
        X2        0.31090       0.172
        X4        0.45694       0.431

     C(p) Statistic        BIC      Variables
         198.09465      66.81153    1 3

    Regressions with 3 Variables

    C(p) = 3.018, BIC = 27.234
     Variable   Coefficient   F Ratio
        X1        1.45194     154.008
        X2        0.41611       5.025
        X4        0.23654       1.863

    C(p) = 3.041, BIC = 27.271
     Variable   Coefficient   F Ratio
        X1        1.69588      68.715
        X2        0.65691     220.546
        X3        0.25002       1.832

    C(p) = 3.497, BIC = 27.987
     Variable   Coefficient   F Ratio
        X1        1.05184      22.112
        X3        0.41004       4.235
        X4        0.64280     208.240

    C(p) = 7.337, BIC = 32.836
     Variable   Coefficient   F Ratio
        X2        0.92342      12.426
        X3        1.44797      96.939
        X4        1.55704      41.654

    Regressions with 4 Variables

    C(p) = 5.000, BIC = 29.769
     Variable   Coefficient   F Ratio
        X1        1.55109       4.336
        X2        0.51017       0.497
        X3        0.10191       0.018
        X4        0.14406       0.041

    14 REGRESSIONS 56 OPERATIONS

The output can be displayed in graphical form.
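A C(p) plot is commonly read against the reference line C(p) = p, which is what the DRAW DATA 1 1 MAXP MAXP command in the program adds. Models that fall on or below that line are the usual candidates. The sketch below applies that screen to the C(p) and BIC values copied from the sample output above. It assumes that p is the number of variables plus one for the intercept, which may differ from the value Dataplot writes to dpst1f.dat.

```python
# (variables, C(p), BIC) copied from the sample output for HALD647.DAT
models = [
    ((4,), 138.73082, 59.98154),
    ((2,), 142.48641, 60.30789),
    ((1,), 202.54876, 64.64937),
    ((3,), 315.15428, 70.19729),
    ((1, 2), 2.678, 27.115),
    ((1, 4), 5.496, 30.437),
    ((3, 4), 22.373, 41.547),
    ((2, 3), 62.438, 52.732),
    ((2, 4), 138.226, 62.324),
    ((1, 3), 198.09465, 66.81153),
    ((1, 2, 4), 3.018, 27.234),
    ((1, 2, 3), 3.041, 27.271),
    ((1, 3, 4), 3.497, 27.987),
    ((2, 3, 4), 7.337, 32.836),
    ((1, 2, 3, 4), 5.000, 29.769),
]

# Keep models whose C(p) does not exceed p (assumed: p = variables + intercept).
candidates = sorted(
    (m for m in models if m[1] <= len(m[0]) + 1),
    key=lambda m: m[1],
)
for variables, cp, bic in candidates:
    print(f"p = {len(variables) + 1}  C(p) = {cp:6.3f}  BIC = {bic:6.3f}  {variables}")

# Compare the orderings produced by the two criteria.
by_cp = [m[0] for m in sorted(models, key=lambda m: m[1])]
by_bic = [m[0] for m in sorted(models, key=lambda m: m[2])]
print("same top five:", by_cp[:5] == by_bic[:5])
print("same full ordering:", by_cp == by_bic)
```

For this data set the two criteria agree on the leading models and differ only among the poorly fitting ones, which is consistent with the BIC being reported as a supplementary comparison and not as the selection criterion.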
 
Date created: 08/12/2003 Last updated: 12/11/2023 Please email comments on this WWW page to alan.heckert@nist.gov. 