# PROPORTION CONFIDENCE LIMITS

Name:
PROPORTION CONFIDENCE LIMITS
Type:
Analysis Command
Purpose:
Generates a confidence interval for proportions.
Description:
Given a set of N observations in a variable X, we can compute the proportion of successes. The PROPORTION CONFIDENCE LIMITS command computes a confidence interval for the proportion of successes.

In Dataplot, you define a success by entering the command

ANOP LIMITS <lower limit> <upper limit>

before entering the PROPORTION CONFIDENCE LIMITS command. That is, you specify the lower and upper values that define a success. Then the estimate for the proportion of successes is simply the number of points in the success region divided by the total number of points. In most applications, successes are defined by 1's and failures by 0's. The default limits are 0.5 and 1.5, so if your data is defined by 0 and 1 values the ANOP LIMITS command can be omitted.
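
As a rough illustration of the success-region idea described above, the following Python sketch counts the points that fall between a lower and an upper limit and divides by the total number of points. The function name `proportion_of_successes` is hypothetical, and treating both endpoints as inclusive is an assumption; the text does not say how Dataplot handles values exactly on a limit.

```python
def proportion_of_successes(values, lower=0.5, upper=1.5):
    """Return (number of successes, n, p_hat) for a success region.

    A point counts as a success when lower <= value <= upper.
    Inclusive endpoints are an assumption made for this sketch.
    The defaults 0.5 and 1.5 mirror the default limits quoted in the text.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    successes = sum(1 for v in values if lower <= v <= upper)
    return successes, n, successes / n


# 0/1 data: the default limits pick out the 1's.
y = [1] * 8 + [0] * 22
print(proportion_of_successes(y))

# Continuous data with a success region of 0.80 to 1.0.
scores = [0.95, 0.70, 0.85, 0.60, 1.00]
print(proportion_of_successes(scores, 0.80, 1.0))
```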

Several methods have been proposed for computing confidence limits for a binomial proportion. Dataplot currently supports the following methods:

1. NORMAL APPROXIMATION

The normal approximation interval is

$$\hat{p} \pm \Phi^{-1}_{(1 - \alpha/2)} \sqrt{\hat{p}(1 - \hat{p})/n}$$

where

X = the number of successes
$$\hat{p} = \frac{X} {n}$$
$$\Phi^{-1}$$ is the percent point function of the normal distribution

Due to its simplicity, this method is commonly used. However, its actual coverage tends to fall further from the nominal level than that of the other methods, so its use should be restricted to cases with relatively large sample sizes where $$\hat{p}$$ is not near 0 or 1.
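
The normal approximation interval can be sketched in a few lines of Python. This is an illustration, not Dataplot's implementation: the function name `normal_approx_limits` is hypothetical, and truncating the limits to the interval from 0 to 1 is an assumption suggested by the 0.000000 lower limits that appear in the sample output for this method.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_limits(x, n, conf=0.95):
    """Two-sided normal approximation limits for x successes in n trials."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # normal percent point function
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    # Assumption: clip the limits so they stay inside [0, 1].
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


print(normal_approx_limits(8, 30, conf=0.95))
```

For 8 successes in 30 trials the 95% limits should agree closely with the corresponding row of the normal approximation table in the sample output.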

2. ADJUSTED WALD

The adjusted Wald interval (also known as the Agresti-Coull interval) is

$$\tilde{p} \pm \Phi^{-1}_{(1 - \alpha/2)} \sqrt{\tilde{p}(1 - \tilde{p})/\tilde{n}}$$

where

X = the number of successes
$$\tilde{X} = X + (\Phi^{-1}(1 - \alpha/2))^{2}/2$$
$$\tilde{n} = n + (\Phi^{-1}(1 - \alpha/2))^{2}$$
$$\tilde{p} = \frac{\tilde{X}} {\tilde{n}}$$
$$\Phi^{-1}$$ is the percent point function of the normal distribution

This method improves upon the normal approximation.
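
The adjusted quantities defined above (the adjusted Wald form, with adjusted count, adjusted sample size, and adjusted proportion) translate directly into code. The Python sketch below applies the formulas exactly as written; the function name `adjusted_wald_limits` is hypothetical, and no truncation to the interval from 0 to 1 is applied here, so whether Dataplot does so is left open.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald_limits(x, n, conf=0.95):
    """Two-sided adjusted Wald limits for x successes in n trials."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    x_tilde = x + z * z / 2.0      # adjusted number of successes
    n_tilde = n + z * z            # adjusted sample size
    p_tilde = x_tilde / n_tilde    # adjusted proportion
    half_width = z * sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width


print(adjusted_wald_limits(8, 30, conf=0.95))
```

Note that the adjusted proportion pulls the center of the interval toward 0.5, which is what distinguishes this interval from the plain normal approximation.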

3. WILSON

This method was originally proposed by Wilson in 1927. Papers by Agresti and Coull and also by Brown, Cai and DasGupta recommended this interval and provided comparisons of this method to the adjusted Wald and other methods.

This method solves for the two values of $$p_0$$ (say, $$p_{upper}$$ and $$p_{lower}$$) that result from setting $$z = z_{\alpha/2}$$ and solving for $$p_0 = p_{upper}$$, and then setting $$z = -z_{\alpha/2}$$ and solving for $$p_0 = p_{lower}$$, where $$z_{\alpha/2}$$ denotes the variate value from the standard normal distribution such that the area to the right of the value is $$\alpha/2$$. The solution for the two values of $$p_0$$ results in the following confidence limits:

$$U. L. = \frac{\hat{p} + \frac{z_{\alpha/2}^{2}}{2n} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^{2}}{4n^2}}} {1 + z_{\alpha/2}^{2}/n}$$

$$L. L. = \frac{\hat{p} + \frac{z_{\alpha/2}^{2}}{2n} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^{2}}{4n^2}}} {1 + z_{\alpha/2}^{2}/n}$$

This approach can be justified on the grounds that it is the exact algebraic counterpart to the (large-sample) hypothesis test, and it is also supported by the research of Agresti and Coull. One advantage of this procedure is that its worth does not strongly depend upon the value of n or p; indeed, Agresti and Coull recommended it for virtually all combinations of n and p. Simulations by Agresti and Coull and by Brown, Cai and DasGupta show that this method does a better job of maintaining the nominal coverage than do the adjusted Wald and normal approximation methods. Another advantage is that the limits always lie in the (0,1) interval.
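
The upper and lower limit formulas above differ only in the sign of the square-root term, which the following Python sketch makes explicit. The function name `wilson_limits` is hypothetical; the arithmetic follows the U. L. and L. L. expressions term by term.

```python
from math import sqrt
from statistics import NormalDist


def wilson_limits(x, n, conf=0.95):
    """Two-sided Wilson limits for x successes in n trials."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    p_hat = x / n
    center = p_hat + z * z / (2.0 * n)
    spread = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    denom = 1.0 + z * z / n
    return (center - spread) / denom, (center + spread) / denom


print(wilson_limits(8, 30, conf=0.95))
```

For 8 successes in 30 trials the 95% limits should agree closely with the Wilson table in the sample output.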

4. JEFFREYS

The Jeffreys interval, a Bayesian method based on a Jeffreys prior (the derivation for this interval is given in the Brown, Cai, DasGupta paper), is

LCL = BETPPF(α/2, X + 0.5, n - X + 0.5)
UCL = BETPPF(1 - α/2, X + 0.5, n - X + 0.5)

where BETPPF is the percent point function of the beta distribution and X is the number of successes.
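
A standard-library Python sketch of the Jeffreys limits follows. It takes both limits as quantiles of a beta distribution with shape parameters X + 0.5 and n - X + 0.5, which is the usual Jeffreys posterior. The helper names `beta_cdf`, `beta_ppf`, and `jeffreys_limits` are hypothetical. The beta percent point function is obtained here by Simpson integration plus bisection, a simplification chosen to avoid external libraries; it assumes 1 <= X <= n - 1 so that the density is bounded, and it is not how Dataplot computes BETPPF.

```python
from math import exp, lgamma, log


def beta_cdf(x, a, b, steps=2000):
    """Beta(a, b) cumulative distribution at x by composite Simpson's rule.

    Assumes a > 1 and b > 1 so the density is zero at both endpoints.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)

    def pdf(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return exp(log_norm + (a - 1.0) * log(t) + (b - 1.0) * log(1.0 - t))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * pdf(i * h)
    return total * h / 3.0


def beta_ppf(q, a, b, tol=1e-10):
    """Percent point function of Beta(a, b) by bisection on beta_cdf."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def jeffreys_limits(x, n, conf=0.95):
    """Two-sided Jeffreys limits for x successes in n trials (1 <= x <= n - 1)."""
    alpha = 1.0 - conf
    a, b = x + 0.5, n - x + 0.5
    return beta_ppf(alpha / 2.0, a, b), beta_ppf(1.0 - alpha / 2.0, a, b)


print(jeffreys_limits(8, 30, conf=0.95))
```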

5. EXACT BINOMIAL (or CLOPPER-PEARSON)

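Solve the equation

$$\mbox{BINCDF}(x; p_u, n) = \alpha/2$$

for $$p_u$$ to obtain the upper limit of the two-sided 100(1 - α)% confidence interval for p, where BINCDF is the cumulative distribution function of the binomial distribution, x is the number of successes, and n is the number of trials.

Next solve the equation

$$\mbox{BINCDF}(x - 1; p_l, n) = 1 - \alpha/2$$

for $$p_l$$ to obtain the lower limit of the two-sided 100(1 - α)% confidence interval for p.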

Although this method is called "exact", it is not more accurate than the adjusted Wald or Wilson method. The "exact" terminology is based on the use of the binomial CDF function. However, since the binomial is a discrete distribution, the use of the CDF function does not result in "exact" 95% confidence intervals. The Agresti and Coull paper gives arguments to justify why the "approximate" Wilson and adjusted Wald methods can often be more accurate than the "exact" method.
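
The exact binomial limits have no closed form in terms of elementary functions, so they are found numerically. The Python sketch below assumes the standard Clopper-Pearson conditions: the upper limit is the p at which the binomial CDF evaluated at x equals α/2, and the lower limit is the p at which the binomial CDF evaluated at x - 1 equals 1 - α/2. The names `binom_cdf`, `solve_for_p`, and `clopper_pearson_limits` are hypothetical, and plain bisection is used for simplicity rather than as a claim about Dataplot's solver.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def solve_for_p(k, n, target, tol=1e-12):
    """Find p with binom_cdf(k, n, p) = target (the CDF decreases as p grows)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binom_cdf(k, n, mid) > target:
            lo = mid  # CDF still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson_limits(x, n, conf=0.95):
    """Two-sided exact binomial limits for x successes in n trials."""
    alpha = 1.0 - conf
    # Boundary conventions: lower limit 0 when x = 0, upper limit 1 when x = n.
    lower = 0.0 if x == 0 else solve_for_p(x - 1, n, 1.0 - alpha / 2.0)
    upper = 1.0 if x == n else solve_for_p(x, n, alpha / 2.0)
    return lower, upper


print(clopper_pearson_limits(8, 30, conf=0.95))
```

For 8 successes in 30 trials the 95% limits should agree closely with the exact binomial table in the sample output.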

To specify the method to use, enter the command

SET BINOMIAL METHOD <WILSON/ADJUSTED WALD/JEFFREYS/NORMAL/EXACT>

The default is the Wilson method. Brown, Cai, and DasGupta studied the coverage properties of various methods and recommend the Wilson, adjusted Wald, and Jeffreys methods as having the best coverage properties. In particular, they recommend the Wilson and Jeffreys methods for n ≤ 40; for n > 40, these three methods have comparable performance. Although the normal approximation and exact binomial methods are not typically recommended, Dataplot provides them since they are still used in practice.

Syntax:
PROPORTION CONFIDENCE LIMITS <y>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
ANOP LIMITS 0.80 1.0
PROPORTION CONFIDENCE LIMITS Y

ANOP LIMITS 0.80 1.0
PROPORTION CONFIDENCE LIMITS Y SUBSET TAG = 1 TO 3

Note:
A table of confidence intervals is printed for confidence levels of 50.0, 75.0, 90.0, 95.0, 99.0, 99.9, 99.99, and 99.999 percent. The sample size, sample number of successes, and sample proportion of successes are also printed.
Note:
Prior versions of Dataplot used the following method for the confidence interval

(BINPPF(ALPHA/2,P,N)/N, BINPPF(1-ALPHA/2,P,N)/N)

with BINPPF denoting the percent point function of the binomial distribution.

Default:
None
Synonyms:
None
Related Commands:
AGRESTI COULL = Compute either the lower or upper confidence limit for either a one-sided or a two-sided binomial proportion of a variable (Wilson, adjusted Wald, or Jeffreys method).
EXACT BINOMIAL = Compute either the lower or upper exact binomial confidence limit for either a one-sided or a two-sided binomial proportion of a variable.
DIFFERENCE OF PROPORTIONS CONFIDENCE LIMIT = Generate a confidence interval for the difference of proportions.
ANOP LIMITS = Specify success region for proportions.
ANOP PLOT = Generate an analysis of proportions plot.
CONFIDENCE LIMITS = Generate the confidence limits for the mean.
Reference:
Agresti, A. and Coull, B. A. (1998), "Approximate is better than 'exact' for interval estimation of binomial proportions," The American Statistician, 52(2), 119-126.

Brown, L. D., Cai, T. T., and DasGupta, A. (2001), "Interval estimation for a binomial proportion," Statistical Science, 16(2), 101-133.

Wilson, E. B. (1927), "Probable inference, the law of succession, and statistical inference," Journal of the American Statistical Association, Vol. 22, pp. 209-212.

Snedecor, G. W. and Cochran, W. G. (1989), "Statistical Methods," Eighth Edition, Iowa State University Press, pp. 121-124.

Applications:
Confirmatory Data Analysis
Implementation Date:
1999/05
2017/11: Change method for determining the confidence interval
Program:
.           Create a binary variable with 30 rows
.           with 8 successes.
.
let n = 30
let nsuc = 8
let y = 0 for i = 1 1 n
let y = 1 for i = 1 1 nsuc
.
.          Now do proportions confidence interval
.
set write decimals 6
set binomial method wilson
proportion confidence interval y
set binomial method adjusted wald
proportion confidence interval y
set binomial method jeffreys
proportion confidence interval y
set binomial method exact
proportion confidence interval y
set binomial method normal
proportion confidence interval y

This program generated the following output:
            Two-Sided Confidence Limits for a Proportion
(Wilson Method)

Response Variable: Y

Sample:
Number of Observations:                  30
Number of Successes:                     8
Proportion of Successes:                 0.266667
Standard Error:                          0.080737

------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
50.000       0.215992       0.324313
75.000       0.185098       0.367950
90.000       0.157323       0.414615
95.000       0.141827       0.444480
99.000       0.116046       0.501805
99.900       0.092558       0.564537
99.990       0.077142       0.612690
99.999       0.066181       0.651056

Two-Sided Confidence Limits for a Proportion

Response Variable: Y

Sample:
Number of Observations:                  30
Number of Successes:                     8
Proportion of Successes:                 0.266667
Standard Error:                          0.080737

------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
50.000       0.217582       0.326788
75.000       0.188962       0.376021
90.000       0.163909       0.432706
95.000       0.150238       0.471347
99.000       0.128339       0.551032
99.900       0.110551       0.646995
99.990       0.101710       0.727092
99.999       0.098343       0.794898

Two-Sided Confidence Limits for a Proportion
(Bayesian with Jeffreys Prior Method)

Response Variable: Y

Sample:
Number of Observations:                  30
Number of Successes:                     8
Proportion of Successes:                 0.266667
Standard Error:                          0.080737

------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
50.000       0.217637       0.325518
75.000       0.184464       0.367317
90.000       0.153145       0.412052
95.000       0.134941       0.440996
99.000       0.103386       0.497902
99.900       0.073387       0.563271
99.990       0.053383       0.616478
99.999       0.039406       0.661171

Two-Sided Confidence Limits for a Proportion
(Exact Binomial Method)

Response Variable: Y

Sample:
Number of Observations:                  30
Number of Successes:                     8
Proportion of Successes:                 0.266667
Standard Error:                          0.080737

------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
50.000       0.202418       0.342833
75.000       0.170298       0.385007
90.000       0.140185       0.429934
95.000       0.122795       0.458894
99.000       0.092892       0.515598
99.900       0.064818       0.580375
99.990       0.046392       0.632814
99.999       0.033699       0.676670

Two-Sided Confidence Limits for a Proportion
(Normal Approximation Method)

Response Variable: Y

Sample:
Number of Observations:                  30
Number of Successes:                     8
Proportion of Successes:                 0.266667
Standard Error:                          0.080737

------------------------------------------
Confidence          Lower          Upper
Value (%)          Limit          Limit
------------------------------------------
50.000       0.212210       0.321123
75.000       0.173791       0.359543
90.000       0.133866       0.399468
95.000       0.108424       0.424909
99.000       0.058701       0.474632
99.900       0.000998       0.532335
99.990       0.000000       0.580783
99.999       0.000000       0.623298


NIST is an agency of the U.S. Commerce Department.

Date created: 06/05/2001
Last updated: 11/21/2017