
# AGRESTI COULL CONFIDENCE LIMITS

Name:
AGRESTI COULL CONFIDENCE LIMITS (LET)
Type:
Let Subcommand
Purpose:
Compute the two-sided Agresti-Coull confidence limits for a binomial proportion.
Description:
The binomial proportion is defined as the number of successes divided by the number of trials.

Confidence intervals for the binomial proportion can be computed using a method recommended by Agresti and Coull and also by Brown, Cai, and DasGupta (the methodology was originally developed by Wilson in 1927). Let z = (p̂ − p0)/√(p0(1 − p0)/n) denote the large-sample test statistic for the hypothesis that the true proportion equals p0, where p̂ is the observed proportion of successes and n is the number of trials. This method solves for the two values of p0 (say, pupper and plower) at which the two-sided test is exactly on the boundary of rejection: setting z = −zα/2 and solving gives p0 = pupper, and setting z = zα/2 and solving gives p0 = plower, where zα/2 denotes the variate value from the standard normal distribution such that the area to the right of the value is α/2. The solution for the two values of p0 results in the following confidence intervals:

$$U. L. = \frac{\hat{p} + \frac{z_{\alpha/2}^{2}}{2n} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^{2}}{4n^2}}} {1 + z_{\alpha/2}^{2}/n}$$

$$L. L. = \frac{\hat{p} + \frac{z_{\alpha/2}^{2}}{2n} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^{2}}{4n^2}}} {1 + z_{\alpha/2}^{2}/n}$$
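The two limits above translate directly into code. The following Python sketch is illustrative only and is not Dataplot's implementation: the helper name `wilson_limits` is hypothetical, alpha is treated as the significance level (0.05 for a 95% interval), and the standard library's `statistics.NormalDist().inv_cdf` supplies zα/2.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_limits(p_hat: float, n: float, alpha: float) -> Tuple[float, float]:
    """Hypothetical helper: two-sided Wilson limits for a binomial proportion.

    p_hat: observed proportion of successes; n: number of trials;
    alpha: significance level (0.05 gives a 95% interval).
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}: area alpha/2 to its right
    center = p_hat + z * z / (2.0 * n)
    half = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    denom = 1.0 + z * z / n
    return (center - half) / denom, (center + half) / denom  # (L. L., U. L.)


low, up = wilson_limits(0.8, 25, 0.05)
print(f"lower = {low:.7f}, upper = {up:.7f}")
```

With p̂ = 0.8, n = 25, and α = 0.05, this sketch agrees with the 95% limits reported in the Program example (0.6086905 and 0.9113942) to about seven decimal places. For p̂ = 0 the lower limit is 0 (up to rounding) rather than a negative number.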

This approach can be justified on the grounds that it is the exact algebraic counterpart to the (large-sample) hypothesis test, and it is also supported by the research of Agresti and Coull. One advantage of this procedure is that its performance does not depend strongly on the values of n and p; indeed, Agresti and Coull recommended it for virtually all combinations of n and p.
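The algebra behind the "exact algebraic counterpart" claim can be written out. Write the large-sample test statistic as z = (p̂ − p0)/√(p0(1 − p0)/n). Setting z equal to ±zα/2 (that is, squaring both sides) and clearing the denominator gives a quadratic in p0 whose two roots are exactly the upper and lower limits given above:

$$n(\hat{p} - p_0)^2 = z_{\alpha/2}^{2}\, p_0(1 - p_0) \;\Longleftrightarrow\; (n + z_{\alpha/2}^{2})\,p_0^2 - (2n\hat{p} + z_{\alpha/2}^{2})\,p_0 + n\hat{p}^2 = 0$$

$$p_0 = \frac{2n\hat{p} + z_{\alpha/2}^{2} \pm z_{\alpha/2}\sqrt{4n\hat{p}(1 - \hat{p}) + z_{\alpha/2}^{2}}}{2(n + z_{\alpha/2}^{2})} = \frac{\hat{p} + \frac{z_{\alpha/2}^{2}}{2n} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^{2}}{4n^2}}}{1 + z_{\alpha/2}^{2}/n}$$

The "+" sign gives the upper limit and the "−" sign the lower limit. Both roots lie in [0, 1]: the expression (zα/2)² p0(1 − p0) − n(p̂ − p0)² is nonpositive at p0 = 0 and at p0 = 1 and nonnegative at p0 = p̂, so one root falls in [0, p̂] and the other in [p̂, 1].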

Another advantage is that the limits always lie in the interval [0, 1]. This is not true for the frequently used normal approximation:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
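To illustrate that the normal approximation can leave [0, 1], here is a minimal Python sketch of the formula above, with no clipping. The helper name `wald_limits` is hypothetical and alpha is treated as the significance level.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wald_limits(p_hat: float, n: float, alpha: float) -> Tuple[float, float]:
    """Hypothetical helper: normal-approximation (Wald) interval
    p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n), not clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


# 19 successes in 20 trials: the upper limit exceeds 1.
print(wald_limits(0.95, 20, 0.05))
# 0 successes: the interval collapses to the single point 0.
print(wald_limits(0.0, 20, 0.05))
```

The second call also shows a related weakness: when p̂ is 0 or 1 the estimated standard error is 0, so the interval has zero width.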
Syntax:
LET <lowlim> <upplim> = AGRESTI COULL CONFIDENCE LIMITS <p> <n> <alpha>
<SUBSET/EXCEPT/FOR qualification>
where <p> is a constant, parameter, or variable that contains the proportion of successes;
<n> is a constant, parameter, or variable that contains the number of trials;
<alpha> is a constant or parameter that contains the significance level;
<lowlim> is a parameter or variable that contains the computed lower Agresti-Coull confidence limit;
<upplim> is a parameter or variable that contains the computed upper Agresti-Coull confidence limit;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

The <p> and <n> arguments can be either parameters or variables. If both are variables, they must have the same number of elements. The <alpha> argument is always assumed to be a constant or a parameter.

If <p> and <n> are both parameters, then <lowlim> and <upplim> will be parameters. Otherwise, they will be variables.
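The rules for mixing parameters and variables can be made concrete with a small Python sketch. The function `agresti_coull_limits` below is hypothetical: scalars play the role of Dataplot parameters and lists play the role of variables. Reusing a scalar for every element when it is paired with a list is an assumption of this sketch, and only the default Wilson formulas are computed.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Tuple


def _wilson(p_hat: float, n: float, z: float) -> Tuple[float, float]:
    """Wilson limits for one (p_hat, n) pair, given z = z_{alpha/2}."""
    center = p_hat + z * z / (2.0 * n)
    half = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    denom = 1.0 + z * z / n
    return (center - half) / denom, (center + half) / denom


def agresti_coull_limits(p, n, alpha: float):
    """Hypothetical mirror of the LET argument rules.

    - alpha is always a scalar (a constant or parameter);
    - if p and n are both scalars, two scalars are returned;
    - otherwise two lists are returned; a scalar is reused for every
      element (assumption), and two sequences must be the same length.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p_is_seq = not isinstance(p, (int, float))
    n_is_seq = not isinstance(n, (int, float))
    if not p_is_seq and not n_is_seq:
        return _wilson(p, n, z)
    if p_is_seq and n_is_seq and len(p) != len(n):
        raise ValueError("p and n must have the same number of elements")
    length = len(p) if p_is_seq else len(n)
    ps: List[float] = list(p) if p_is_seq else [p] * length
    ns: List[float] = list(n) if n_is_seq else [n] * length
    pairs = [_wilson(pi, ni, z) for pi, ni in zip(ps, ns)]
    return [lo for lo, _ in pairs], [hi for _, hi in pairs]


al, au = agresti_coull_limits(0.8, 25, 0.05)            # both scalars: two scalars back
lows, ups = agresti_coull_limits([0.8, 0.5], 25, 0.05)  # one list: two lists back
```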

Examples:
LET AL AU = AGRESTI COULL CONFIDENCE LIMITS P N ALPHA
LET AL AU = AGRESTI COULL CONFIDENCE LIMITS P N ALPHA ...
SUBSET TAG > 2
Note:
Many methods have been proposed for computing confidence limits for a binomial proportion. In the statistical literature, what we refer to above as the Agresti-Coull method is now commonly referred to as the Wilson method (this method was originally described in a paper by Wilson). What the Agresti-Coull paper referred to as the adjusted Wald method is now commonly referred to as the Agresti-Coull method.

The Brown, Cai, and DasGupta paper studied the coverage properties of various methods. They specifically recommend the Wilson, the adjusted Wald, and a Bayesian method based on a Jeffreys prior as having the best coverage properties. For n ≤ 40, they recommend the Wilson and Jeffreys methods. For n > 40, the three methods have comparable performance; although they recommend the adjusted Wald interval in this case, this is primarily for its simplicity in classroom presentation.

In any event, the March 2014 version of Dataplot added the SET BINOMIAL METHOD command.

Whenever an Agresti-Coull interval is invoked in Dataplot, this command specifies which interval (Wilson, adjusted Wald, or Jeffreys) will be computed. The adjusted Wald interval is

$$\tilde{p} \pm \Phi^{-1}(1 - \alpha/2)\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}}$$

where

$$\tilde{X} = X + \frac{\left(\Phi^{-1}(1 - \alpha/2)\right)^{2}}{2}$$
$$\tilde{n} = n + \left(\Phi^{-1}(1 - \alpha/2)\right)^{2}$$
$$\tilde{p} = \frac{\tilde{X}}{\tilde{n}}$$

X is the number of successes, and Φ⁻¹ is the percent point function of the standard normal distribution (so Φ⁻¹(1 − α/2) is the same value as zα/2 above).
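A Python sketch of the adjusted Wald formulas follows. The helper name `adjusted_wald_limits` is hypothetical, it takes the number of successes X rather than the proportion, and it does not truncate the limits to [0, 1]; whether Dataplot truncates them is not stated here.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def adjusted_wald_limits(x: float, n: float, alpha: float) -> Tuple[float, float]:
    """Hypothetical helper: adjusted Wald (Agresti-Coull) interval.

    x: number of successes; n: number of trials; alpha: significance level.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Phi^{-1}(1 - alpha/2)
    x_tilde = x + z * z / 2.0                    # add z^2/2 pseudo-successes
    n_tilde = n + z * z                          # and z^2 pseudo-trials
    p_tilde = x_tilde / n_tilde
    half = z * sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half


print(adjusted_wald_limits(20, 25, 0.05))
print(adjusted_wald_limits(0, 25, 0.05))
```

For 20 successes in 25 trials, the interval is slightly wider than the Wilson limits in the Program example. With X = 0 the lower limit is negative, which shows that, unlike the Wilson limits, the untruncated adjusted Wald limits can fall outside [0, 1].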

Note that the adjusted Wald interval is never shorter than the Wilson interval.
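A short calculation supports this and shows that the adjusted Wald interval in fact contains the Wilson interval. Write z = Φ⁻¹(1 − α/2) (the same value as zα/2) and X = np̂. Multiplying the numerator and denominator of the Wilson formulas by n shows that both intervals have the same center, p̃ = (X + z²/2)/(n + z²), so it suffices to compare the half-widths hW (Wilson) and hAW (adjusted Wald):

$$h_{W} = \frac{z\sqrt{X(n - X)/n + z^{2}/4}}{n + z^{2}}, \qquad h_{AW} = z\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}} = \frac{z\sqrt{(X + z^{2}/2)(n - X + z^{2}/2)/(n + z^{2})}}{n + z^{2}}$$

$$\frac{(X + z^{2}/2)(n - X + z^{2}/2)}{n + z^{2}} - \left(\frac{X(n - X)}{n} + \frac{z^{2}}{4}\right) = \frac{z^{2}(n/2 - X)^{2}}{n(n + z^{2})} \ge 0$$

Hence hAW ≥ hW, with equality only when X = n/2.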

The Jeffreys interval (the derivation for this interval is given in the Brown, Cai, and DasGupta paper) is

LCL = BETPPF(α/2, X + 0.5, n - X + 0.5)
UCL = BETPPF(1 - α/2, X + 0.5, n - X + 0.5)

where BETPPF is the percent point function of the beta distribution and X is the number of successes.
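The Jeffreys limits are the α/2 and 1 − α/2 percent points of a beta distribution with shape parameters X + 0.5 and n − X + 0.5. Python's standard library has no beta percent point function, so this sketch computes one itself: the beta CDF (the regularized incomplete beta function) is evaluated with a standard continued-fraction expansion using the modified Lentz method, and the percent point is found by bisection. All function names are hypothetical. With SciPy available, `scipy.stats.beta.ppf(q, a, b)` would be the usual choice.

```python
import math
from typing import Tuple


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the expansion directly or via the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_ppf(q: float, a: float, b: float, iters: int = 100) -> float:
    """Percent point function of Beta(a, b), found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def jeffreys_limits(x: int, n: int, alpha: float) -> Tuple[float, float]:
    """Hypothetical helper: alpha/2 and 1 - alpha/2 percent points of
    Beta(x + 0.5, n - x + 0.5)."""
    a, b = x + 0.5, n - x + 0.5
    return beta_ppf(alpha / 2.0, a, b), beta_ppf(1.0 - alpha / 2.0, a, b)


print(jeffreys_limits(20, 25, 0.05))
```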

The default method is the Wilson interval.

Note:
If you would like to use this command on raw data (i.e., you have a variable containing a sequence of 0's and 1's), do something like the following:

LET YSUM = SUM Y
LET NTRIAL = SIZE Y
LET P = YSUM/NTRIAL
LET AL AU = AGRESTI COULL CONFIDENCE LIMITS P NTRIAL ALPHA
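The same steps can be sketched in Python for raw 0/1 data. The data values are made up for illustration, `wilson_limits` is a hypothetical helper that repeats the Wilson formulas given earlier, and α = 0.05 is assumed; each step is commented with the Dataplot command it mirrors.

```python
from math import sqrt
from statistics import NormalDist


def wilson_limits(p_hat, n, alpha):
    """Hypothetical helper: two-sided Wilson limits (alpha = significance level)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    center = p_hat + z * z / (2.0 * n)
    half = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    denom = 1.0 + z * z / n
    return (center - half) / denom, (center + half) / denom


# Made-up raw data: 1 = success, 0 = failure.
y = [1] * 20 + [0] * 5

ysum = sum(y)                             # LET YSUM = SUM Y
ntrial = len(y)                           # LET NTRIAL = SIZE Y
p = ysum / ntrial                         # LET P = YSUM/NTRIAL
al, au = wilson_limits(p, ntrial, 0.05)   # LET AL AU = AGRESTI COULL ...
print(p, ntrial, al, au)
```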

If you have a group-id variable (X), you would do something like the following:

SET LET CROSS TABULATE COLLAPSE
LET YSUM = CROSS TABULATE SUM Y X
LET NTRIAL = CROSS TABULATE SIZE Y X
LET P = YSUM/NTRIAL
LET AL AU = AGRESTI COULL CONFIDENCE LIMITS P NTRIAL ALPHA

In this case, P and NTRIAL are now variables rather than parameters.
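A Python sketch of the grouped version follows. The data values and helper name are hypothetical, α = 0.05 is assumed, and the groups are processed in sorted order of their id; that ordering is an assumption of this sketch, not a statement about how CROSS TABULATE orders its output.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist


def wilson_limits(p_hat, n, alpha):
    """Hypothetical helper: two-sided Wilson limits (alpha = significance level)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    center = p_hat + z * z / (2.0 * n)
    half = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    denom = 1.0 + z * z / n
    return (center - half) / denom, (center + half) / denom


# Made-up raw data: y holds 0/1 outcomes, x holds each outcome's group id.
y = [1, 0, 1, 1, 0, 1, 1, 1]
x = [1, 1, 1, 2, 2, 2, 2, 2]

ysum = defaultdict(int)    # per-group SUM Y   (CROSS TABULATE SUM Y X)
ntrial = defaultdict(int)  # per-group SIZE Y  (CROSS TABULATE SIZE Y X)
for yi, xi in zip(y, x):
    ysum[xi] += yi
    ntrial[xi] += 1

groups = sorted(ntrial)                      # one row per distinct group id
p = [ysum[g] / ntrial[g] for g in groups]    # LET P = YSUM/NTRIAL
n = [ntrial[g] for g in groups]
limits = [wilson_limits(pi, ni, 0.05) for pi, ni in zip(p, n)]
for g, pi, ni, (al, au) in zip(groups, p, n, limits):
    print(g, pi, ni, round(al, 4), round(au, 4))
```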

Note:
The following commands are also available:

LET A = TWO SIDED LOWER AGRESTI COULL Y
LET A = TWO SIDED UPPER AGRESTI COULL Y
LET A = ONE SIDED LOWER AGRESTI COULL Y
LET A = ONE SIDED UPPER AGRESTI COULL Y

These commands are Statistics LET Subcommands, whereas AGRESTI COULL CONFIDENCE LIMITS is a Math LET Subcommand. The distinctions are:

1. The "Statistics" version of the command returns a single parameter value, while the "Math" version returns both limits (as two parameters or two variables).

2. The "Statistics" version of the command can be used with a number of other commands, most typically the FLUCTUATION PLOT, CROSS TABULATE, and STATISTIC PLOT commands, while the "Math" version cannot.

3. The "Statistics" version of the command expects a single variable containing a sequence of 1's and 0's. The "Math" version expects summary data (i.e., P and N), where P and N can be constants, parameters, or variables (or a mix of these).

Which form of the command to use depends on what you are trying to do: for example, whether your data are raw 0/1 values or summary counts, and whether the result will be used with commands such as CROSS TABULATE.

For details on the "Statistics" version of the command, see the AGRESTI-COULL entry under Related Commands.

Default:
None
Synonyms:
AGRESTI COULL CONFIDENCE is a synonym for AGRESTI COULL CONFIDENCE LIMITS
Related Commands:
AGRESTI-COULL = Compute the Agresti-Coull confidence limits statistic for binomial proportions.
EXACT BINOMIAL = Compute the "exact" confidence limits statistic for binomial proportions.
BINOMIAL PROPORTION = Compute the binomial proportion statistic.
BINOMIAL PROPORTION TEST = Perform a binomial proportions test.
CROSS TABULATE = Perform a cross-tabulation for a specified statistic.
References:
Agresti, A. and Coull, B. A. (1998), "Approximate is better than 'exact' for interval estimation of binomial proportions," The American Statistician, 52(2), 119-126.

Brown, L. D., Cai, T. T., and DasGupta, A. (2001), "Interval estimation for a binomial proportion," Statistical Science, 16(2), 101-133.

Wilson, E. B. (1927), "Probable inference, the law of succession, and statistical inference," Journal of the American Statistical Association, 22, 209-212.

Applications:
Statistics
Implementation Date:
2007/2
2014/3: Support for SET BINOMIAL METHOD command
Program:

LET N = 25
LET P = 0.8
LET ALPHA = 0.95
LET AL AU = AGRESTI COULL CONFIDENCE LIMITS P N ALPHA

The returned values of AL and AU are 0.6086905 and 0.9113942 (the 95% Wilson limits).

NIST is an agency of the U.S. Commerce Department.

Date created: 10/05/2010
Last updated: 03/06/2014