POISSON PLOT

Name:

POISSON PLOT

Type:

Graphics Command

Purpose:

Generates one of the following plots:

a Poisson plot
a geometric plot
a negative binomial plot
a binomial plot
a logarithmic series plot

Description:

The following table shows how these plots are constructed, where \( x \) denotes the class value and \( n_{x} \) the corresponding frequency. In all cases, the x-coordinate is \( x \).

Distribution	Y-Axis Coordinate \( \phi (n_{x}^{*}) \)	Theoretical Slope	Theoretical Intercept
Poisson	\( \log \left( \frac{x! \, n_{x}^{*}}{N} \right) \)	\( \log(\lambda) \)	\( -\lambda \)
Geometric	\( \log \left( \frac{n_{x}^{*}}{N} \right) \)	\( \log(1-p) \)	\( \log(p) \)
Negative Binomial	\( \log \left( \frac{n_{x}^{*}}{N \binom{n+x-1}{x}} \right) \)	\( \log(1-p) \)	\( n \log(p) \)
Binomial	\( \log \left( \frac{n_{x}^{*}}{N \binom{n}{x}} \right) \)	\( \log(p/(1-p)) \)	\( n \log(1-p) \)
Logarithmic Series	\( \log \left( \frac{x \, n_{x}^{*}}{N} \right) \)	\( \log(\theta) \)	\( -\log(-\log(1 - \theta)) \)

where

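\( \lambda \)	=	the shape parameter for the Poisson distribution.
p	=	the probability of success parameter for the geometric, binomial, and negative binomial distributions.
\( \theta \)	=	the shape parameter for the logarithmic series distribution.
n	=	the number of trials parameter for the binomial distribution and the number of successes parameter for the negative binomial distribution (called k in the Note section below).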

The theoretical slope parameter can be used to estimate the shape parameter of the distribution.
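As an illustration of how the fitted line yields a parameter estimate, the following Python sketch (not Dataplot code) computes the unadjusted Poisson plot coordinates for the frequency table used in the Program section below and fits a straight line to them. The helper name least_squares and the choice of an unweighted least-squares fit are assumptions made for illustration; the fit that Dataplot performs may differ.

```python
import math

# Frequency table from the Program section of this page (Friendly, p. 51).
x = [0, 1, 2, 3, 4]
n_x = [109, 65, 22, 3, 1]
N = sum(n_x)

# Y-axis coordinate for the Poisson plot, using unadjusted frequencies.
phi = [math.log(math.factorial(xi) * ni / N) for xi, ni in zip(x, n_x)]


def least_squares(xs, ys):
    """Unweighted least-squares line; returns (intercept, slope).

    Hypothetical helper: an unweighted fit is an assumption of this sketch.
    """
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    sxx = sum((a - xbar) ** 2 for a in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


intercept, slope = least_squares(x, phi)

# The theoretical slope is log(lambda) and the theoretical intercept is
# -lambda, so each fitted coefficient gives its own estimate of lambda.
lambda_from_slope = math.exp(slope)
lambda_from_intercept = -intercept
print(lambda_from_slope, lambda_from_intercept)
```

If the Poisson model is appropriate, the two estimates should be close to each other and to the sample mean.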

Hoaglin and Tukey (see References below) derive why these plots should be linear if the specified distribution is appropriate. They also make the following suggestions for enhancing these plots:

A 95% confidence interval for each point on the plot is given as

\( \phi (n_{x}^{*}) \pm h(x) \)

where

\( n_{x}^{*} \)	=	\( n_{x} - 0.8 \, n_{x}/N - 0.67 \)	for \( n_{x} \ge 2 \)
	=	\( 1/e \)	for \( n_{x} = 1 \)
	=	undefined	for \( n_{x} = 0 \)
h(x)	=	\( \frac{1.96 \sqrt{1 - \hat{p}_{x}}} {\sqrt{n_{x} - (0.25 \hat{p}_{x} + 0.47) \sqrt{n_{x}}}} \)
N	=	total sample size
\( \hat{p}_{x} \)	=	\( \frac{n_{x}} {N} \)

The rationale for this confidence interval is given in the Hoaglin and Tukey reference.

The \( n_{x}^{*} \) values are referred to as the adjusted frequencies.
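The adjusted frequencies and interval half-widths defined above can be sketched in Python (not Dataplot code) as follows. The function names adjusted_frequency and half_width are hypothetical, and the data are the frequency table from the Program section below; the y-coordinate uses the Poisson row of the table.

```python
import math


def adjusted_frequency(n_x, N):
    """Adjusted frequency n_x*; returns None where it is undefined (n_x = 0)."""
    if n_x >= 2:
        return n_x - 0.8 * n_x / N - 0.67
    if n_x == 1:
        return 1.0 / math.e
    return None


def half_width(n_x, N):
    """Half-width h(x) of the 95% interval around phi(n_x*), for n_x >= 1."""
    p_hat = n_x / N
    return (1.96 * math.sqrt(1.0 - p_hat)
            / math.sqrt(n_x - (0.25 * p_hat + 0.47) * math.sqrt(n_x)))


x = [0, 1, 2, 3, 4]
n_x = [109, 65, 22, 3, 1]
N = sum(n_x)

for xi, ni in zip(x, n_x):
    n_star = adjusted_frequency(ni, N)
    # Poisson y-coordinate evaluated at the adjusted frequency.
    phi_star = math.log(math.factorial(xi) * n_star / N)
    h = half_width(ni, N)
    print(xi, round(phi_star - h, 3), round(phi_star, 3), round(phi_star + h, 3))
```

Classes with a zero frequency have no adjusted value and would need to be skipped before taking the logarithm; the table used here has none.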

These plots can be "leveled". By leveling, we convert the plot from interpretation of departures from a diagonal line to departures from a horizontal line. This may be an easier visual task.
To level the plot, we plot

\( \phi (n_{x}^{*}) - (\mathrm{intercept} + \mathrm{slope} \cdot x) \) versus \( x \)

where intercept and slope are taken from the columns "theoretical intercept" and "theoretical slope" in the table above.
Note that a preliminary estimate of the shape parameter for the distribution is required to compute the theoretical intercept and the theoretical slope. This is discussed further in a Note section below.
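A minimal Python sketch (not Dataplot code) of leveling for the Poisson case is given here. It assumes that leveling means subtracting the theoretical line, intercept + slope * x, from each y-coordinate, and it uses the sample mean as the preliminary estimate of \( \lambda \). Unadjusted frequencies are used for brevity; the data are the frequency table from the Program section below.

```python
import math

x = [0, 1, 2, 3, 4]
n_x = [109, 65, 22, 3, 1]
N = sum(n_x)

# Preliminary estimate of lambda: the sample mean of the frequency table.
lam = sum(xi * ni for xi, ni in zip(x, n_x)) / N

# Theoretical slope and intercept from the Poisson row of the table.
slope = math.log(lam)
intercept = -lam

# Unleveled y-coordinates (unadjusted frequencies).
phi = [math.log(math.factorial(xi) * ni / N) for xi, ni in zip(x, n_x)]

# Leveled y-coordinates (assumed form): departure from the theoretical line.
leveled = [p - (intercept + slope * xi) for xi, p in zip(x, phi)]
print(leveled)
```

Under this assumption, points that follow the Poisson model scatter around zero rather than around a sloped line.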

Syntax 1:

This syntax is used for the case where you have raw data. Dataplot will automatically create the frequency table.

Syntax 2:

This syntax is used for the case where your data is in the form of a frequency table.

Examples:

Note:

For the Poisson distribution, the maximum likelihood estimate of \( \lambda \) is the sample mean. This is used as the preliminary estimate of \( \lambda \) in the leveled version of the plot.
For the binomial distribution, you need to specify the n parameter (the number of trials) by entering the following command before the BINOMIAL PLOT command:
The estimate of the p (probability of success) parameter is then computed from the sample mean as \( \hat{p} = \bar{x}/n \). This is the maximum likelihood estimate.
For the geometric distribution, the maximum likelihood estimate of the p (probability of success) parameter is
\( \hat{p} = \frac{1}{1 + \bar{x}} \)
where \( \bar{x} \) is the sample mean.
For the negative binomial distribution, there are two parameters: p and k. For this plot, k is restricted to integer values.
You can either specify a value for k by entering the command
LET K = <value>
or you can let Dataplot estimate the value.
If k is not specified, the moment estimate of k is used:
This estimate will be rounded to the nearest integer.
The maximum likelihood estimate of p is then
If k ≥ 2, then the bias corrected estimate is used
For the logarithmic series distribution, you can specify the desired value of theta by entering the command
You can obtain this estimate by using maximum likelihood, the PPCC plot, or the KS plot.
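The preliminary estimates that have simple closed forms can be sketched in Python (not Dataplot code) from a frequency table. The closed forms for the geometric and binomial cases are standard textbook results and are assumptions of this sketch: the geometric form assumes support x = 0, 1, 2, ..., and n_trials is a hypothetical value chosen only for illustration.

```python
# Frequency table from the Program section of this page.
x = [0, 1, 2, 3, 4]
freq = [109, 65, 22, 3, 1]


def weighted_mean(values, weights):
    """Sample mean of data given as a frequency table."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


xbar = weighted_mean(x, freq)

# Poisson: the maximum likelihood estimate of lambda is the sample mean.
lambda_hat = xbar

# Geometric with support x = 0, 1, 2, ... (assumed, to match the table's
# intercept log(p) and slope log(1-p)): p_hat = 1 / (1 + xbar).
p_geometric = 1.0 / (1.0 + xbar)

# Binomial with a known number of trials (n_trials is a hypothetical value):
# p_hat = xbar / n_trials.
n_trials = 4
p_binomial = xbar / n_trials

print(lambda_hat, p_geometric, p_binomial)
```

The same table is reused for all three estimates only to show the arithmetic; it is not meant to suggest that each distribution fits these data.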

Note:

The plot consists of the following traces:

trace 1	=	\( \phi(n_x) \) versus x
trace 2	=	fitted line for \( \phi(n_x) \) versus x
trace 3	=	\( \phi (n_{x}^{*}) \) versus x
trace 4	=	fitted line for \( \phi (n_{x}^{*}) \) versus x
trace 5	=	lower confidence point
trace 6	=	upper confidence point
trace 7 and above	=	line connecting the lower and upper confidence points

If you want to suppress any of these components, you can set both the CHARACTER and LINE settings to BLANK. The example programs below demonstrate the use of the LINE and CHARACTER commands to control the appearance of the plot.

Note:

To generate the leveled version of the plot, enter the command

SET POISSON PLOT LEVEL ON

To reset the default, enter the command

SET POISSON PLOT LEVEL OFF

This command applies to all five of the plots described here, not just the Poisson plot.

Note:

After the plot is generated, the following parameters are available:

All plots:

PPA0	=	the intercept of the fitted line (unadjusted frequencies)
PPA1	=	the slope of the fitted line (unadjusted frequencies)
PPA0ADJU	=	the intercept of the fitted line (adjusted frequencies)
PPA1ADJU	=	the slope of the fitted line (adjusted frequencies)

Poisson plot:

LAMBDAPP	=	the estimate of \( \lambda \) based on the unadjusted frequencies
LAMBDAPA	=	the estimate of \( \lambda \) based on the adjusted frequencies

Binomial, negative binomial, geometric plot:

PPP	=	the estimate of p based on the unadjusted frequencies
PPPADJ	=	the estimate of p based on the adjusted frequencies

Logarithmic series plot:

THETAPP	=	the estimate of \( \theta \) based on the unadjusted frequencies
THETAPPA	=	the estimate of \( \theta \) based on the adjusted frequencies

Default:

The unleveled version of the plot is generated by default.

Synonyms:

None

Related Commands:

PROBABILITY PLOT	= Generates a probability plot.
PPCC PLOT	= Generates a PPCC plot.
KS PLOT	= Generates a Kolmogorov-Smirnov (or chi-square) plot.
ORD PLOT	= Generates an Ord plot.
HISTOGRAM	= Generates a histogram.
LINES	= Sets the type for plot lines.
CHARACTER	= Sets the type for plot characters.

References:

The American Statistician

Hoaglin and Tukey (1985), "Checking The Shape of Discrete Distributions". In Hoaglin, Mosteller, and Tukey, editors, "Exploring Data Tables, Trends, and Shapes", chapter 9, John Wiley and Sons, New York.

Friendly (2000), "Visualizing Categorical Data", SAS Publishing, Cary, NC, pp. 49-56.

Applications:

Distributional Modeling

Implementation Date:

2007/5

Program:

 
.  Following data from p. 51 of Friendly
read x y
0  109
1   65
2   22
3    3
4    1
end of data
.
title case asis
title offset 2
label case asis
x1label displacement 6
title Poisson Plot
x1label X
y1label LOG(x!*n(x)/N)
x3label
.
char blank all
char fill off all
char hw 1.5 1.2 all
char color black all
char circle blank circle blank circle circle
char fill on off on
char color blue black green
line dotted all
line blank solid blank solid blank blank
line color black blue black green
tic offset units screen
tic offset 3 3
.
poisson plot y x
.
let lambml = weighted mean x y
justification center
move 50 8
text Unadjusted: Intercept = ^ppa0, Slope = ^ppa1
move 50 5
text Adjusted: Intercept = ^ppa0adju, Slope = ^ppa1adju
move 50 2
text Lambda: ML = ^lambml, PP = ^lambdapp, PPadj = ^lambdapa