Name:
MAXIMUM LIKELIHOOD
Type:
Purpose:
Compute the maximum likelihood estimates for the parameters
of a statistical distribution.
Description:
There are a number of approaches to estimating the parameters
of a statistical distribution from a set of data.
Maximum likelihood estimates are popular because they have
good statistical properties. The primary drawback is that
likelihood equations have to be derived for each specific
distribution (other approaches, such as least squares or
PPCC plots, allow a more general approach). In some cases,
the maximum likelihood estimates are trivial while in other
cases they are quite complex and may require specialized
methods to solve.
Dataplot currently supports maximum likelihood estimates for
the following continuous distributions:
- normal
- 2-parameter lognormal
- exponential
- 2-parameter Weibull
- 2-parameter inverted Weibull
- 2-parameter gamma
- Gumbel (extreme value type 1)
- Frechet (extreme value type 2, maximum case only)
- beta
- Pareto
- Rayleigh
- logistic
- Cauchy
- double exponential
- inverse gaussian
- power
- uniform
- Johnson SB (method of moments, percentile)
- Johnson SU (method of moments, percentile)
- fatigue life
- geometric extreme exponential
- folded normal
- Generalized Pareto
- Asymmetric Double Exponential
- Maxwell
- mixture of normal distributions (number of
components assumed known)
Dataplot currently supports maximum likelihood estimates for
the following discrete distributions:
- binomial
- Poisson
- Logarithmic series
- geometric
- beta binomial
- negative binomial
- hypergeometric
- Hermite
- Yule
Additional distributions may be added in the future.
We do not give the likelihood equations for the various
distributions here. Most of them can be found in the sources
listed in the Reference section.
For a given distribution, the maximum likelihood command
will generate one or more of the following outputs:
- Some summary statistics for the data and point estimates
for the parameters of the distribution.
This is the minimum output and is supported for
all distributions.
- Confidence intervals for the parameters of the
distribution. This is supported for the following
16 distributions:
normal, 2-parameter lognormal, exponential,
2-parameter Weibull, 2-parameter gamma,
Gumbel (extreme value type 1), beta, Pareto,
Rayleigh, logistic, Cauchy, 2-parameter Frechet,
2-parameter Weibull,
binomial, geometric, Poisson
Confidence intervals are obtained in the following ways:
- In some cases, the sampling distribution, or an
approximation to the sampling distribution, may
be known for the given parameter. In these cases,
an explicit formula for the confidence interval
or a numerically tabulated value can be used.
- If the standard error for the parameter can be
determined, the large sample asymptotic normal
approximation can be obtained as
point estimate +/- NORPPF(alpha/2)*STDERR
with STDERR, NORPPF, and alpha denoting the
standard error of the parameter estimate, the
percent point function of the standard normal
distribution, and the desired significance level,
respectively.
- For a few distributions, likelihood ratio methods
are used to determine a confidence interval. These
can be more accurate than the normal approximations,
particularly for small samples where the asymptotic
normality may not be as accurate.
- Confidence intervals for selected percentiles of the
distribution. These are controlled by the command
SET MAXIMUM LIKELIHOOD PERCENTILE
<NONE/DEFAULT/VARNAME>
where NONE means no percentile confidence limits
will be generated, DEFAULT generates percentile
confidence limits for a default set of percentiles,
and VARNAME specifies the name of a variable that
contains the percentile values where the confidence
limits will be generated.
This option is supported for the following seven
distributions:
normal, 2-parameter lognormal,
exponential, 2-parameter Weibull,
gamma, beta, Gumbel (maximum case only)
Note that point estimates for selected percentiles
can be generated by simply using the point estimates
for the distribution parameters in the percent point
function of the distribution.
- Data is sometimes censored. With censored data, we
are typically interested in modeling failure or
lifetime data. In censored data, we typically have
r failure times and n-r censoring (or
survival) times (a censoring time means the unit had not
failed at the time the test was terminated).
There are several types of censoring:
- A test is terminated at a given time. This is
referred to as time censored data. Singly
censored data means all censoring times are equal
(i.e., all units were started at the same time).
Multiply censored data means that censoring times
are not necessarily the same (i.e., units may have
different start times; this is common with data
collected from the field rather than in a lab).
Time censored data is also called type 1
censored data.
This is the most common type of censoring.
- Alternatively, a test can be run until a
pre-selected number of failures have occurred.
Again, you can have singly or multiply censored
data.
Number of failures censored data is also called
type 2 censored data.
- In some cases, the number of units, n, may
not be known in advance (and may in fact be the
quantity that we are trying to estimate). In this
case, we observe the number of failures in a given
time (i.e., we know r but not n). We
typically need to estimate the parameters of the
distribution and the value of n.
At this time, Dataplot does not support maximum
likelihood estimation for truncated data. However,
we anticipate adding this support for a few select
distributions in a future release.
Censored data is supported for the following six
distributions:
- normal - multiply time censored data, estimates for
selected percentiles supported
- 2-parameter lognormal - singly time censored data,
estimates for selected percentiles supported
- exponential - both multiply time censored data and
singly number of failures censored data, estimates
for selected percentiles supported
- 2-parameter Weibull - multiply time censored data,
estimates for selected percentiles supported
- 2-parameter gamma - multiply time censored data,
estimates for selected percentiles supported
- 2-parameter inverted Weibull - multiply time
censored data, point estimates only
For the exponential distribution, you can enter the
following command to specify what type of censoring was
used:
The other distributions assume type 1 censoring.
- Data is sometimes only available in binned format.
Maximum likelihood estimates for grouped data are
supported for the following distributions:
exponential
normal mixture
A number of these commands generate method of moments estimates
or other quantitative parameter estimates in addition to the
maximum likelihood estimates. The Johnson SB and Johnson SU
cases only generate method of moments or percentile estimates.
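The large-sample normal approximation described above can be
sketched outside of Dataplot as follows. This is an illustrative
Python sketch, not Dataplot code: the function name
normal_approx_interval is hypothetical, and NormalDist().inv_cdf
stands in for NORPPF. The upper-tail quantile is used so that the
multiplier is positive.

```python
from statistics import NormalDist


def normal_approx_interval(estimate, stderr, confidence):
    """Two-sided large-sample interval: estimate +/- z * stderr."""
    alpha = 1.0 - confidence
    # inv_cdf plays the role of NORPPF; 1 - alpha/2 gives a positive z.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * stderr, estimate + z * stderr


# Weibull scale estimate and standard error taken from the sample
# output in the Program section of this page.
lower, upper = normal_approx_interval(194.2046, 3.137330, 0.95)
print(round(lower, 3), round(upper, 3))
```

With these inputs the sketch gives limits that agree with the 95%
normal approximation limits (188.056 and 200.354) shown for the
Weibull scale parameter in the sample output.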
Syntax 1:
Syntax 2:
<DIST> MOMENTS <y>
<SUBSET/EXCEPT/FOR qualification>
where <DIST> is JOHNSON SB, JOHNSON SU, or UNIFORM;
<y> is the response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 3:
<DIST> PERCENTILE <y>
<SUBSET/EXCEPT/FOR qualification>
where <DIST> is JOHNSON SB, JOHNSON SU, or JOHNSON;
<y> is the response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 4:
<DIST> MAXIMUM LIKELIHOOD <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <DIST> is one of:
NORMAL, LOGNORMAL, EXPONENTIAL, WEIBULL,
GAMMA, or INVERTED WEIBULL;
<y> is the response variable;
<x> is the censoring variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
The censoring variable should contain 1's and 0's where 1
indicates a failure time and 0 indicates a censoring time.
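To illustrate how a 0/1 censoring variable enters the estimation,
the sketch below uses the textbook result for the exponential
distribution with right-censored data: the estimate of the scale
(mean) is the total time on test divided by the number of
failures. This is illustrative Python, not Dataplot code; the
function name and the data values are hypothetical, and it is not
claimed to be Dataplot's exact algorithm.

```python
def exponential_censored_mle(times, failed):
    """Textbook MLE of the exponential scale with right censoring.

    times  : failure or censoring time for each unit
    failed : 1 for a failure time, 0 for a censoring time
    """
    if len(times) != len(failed):
        raise ValueError("times and failed must have the same length")
    r = sum(failed)  # number of observed failures
    if r == 0:
        raise ValueError("at least one failure is required")
    # Censored units still contribute their time on test.
    return sum(times) / r


# Hypothetical data: three failures, two units censored at time 50.
times = [12.0, 30.0, 45.0, 50.0, 50.0]
failed = [1, 1, 1, 0, 0]
scale_hat = exponential_censored_mle(times, failed)
print(scale_hat)
```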
Syntax 5:
<DIST> GROUPED MAXIMUM LIKELIHOOD <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <DIST> is EXPONENTIAL;
<y> is the frequency variable;
<x> is the bin mid-points variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
This syntax is used for grouped data. In this case, the
keyword GROUPED is required to distinguish this from the
censored data case.
Syntax 6:
<DIST> MAXIMUM LIKELIHOOD <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <DIST> is NORMAL MIXTURE;
<y> is the frequency variable;
<x> is the bin mid-points variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
In this case, the keyword GROUPED is omitted since censored
data is not supported for this distribution.
Examples:
NORMAL MAXIMUM LIKELIHOOD Y
EXPONENTIAL MAXIMUM LIKELIHOOD Y
WEIBULL MAXIMUM LIKELIHOOD Y X
WEIBULL MAXIMUM LIKELIHOOD Y SUBSET X > 5
JOHNSON SB MOMENTS Y
Note:
The estimated parameters will typically be saved as internal
parameters that can be used in subsequent analysis. The feedback
message will specify what parameters have been saved (or you
can enter STATUS PARAMETERS to see what parameters were saved).
Note:
By default, the Gumbel case performs the maximum likelihood
estimation for the maximum order statistic case. To obtain
the maximum likelihood estimates for the minimum order case,
enter the following command:
Note:
The Johnson SB and Johnson SU moment estimators are computed
using Applied Statistics algorithm 99.
These distributions can also be estimated using the percentile
method described by Slifker and Shapiro (see the Reference
section below), which matches percentiles of the data with
theoretical percentiles. The method first determines whether
the Johnson SB or Johnson SU distribution is more appropriate.
It requires a tuning parameter that can be set with the
following command:
The default value is 0.54. As the sample size gets larger,
the value of Z can be set closer to 1. Basically, increasing
the value of Z will use more extreme percentiles in performing
the estimation.
Slifker and Shapiro do not give specific recommendations.
However, using the default value for small to moderate size
data sets (say a few hundred points or less) and a value of
0.8 for data sets larger than this should generate reasonable
results. Alternatively, you can generate the estimates using
several different values of Z between 0.5 and 1. You can
perform a Kolmogorov-Smirnov goodness of fit test with the
different estimates to see what value of Z results in the
best fit.
Note:
For the negative binomial distribution, a maximum likelihood
estimate for P is returned assuming K is known. To
specify the value of K, enter the command
For the hypergeometric distribution, there are four quantities
of interest:
- N = total number of items in population
- n = number of items sampled
- K = number of defective items (or successes)
in population
- x = number of defectives in sample
There are two distinct cases to consider.
- Given that N (the population size) is known, we
want to estimate the number of defectives in the population
given a sample of size n with x defectives.
An example is acceptance sampling where the lot size is
known and a subsample is chosen for inspection. In this
case, the maximum likelihood estimate of K is:
K = MAX INTEGER ≤ x*(N+1)/n
- In capture/recapture problems, a sample is taken and
marked. That is, K is known. Then a second sample
(of size n) is taken and the number of marked items
(x) is counted. In this case, the maximum
likelihood estimates are:
We implement the refinement of Chapman (see page 263 of
Johnson, Kotz, and Kemp):
N* = (n+1)*(K+1)/(x+1) - 1
Formulas for the variance are also given in Johnson, Kotz,
and Kemp.
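The two hypergeometric formulas given above can be evaluated
directly. The following is an illustrative Python sketch (not
Dataplot code); the function names and the example values are
hypothetical.

```python
def estimate_defectives(N, n, x):
    """Greatest integer <= x*(N+1)/n (population size N known)."""
    # Integer floor division avoids floating point rounding.
    return (x * (N + 1)) // n


def chapman_population_size(n, K, x):
    """Chapman's refinement: N* = (n+1)*(K+1)/(x+1) - 1 (K known)."""
    return (n + 1) * (K + 1) / (x + 1) - 1


# Acceptance sampling: lot of 1000, sample of 50, 3 defectives found.
print(estimate_defectives(N=1000, n=50, x=3))

# Capture/recapture: 100 marked, second sample of 50, 10 marked found.
print(chapman_population_size(n=50, K=100, x=10))
```

The first call evaluates floor(3*1001/50) and the second
evaluates 51*101/11 - 1.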
Note:
For the details of maximum likelihood estimates for the Yule
and Hermite distributions, enter the commands
Default:
Synonyms:
MLE is a synonym for MAXIMUM LIKELIHOOD
Related Commands:
FIT                                      = Perform a least squares fit.
PPCC PLOT                                = Generate a PPCC plot.
KS PLOT                                  = Generate a Kolmogorov-Smirnov plot.
PROBABILITY PLOT                         = Generate a probability plot.
KOLMOGOROV SMIRNOV GOODNESS OF FIT TEST  = Perform a Kolmogorov-Smirnov goodness of fit test.
WILK SHAPIRO TEST                        = Perform a Wilk-Shapiro test for normality.
Reference:
"Continuous Univariate Distributions: Volume I", 2nd. ed.,
Johnson, Kotz, and Balakrishnan, John Wiley and Sons, 1994.
"Continuous Univariate Distributions: Volume II", 2nd. ed.,
Johnson, Kotz, and Balakrishnan, John Wiley and Sons, 1994.
"Univariate Discrete Distributions", 2nd. ed.,
Johnson, Kotz, and Kemp, John Wiley and Sons, 1994.
"Statistical Distributions in Engineering", Karl Bury,
Cambridge University Press, 1999.
"Statistical Distributions", Third Edition, Evans, Hastings, and
Peacock, 2000.
"Algorithm AS 99", Applied Statistics, 1976, Vol. 25, P. 180.
"Confidence Intervals for the Parameters of the Logistic
Distribution", Charles Antle, Lawrence Klimko, and William
Harkness, Biometrika, (1970), pp. 397-402.
"The Johnson System: Selection and Parameter Estimation",
James F. Slifker and Samuel S. Shapiro, Technometrics,
Vol. 22, No. 2, May 1980, pp. 239-246.
"Inferences for the Cauchy Distribution Based on Maximum
Likelihood Estimators", Biometrika, 1970, pp. 403-407.
Applications:
Reliability, Data Analysis, Distributional Modeling
Implementation Date:
1998/5
2003/10: Gumbel case supports both minimum and maximum cases
2003/11: Added support for logistic, uniform, and beta
distributions
2004/5: Added confidence limits (Agresti and Coull approach)
for binomial case
2004/5: Added confidence limits for lognormal case
2004/5: Added support for the following continuous
distributions
FATIGUE LIFE
GEOMETRIC EXTREME EXPONENTIAL
FOLDED NORMAL
CAUCHY
2004/5: Added support for the following discrete distributions
LOGARITHMIC SERIES
GEOMETRIC
BETA BINOMIAL
NEGATIVE BINOMIAL
HYPERGEOMETRIC
HERMITE
YULE
2004/5: Added the JOHNSON PERCENTILE case
2004/6: Added the ASYMMETRIC DOUBLE EXPONENTIAL, RAYLEIGH, and
MAXWELL cases
2004/12: Rewrote the maximum likelihood output for the normal,
lognormal, exponential, Weibull, gamma, Gumbel, beta,
and Pareto distributions. Added support for confidence
intervals for selected percentiles for 7 distributions
and support for censored data for 5 distributions.
Program:
skip 25
read vangel31.dat y
exponential mle y
weibull mle y
lognormal mle y
gamma mle y
The following output is generated.
*************************
** exponential mle y **
*************************
EXPONENTIAL MAXIMUM LIKELIHOOD ESTIMATION: FULL SAMPLE CASE
ONE-PARAMETER MODEL (LOCATION = 0)
NUMBER OF OBSERVATIONS = 38
MINIMUM VALUE = 147.0000
ML ESTIMATE OF SCALE PARAMETER = 185.7895
STANDARD ERROR OF SCALE PARAMETER = 30.13903
CONFIDENCE INTERVAL FOR SCALE PARAMETER
CONFIDENCE LOWER UPPER
VALUE (%) LIMIT LIMIT
-------------------------------------------
50.000 168.269 209.615
75.000 156.300 227.404
90.000 145.042 248.068
95.000 138.432 262.541
99.000 126.642 294.188
99.900 114.602 337.472
THE MINIMUM VALUE WILL BE SAVED AS THE INTERNAL PARAMETER U1
THE SCALE PARAMETER WILL BE SAVED AS THE INTERNAL PARAMETER B1
TWO-PARAMETER MODEL (LOCATION UNKNOWN)
NUMBER OF OBSERVATIONS = 38
ESTIMATE OF LOCATION PARAMETER = 147.0000
STANDARD ERROR OF LOCATION PARAMETER = 1.034478
BIAS CORRECTED ESTIMATE OF LOCATION PARAMETER = 145.9516
STANDARD ERROR OF BIAS CORRECTED LOCATION PARAMETER = 1.062437
ESTIMATE OF SCALE PARAMETER = 38.78947
STANDARD ERROR OF SCALE PARAMETER = 6.376950
BIAS CORRECTED ESTIMATE OF SCALE PARAMETER = 39.83784
STANDARD ERROR OF BIAS CORRECTED SCALE PARAMETER = 6.549300
CONFIDENCE INTERVAL FOR LOCATION PARAMETER
CONFIDENCE LOWER UPPER
VALUE (%) LIMIT LIMIT
-------------------------------------------
50.000 145.519 146.697
75.000 144.758 146.860
90.000 143.729 146.946
95.000 142.933 146.973
99.000 141.028 146.995
99.900 138.154 146.999
CONFIDENCE INTERVAL FOR SCALE PARAMETER
CONFIDENCE LOWER UPPER
VALUE (%) LIMIT LIMIT
-------------------------------------------
50.000 36.0380 45.0267
75.000 33.4428 48.9045
90.000 31.0050 53.4162
95.000 29.5751 56.5804
99.000 27.0274 63.5113
99.900 24.4298 73.0145
THE LOCATION PARAMETER WILL BE SAVED AS THE INTERNAL PARAMETER U2
THE SCALE PARAMETER WILL BE SAVED AS THE INTERNAL PARAMETER B2
*********************
** weibull mle y **
*********************
WEIBULL MAXIMUM LIKELIHOOD ESTIMATION: FULL SAMPLE CASE
TWO-PARAMETER MODEL (LOCATION = 0)
NUMBER OF OBSERVATIONS = 38
MINIMUM VALUE = 147.0000
SAMPLE MEAN VALUE = 185.7895
SAMPLE STANDARD DEVIATION VALUE = 18.59549
ESTIMATE OF SCALE PARAMETER = 194.2046
STANDARD ERROR OF SCALE PARAMETER = 3.137330
ESTIMATE OF SHAPE PARAMETER = 10.57322
STANDARD ERROR OF SHAPE PARAMETER = 1.337343
BIAS CORRECTED ESTIMATE OF SHAPE PARAMETER = 10.20502
STANDARD ERROR OF BIAS CORRECTED SHAPE PARAMETER = 1.290772
STANDARD ERROR OF SHAPE/SCALE COVARIANCE = 1.146094
STD ERR OF BIAS CORRECTED SHAPE/SCALE COVARIANCE = 1.125962
CONFIDENCE INTERVAL FOR SCALE PARAMETER
NORMAL APPROXIMATION LIKELIHOOD RATIO
CONFIDENCE LOWER UPPER LOWER UPPER
VALUE (%) LIMIT LIMIT LIMIT LIMIT
-----------------------------------------------------------------------
50.000 192.089 196.321 192.063 196.335
75.000 190.596 197.814 190.529 197.852
90.000 189.044 199.365 188.901 199.457
95.000 188.056 200.354 187.840 200.504
99.000 186.123 202.286 185.704 202.633
99.900 183.881 204.528 183.090 205.301
CONFIDENCE INTERVAL FOR SHAPE PARAMETER
(BASED ON NO BIAS CORRECTION ESTIMATES)
NORMAL APPROXIMATION LIKELIHOOD RATIO
CONFIDENCE LOWER UPPER LOWER UPPER
VALUE (%) LIMIT LIMIT LIMIT LIMIT
-----------------------------------------------------------------------
50.000 9.67119 11.4752 9.73924 11.4358
75.000 9.03480 12.1116 9.16853 12.0612
90.000 8.37348 12.7729 8.59130 12.7256
95.000 7.95207 13.1944 8.23208 13.1567
99.000 7.12845 14.0180 7.54982 14.0162
99.900 6.17267 14.9738 6.79193 15.0417
THE FOLLOWING INTERNAL PARAMETERS ARE SAVED:
ALPHAML, ALPHASE, GAMMAML, GAMMASE, CAMMABC, GAMMABCSE,COVSE,COVBCSE
***********************
** lognormal mle y **
***********************
LOGNORMAL MAXIMUM LIKELIHOOD ESTIMATION:
FULL SAMPLE CASE
TWO-PARAMETER MODEL (LOCATION = 0)
NUMBER OF OBSERVATIONS = 38
SAMPLE MINIMUM = 147.0000
SAMPLE MEAN = 185.7895
SAMPLE MEDIAN = 185.5000
SAMPLE STANDARD DEVIATION = 18.59549
ML ESTIMATE OF SHAPE PARAMETER (SIGMA) = 0.1002546
STANDARD ERROR OF SHAPE PARAMETER = 0.1165436E-01
ML ESTIMATE OF SCALE PARAMETER = 184.8847
ML ESTIMATE OF MU (= LOG(SCALE)) = 5.219732
STANDARD ERROR OF SCALE/MU = 0.1626344E-01
CONFIDENCE INTERVAL FOR SCALE PARAMETER
SCALE PARAMETER MU PARAMETER
CONFIDENCE LOWER UPPER LOWER UPPER
VALUE (%) LIMIT LIMIT LIMIT LIMIT
-----------------------------------------------------------------------
50.000 184.874 184.896 5.20865 5.23081
75.000 184.866 184.904 5.20073 5.23874
90.000 184.857 184.912 5.19229 5.24717
95.000 184.852 184.918 5.18678 5.25269
99.000 184.841 184.929 5.17557 5.26389
99.900 184.827 184.943 5.16161 5.27785
CONFIDENCE INTERVAL FOR SHAPE PARAMETER
CONFIDENCE LOWER UPPER
VALUE (%) LIMIT LIMIT
-------------------------------------------
50.000 0.936715E-01 0.109717
75.000 0.889265E-01 0.116492
90.000 0.844115E-01 0.124286
95.000 0.817339E-01 0.129704
99.000 0.769019E-01 0.141454
99.900 0.718825E-01 0.157350
THE FOLLOWING INTERNAL PARAMETERS ARE SAVED:
SIGMAML, SIGMASE, SCALEML, UHATML, UHATSE
*******************
** gamma mle y **
*******************
GAMMA MAXIMUM LIKELIHOOD ESTIMATION: FULL SAMPLE CASE
TWO-PARAMETER MODEL (LOCATION = 0)
NUMBER OF OBSERVATIONS = 38
MINIMUM VALUE = 147.0000
SAMPLE MEAN VALUE = 185.7895
SAMPLE STANDARD DEVIATION VALUE = 18.59549
SAMPLE GEOMETRIC MEAN VALUE = 184.8847
MOMENT ESTIMATE OF SCALE PARAMETER = 1.861205
MOMENT ESTIMATE OF SHAPE PARAMETER = 99.82214
ML ESTIMATE OF SCALE PARAMETER = 1.811020
STANDARD ERROR OF SCALE PARAMETER = 0.4158159
ML ESTIMATE OF SHAPE PARAMETER = 102.5883
STANDARD ERROR OF SHAPE PARAMETER = 23.49724
COVARIANCE OF THE SHAPE AND SCALE PARAMETERS = -9.746725
CONFIDENCE INTERVAL FOR SCALE PARAMETER
NORMAL APPROXIMATION LIKELIHOOD RATIO
CONFIDENCE LOWER UPPER LOWER UPPER
VALUE (%) LIMIT LIMIT LIMIT LIMIT
-----------------------------------------------------------------------
50.000 1.53056 2.09148 1.55723 2.12304
75.000 1.33269 2.28935 1.40621 2.38733
90.000 1.12706 2.49498 1.26945 2.70996
95.000 0.996035 2.62600 1.19156 2.94588
99.000 0.739949 2.88209 1.05707 3.49022
99.900 0.442772 3.17927 0.925650 4.29764
CONFIDENCE INTERVAL FOR SHAPE PARAMETER
NORMAL APPROXIMATION LIKELIHOOD RATIO
CONFIDENCE LOWER UPPER LOWER UPPER
VALUE (%) LIMIT LIMIT LIMIT LIMIT
-----------------------------------------------------------------------
50.000 86.7397 118.437 87.5479 119.267
75.000 75.5583 129.618 77.8833 132.049
90.000 63.9388 141.238 68.6408 146.247
95.000 56.5346 148.642 63.1638 155.791
99.000 42.0635 163.113 53.3517 175.581
99.900 25.2704 179.906 43.3751 200.473
THE FOLLOWING INTERNAL PARAMETERS ARE SAVED:
GAMMAML, GAMMASE, SCALEML, SCALESE, GAMMAMOM, SCALEMOM,COVSE
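The exponential estimates in the output above are consistent with
the standard closed-form expressions, which can be checked from
the reported summary statistics alone. The sketch below is
illustrative Python, not Dataplot code. The formulas are the usual
textbook ones and are an assumption about how the reported values
arise, since this page does not state the formulas Dataplot uses;
the function name is hypothetical.

```python
from math import sqrt


def exponential_closed_form(n, sample_mean, sample_min):
    """Textbook closed-form exponential estimates from summary statistics."""
    # One-parameter model (location = 0): the scale is the sample mean.
    scale_1p = sample_mean
    se_1p = scale_1p / sqrt(n)
    # Two-parameter model: location is the minimum, scale is mean - minimum.
    scale_2p = sample_mean - sample_min
    # Usual bias corrections for the two-parameter model.
    scale_2p_bc = scale_2p * n / (n - 1)
    location_2p_bc = sample_min - scale_2p_bc / n
    return {
        "scale_1p": scale_1p,
        "se_1p": se_1p,
        "location_2p": sample_min,
        "scale_2p": scale_2p,
        "scale_2p_bc": scale_2p_bc,
        "location_2p_bc": location_2p_bc,
    }


# Summary statistics as reported in the output above.
result = exponential_closed_form(n=38, sample_mean=185.7895, sample_min=147.0)
for name, value in result.items():
    print(name, round(value, 4))
```

Because the inputs are the rounded values printed in the output,
small differences in the last digits are to be expected.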
Date created: 6/5/2001
Last updated: 12/05/2005
Please email comments on this WWW page to
alan.heckert@nist.gov.