
# DISTRIBUTIONAL LIKELIHOOD RATIO TEST

Name:
DISTRIBUTIONAL LIKELIHOOD RATIO TEST
Type:
Analysis Command
Purpose:
Determine which of two specified distributions better fits a data set based on the likelihood ratio test.
Description:
In many cases, several distributions may provide an adequate fit for a given data set. This test provides a method for selecting between two specific distributions. It differs from the Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) goodness of fit tests. The K-S and A-D tests have a null hypothesis that the data come from a single specific distribution, with the alternative hypothesis that the data do not come from that distribution (i.e., there is no specific alternative distribution). The likelihood ratio test, on the other hand, has a null hypothesis that the data come from distribution A against the alternative that they come from distribution B. The two candidate distributions may both pass a K-S or A-D test, both fail, or one may pass while the other fails; the likelihood ratio test can be applied in any of these cases.

The likelihood ratio test given here was proposed by Dumonceaux, Antle, and Haas. The basic algorithm is as follows:

1. Fit the data to both distributions using maximum likelihood.

2. Compute the likelihood function for both distributions and then form the ratio of these likelihoods. Note that the distribution given in the null hypothesis is used for the denominator and the distribution given in the alternative hypothesis is used for the numerator. This ratio is the test statistic.

3. Critical values are determined via simulation. Specifically, 10,000 samples are simulated from the distribution given in the null hypothesis. The location and scale parameters are set to 0 and 1, respectively. If there are shape parameters, these are set to the estimates obtained from the maximum likelihood fit of the original data.
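The three steps above can be sketched outside Dataplot. The following Python example is only an illustration under stated assumptions, not Dataplot's implementation: it uses NumPy, takes the normal as H0 and the two-parameter exponential as Ha (both have closed-form maximum likelihood estimates), and reports the n-th root of the likelihood ratio, which is a choice made here for numerical stability. All function names are hypothetical.

```python
import numpy as np

def normal_loglik(x):
    """Maximized normal log-likelihood (ML estimates: mean, n-divisor std)."""
    mu, sigma = x.mean(), x.std(ddof=0)
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - (x - mu) ** 2 / (2 * sigma**2)))

def exponential_loglik(x):
    """Maximized two-parameter exponential log-likelihood
    (ML estimates: location = minimum, scale = mean - minimum)."""
    loc = x.min()
    scale = x.mean() - loc
    return float(np.sum(-np.log(scale) - (x - loc) / scale))

def lr_statistic(x):
    """Step 2: likelihood ratio, Ha (exponential) over H0 (normal).
    Taking the n-th root is an assumption of this sketch."""
    x = np.asarray(x, dtype=float)
    return float(np.exp((exponential_loglik(x) - normal_loglik(x)) / x.size))

def simulate_reference(n, n_sim=10_000, seed=0):
    """Step 3: statistics for samples drawn from H0 with location 0, scale 1."""
    rng = np.random.default_rng(seed)
    return np.array([lr_statistic(rng.standard_normal(n)) for _ in range(n_sim)])

# Hypothetical data: 20 draws from a normal distribution.
y = np.random.default_rng(1).normal(loc=40.0, scale=5.0, size=20)
stat = lr_statistic(y)
reference = simulate_reference(len(y))
cutoffs = np.quantile(reference, [0.90, 0.95, 0.99])
p_value = float(np.mean(reference >= stat))   # upper-tailed test
for level, cv in zip((90, 95, 99), cutoffs):
    decision = "Reject H0" if stat > cv else "Accept H0"
    print(f"{level}%  critical value {cv:.3f}  {decision}")
print(f"statistic {stat:.4f}  p-value {p_value:.4f}")
```

Because this statistic does not change when the data are shifted or rescaled, simulating from the standard normal (location 0, scale 1) is sufficient for any normal null hypothesis in this sketch.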

Currently, Dataplot only supports this test for uncensored and ungrouped data from continuous distributions. Also, Dataplot only supports this command for distributions for which it supports maximum likelihood estimation.

Dumonceaux, Antle, and Haas proposed some simplified tests for a few specific cases. Dataplot supports the following specific cases:

1. H0: Normal, Ha: Exponential
2. H0: Exponential, Ha: Normal
3. H0: Normal, Ha: Double Exponential
4. H0: Double Exponential, Ha: Normal

It is also important to note that it matters which distribution is specified for the null hypothesis and which is specified for the alternative hypothesis. The power of the test is estimated by running 5,000 simulations from the alternative hypothesis distribution (as with the critical values, location and scale parameters are set to 0 and 1, respectively, and any shape parameters are set to the estimates obtained from the maximum likelihood fit). When the power is relatively low, the distribution specified in the null hypothesis may be favored. For example, suppose you are testing a Weibull against a lognormal. It is quite possible that the null hypothesis will not be rejected when the Weibull is given as the null hypothesis distribution, and likewise will not be rejected when the lognormal is given as the null hypothesis distribution.
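The power estimate described above can be illustrated in the same spirit. This Python sketch is an assumption-laden illustration rather than Dataplot's code: it takes the normal as H0 and the two-parameter exponential as Ha, uses a closed-form version of the n-th root of the maximized likelihood ratio for that pair, and estimates power as the fraction of Ha samples whose statistic exceeds each simulated critical value. The helper names are hypothetical.

```python
import numpy as np

def lr_statistic(x):
    """n-th root of the maximized likelihood ratio, exponential (Ha) over
    normal (H0), written in closed form from the ML estimates."""
    sigma = x.std(ddof=0)          # normal ML scale estimate
    scale = x.mean() - x.min()     # exponential ML scale estimate
    return np.sqrt(2 * np.pi / np.e) * sigma / scale

def simulate(draw, n, n_sim, seed):
    """Statistic for n_sim samples of size n produced by draw(rng, n)."""
    rng = np.random.default_rng(seed)
    return np.array([lr_statistic(draw(rng, n)) for _ in range(n_sim)])

n = 20
# Critical values: 10,000 samples from H0 (location 0, scale 1).
null_stats = simulate(lambda rng, n: rng.standard_normal(n), n, 10_000, seed=0)
# Power: 5,000 samples from Ha (location 0, scale 1).
alt_stats = simulate(lambda rng, n: rng.standard_exponential(n), n, 5_000, seed=1)

cutoffs = np.quantile(null_stats, [0.90, 0.95, 0.99])
power = [float(np.mean(alt_stats > cv)) for cv in cutoffs]
for level, cv, pw in zip((90, 95, 99), cutoffs, power):
    print(f"{level}% cutoff {cv:.3f}  estimated power {pw:.2f}")
```

Swapping the roles of the two distributions (simulating critical values from the exponential and power from the normal) generally gives different cutoffs and power, which is one way to see why the choice of null hypothesis matters.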

Syntax 1:
<dist1> AND <dist2> DISTRIBUTIONAL LIKELIHOOD RATIO TEST <y>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
<dist1> AND <dist2> MULTIPLE DISTRIBUTIONAL LIKELIHOOD RATIO
TEST <y1> ... <yk>
<SUBSET/EXCEPT/FOR qualification>
where <y1> ... <yk> is a list of up to 30 response variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax will generate the test for each of the listed response variables. Although the word MULTIPLE is optional, it can be useful to distinguish this from the REPLICATED case.

Note that the syntax

<dist1> AND <dist2> MULTIPLE DISTRIBUTIONAL LIKELIHOOD ...
RATIO TEST Y1 TO Y4

is supported. This is equivalent to

<dist1> AND <dist2> MULTIPLE DISTRIBUTIONAL LIKELIHOOD ...
RATIO TEST Y1 Y2 Y3 Y4
Syntax 3:
<dist1> AND <dist2> REPLICATED DISTRIBUTIONAL LIKELIHOOD RATIO TEST
<y> <x1> ... <xk>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> ... <xk> is a list of one to six group-id variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax performs a cross-tabulation of <x1> ... <xk> and performs the test for each unique combination of cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, there will be a total of 6 likelihood ratio tests performed.
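The cross-tabulation can be pictured with a small Python sketch. This is an illustration only, not Dataplot code: the function name and the data are hypothetical, and it simply groups the response by each unique combination of group-id values that occurs in the data, so that one test would be run per group.

```python
from collections import defaultdict

def split_into_cells(y, *group_vars):
    """Group the response values by each unique combination of group-id values."""
    cells = defaultdict(list)
    for value, *levels in zip(y, *group_vars):
        cells[tuple(levels)].append(value)
    return dict(cells)

# Hypothetical data: X1 has 3 levels, X2 has 2 levels.
y  = [4.1, 3.9, 5.2, 5.0, 6.3, 6.1, 4.4, 4.0, 5.5, 5.1, 6.6, 6.0]
x1 = [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3]
x2 = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]

cells = split_into_cells(y, x1, x2)
for levels in sorted(cells):
    print(levels, cells[levels])   # one likelihood ratio test per cell
print(len(cells), "cells")
```

With 3 levels of X1 and 2 levels of X2 all occurring together, the sketch produces 6 cells, matching the count given above.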

The word REPLICATED is required to distinguish the replication case from the multiple case (if there are multiple variables and neither MULTIPLE nor REPLICATED is specified, Dataplot assumes MULTIPLE).

Note that the syntax

<dist1> AND <dist2> REPLICATED DISTRIBUTIONAL LIKELIHOOD ...
RATIO TEST Y X1 TO X4

is supported. This is equivalent to

<dist1> AND <dist2> REPLICATED DISTRIBUTIONAL LIKELIHOOD ...
RATIO TEST Y X1 X2 X3 X4
Examples:
NORMAL AND EXPONENTIAL DISTRIBUTIONAL LIKELIHOOD RATIO ...
TEST Y1
NORMAL AND EXPONENTIAL MULTIPLE DISTRIBUTIONAL LIKELIHOOD ...
RATIO TEST Y1 TO Y5
NORMAL AND EXPONENTIAL REPLICATED DISTRIBUTIONAL ...
LIKELIHOOD RATIO TEST Y X
NORMAL AND EXPONENTIAL DISTRIBUTIONAL LIKELIHOOD RATIO ...
TEST Y1 SUBSET Y1 > 0
Note:
For a list of supported distributions (i.e., for which Dataplot supports maximum likelihood) enter the command

Note:
If both distributions are single word names (e.g., NORMAL and EXPONENTIAL), then the word AND is optional. However, if at least one of the distributions has multiple words (e.g., 3-PARAMETER WEIBULL), then the word AND is required.

The word TEST is optional.

Note:
Although the Dumonceaux, Antle, and Haas papers provide some tables for a few specific cases, Dataplot generates the critical values and power values by running the simulations dynamically. Due to the use of different random number generators, there may be some small differences between the Dataplot results and the tables in the papers. These differences should not have much practical importance.
Note:
You can specify the number of digits in the output with the command

SET WRITE DECIMALS <value>
Note:
The DISTRIBUTIONAL LIKELIHOOD RATIO TEST command automatically saves the following parameters:

 STATVAL  = the value of the test statistic
 STATCDF  = the CDF of the test statistic
 PVALUE   = the p-value of the test statistic
 CUTOFF90 = the 90 percent point of the reference distribution
 CUTOFF95 = the 95 percent point of the reference distribution
 CUTOFF99 = the 99 percent point of the reference distribution
 POWER90  = the power corresponding to the 90 percent point of the reference distribution
 POWER95  = the power corresponding to the 95 percent point of the reference distribution
 POWER99  = the power corresponding to the 99 percent point of the reference distribution

If the MULTIPLE or REPLICATED option is used, these values will be written to the file "dpst1f.dat" instead.

Default:
None
Synonyms:
DISTRIBUTIONAL LIKELIHOOD RATIO MULTIPLE is a synonym for MULTIPLE DISTRIBUTIONAL LIKELIHOOD RATIO
DISTRIBUTIONAL LIKELIHOOD RATIO REPLICATED is a synonym for REPLICATED DISTRIBUTIONAL LIKELIHOOD RATIO
Related Commands:
 MAXIMUM LIKELIHOOD      = Computes maximum likelihood estimates for distributional fits.
 GOODNESS OF FIT         = Performs Kolmogorov-Smirnov, Anderson-Darling, chi-square, and PPCC goodness of fit tests.
 BEST DISTRIBUTIONAL FIT = Ranks distributional fits for many common distributions.
 KAPPENMAN R             = Generates Kappenman's statistic for distinguishing between a lognormal and a Weibull distributional model.
 PROBABILITY PLOT        = Generates a probability plot.
Reference:
Dumonceaux, Antle and Haas (1973), "Likelihood Ratio Test for Discrimination Between Two Models with Unknown Location and Scale Parameters", Technometrics, Vol. 15, No. 1, pp. 19-27.

Dumonceaux and Antle (1973), "Discrimination Between the Log-Normal and Weibull Distributions", Technometrics, Vol. 15, No. 4, pp. 923-926.

Applications:
Distributional Models
Implementation Date:
2014/05
Program 1:

. Step 1:   Create the data for the example on page 25 of the
.           Dumonceaux, Antle, and Haas Technometrics paper
.
serial read y
35.15  44.62  40.85  45.32  36.08
38.97  32.48  34.36  38.05  26.84
33.68  42.90  33.57  36.64  33.82
42.26  37.88  38.57  32.05  41.50
end of data
.
. Step 2:   Perform Test
.
set write decimals 4
normal and exponential distributional likelihood ratio test y
normal and double exponential distributional likelihood ratio test y

             Distributional Likelihood Ratio Test

Response Variable: Y

H0: Data are from distribution -
NORMAL
Ha: Data are from distribution -
EXPONENTIAL

Summary Statistics:
Total Number of Observations:                        20
Sample Mean:                                    37.2795
Sample Standard Deviation:                       4.7235
Sample Minimum:                                 26.8400
Sample Maximum:                                 45.3200

H0 Distribution:
Estimate of Location Parameter:                 37.2795
Estimate of Scale Parameter:                     4.7235

Ha Distribution:
Estimate of Location Parameter:                 26.8400
Estimate of Scale Parameter:                    10.4395

Test:
Test Statistic Value:                            0.4525
CDF of Test Statistic:                           0.1818
P-Value:                                         0.8182
Number of Simulations for CV:                     10000
Number of Simulations for Power:                   4999

Percent Points of the Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
50.0    =          0.538
75.0    =          0.612
80.0    =          0.631
90.0    =          0.683
95.0    =          0.736
99.0    =          0.824
99.9    =          0.936

Conclusions (Upper 1-Tailed Test)
-------------------------------------------------
                 Power   Critical
Alpha    CDF   (1-Beta)     Value     Conclusion
-------------------------------------------------
  10%    90%      0.98      0.683     Accept H0
   5%    95%      0.95      0.736     Accept H0
   1%    99%      0.85      0.824     Accept H0

Distributional Likelihood Ratio Test

Response Variable: Y

H0: Data are from distribution -
NORMAL
Ha: Data are from distribution -
DOUBLE EXPONENTIAL

Summary Statistics:
Total Number of Observations:                        20
Sample Mean:                                    37.2795
Sample Standard Deviation:                       4.7235
Sample Minimum:                                 26.8400
Sample Maximum:                                 45.3200

H0 Distribution:
Estimate of Location Parameter:                 37.2795
Estimate of Scale Parameter:                     4.7235

Ha Distribution:
Estimate of Location Parameter:                 37.2600
Estimate of Scale Parameter:                     3.8125

Test:
Test Statistic Value:                            1.2390
CDF of Test Statistic:                           0.2723
P-Value:                                         0.7277
Number of Simulations for CV:                     10000
Number of Simulations for Power:                   5000

Percent Points of the Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
50.0    =          1.282
75.0    =          1.336
80.0    =          1.351
90.0    =          1.391
95.0    =          1.432
99.0    =          1.510
99.9    =          1.634

Conclusions (Upper 1-Tailed Test)
-------------------------------------------------
                 Power   Critical
Alpha    CDF   (1-Beta)     Value     Conclusion
-------------------------------------------------
  10%    90%      0.50      1.391     Accept H0
   5%    95%      0.37      1.432     Accept H0
   1%    99%      0.19      1.510     Accept H0
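The test statistic values printed in the two runs above (0.4525 and 1.2390) equal the ratio of the H0 scale estimate to the Ha scale estimate shown in the same output. The Python sketch below recomputes those ratios from the Program 1 data. That the simplified statistic is exactly this ratio is inferred from the printed numbers, not taken from Dataplot's source, and the variable names are hypothetical.

```python
import numpy as np

# Data from Program 1.
y = np.array([35.15, 44.62, 40.85, 45.32, 36.08,
              38.97, 32.48, 34.36, 38.05, 26.84,
              33.68, 42.90, 33.57, 36.64, 33.82,
              42.26, 37.88, 38.57, 32.05, 41.50])

# Normal scale as printed in the output (sample standard deviation).
normal_scale = y.std(ddof=1)

# Exponential: location = minimum, scale = mean - minimum.
exp_scale = y.mean() - y.min()

# Double exponential: location = median, scale = mean absolute deviation from it.
dexp_loc = np.median(y)
dexp_scale = np.mean(np.abs(y - dexp_loc))

stat_exp = normal_scale / exp_scale     # compare with 0.4525 in the output
stat_dexp = normal_scale / dexp_scale   # compare with 1.2390 in the output
print(f"normal vs exponential:        {stat_exp:.4f}")
print(f"normal vs double exponential: {stat_dexp:.4f}")
```

The intermediate values also match the printed estimates: the exponential scale is 10.4395, the double exponential location is 37.2600, and its scale is 3.8125.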

Program 2:

. Step 1:   Create the data for the example on page 22 of the
.           Dumonceaux, Antle, and Haas Technometrics paper
.
let y = data 47 38 29 92 41 44 47 62 59 44 47 41
.
. Step 2:   Perform Test
.
set write decimals 4
normal and cauchy distributional likelihood ratio test y

             Distributional Likelihood Ratio Test

Response Variable: Y

H0: Data are from distribution -
NORMAL
Ha: Data are from distribution -
CAUCHY

Summary Statistics:
Total Number of Observations:                        12
Sample Mean:                                    49.2500
Sample Standard Deviation:                      16.0348
Sample Minimum:                                 29.0000
Sample Maximum:                                 92.0000

H0 Distribution:
Estimate of Location Parameter:                 49.2500
Estimate of Scale Parameter:                    16.0348

Ha Distribution:
Estimate of Location Parameter:                 44.4556
Estimate of Scale Parameter:                     4.3886

Test:
Test Statistic Value:                            1.2468
CDF of Test Statistic:                           0.9945
P-Value:                                         0.0055
Number of Simulations for CV:                     10000
Number of Simulations for Power:                   5000

Percent Points of the Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
50.0    =          0.828
75.0    =          0.893
80.0    =          0.911
90.0    =          0.971
95.0    =          1.033
99.0    =          1.180
99.9    =          1.451

Conclusions (Upper 1-Tailed Test)
-------------------------------------------------
                 Power   Critical
Alpha    CDF   (1-Beta)     Value     Conclusion
-------------------------------------------------
  10%    90%      0.81      0.971     Reject H0
   5%    95%      0.74      1.033     Reject H0
   1%    99%      0.62      1.180     Reject H0


Program 3:

. Step 1:   Create the data for the example on page 926 of the
.           Dumonceaux and Antle Technometrics paper
.
serial read y
0.654  0.613  0.315  0.449  0.297  0.402  0.379  0.423  0.379  0.3235
0.269  0.740  0.418  0.412  0.494  0.416  0.338  0.392  0.484  0.265
end of data
.
. Step 2:   Perform Test
.
set write decimals 4
normal and gumbel distributional likelihood ratio test y

             Distributional Likelihood Ratio Test

Response Variable: Y

H0: Data are from distribution -
NORMAL
Ha: Data are from distribution -
GUMBEL

Summary Statistics:
Total Number of Observations:                        20
Sample Mean:                                     0.4231
Sample Standard Deviation:                       0.1253
Sample Minimum:                                  0.2650
Sample Maximum:                                  0.7400

H0 Distribution:
Estimate of Location Parameter:                  0.4231
Estimate of Scale Parameter:                     0.1253

Ha Distribution:
Estimate of Location Parameter:                  0.3841
Estimate of Scale Parameter:                     0.1434

Test:
Test Statistic Value:                            0.9869
CDF of Test Statistic:                           0.8517
P-Value:                                         0.1483
Number of Simulations for CV:                     10000
Number of Simulations for Power:                   4999

Percent Points of the Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
50.0    =          0.944
75.0    =          0.975
80.0    =          0.981
90.0    =          0.994
95.0    =          1.002
99.0    =          1.016
99.9    =          1.026

Conclusions (Upper 1-Tailed Test)
-------------------------------------------------
                 Power   Critical
Alpha    CDF   (1-Beta)     Value     Conclusion
-------------------------------------------------
  10%    90%      0.29      0.994     Accept H0
   5%    95%      0.17      1.002     Accept H0
   1%    99%      0.04      1.016     Accept H0

Program 4:

. Step 1:   Create the data for the example on page 925 of the
.           Dumonceaux and Antle Technometrics paper
.
serial read y
17.88  28.92  33.00  41.52  42.12
45.60  48.48  51.84  51.96  54.12
55.56  67.80  68.64  68.64  68.88
84.12  93.12  98.64  105.12  105.84
127.92  128.04  173.40
end of data
.
. Step 2:   Perform Test
.
set write decimals 4
lognormal and weibull distributional likelihood ratio test y
weibull and lognormal distributional likelihood ratio test y

             Distributional Likelihood Ratio Test

Response Variable: Y

H0: Data are from distribution -
LOG-NORMAL
Ha: Data are from distribution -
WEIBULL

Summary Statistics:
Total Number of Observations:                        23
Sample Mean:                                    72.2243
Sample Standard Deviation:                      37.4887
Sample Minimum:                                 17.8800
Sample Maximum:                                173.4000

H0 Distribution:
Estimate of Scale Parameter:                    63.4628
Estimate of Shape Parameter 1:                   0.5334

Ha Distribution:
Estimate of Scale Parameter:                    81.8783
Estimate of Shape Parameter 1:                   2.1021

Test:
Test Statistic Value:                            0.9763
CDF of Test Statistic:                           0.6753
P-Value:                                         0.3247
Number of Simulations for CV:                     10000
Number of Simulations for Power:                   4999

Percent Points of the Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
50.0    =          0.945
75.0    =          0.990
80.0    =          1.002
90.0    =          1.033
95.0    =          1.062
99.0    =          1.117
99.9    =          1.193

Conclusions (Upper 1-Tailed Test)
-------------------------------------------------
                 Power   Critical
Alpha    CDF   (1-Beta)     Value     Conclusion
-------------------------------------------------
  10%    90%      0.65      1.033     Accept H0
   5%    95%      0.51      1.062     Accept H0
   1%    99%      0.29      1.117     Accept H0

Distributional Likelihood Ratio Test

Response Variable: Y

H0: Data are from distribution -
WEIBULL
Ha: Data are from distribution -
LOG-NORMAL

Summary Statistics:
Total Number of Observations:                        23
Sample Mean:                                    72.2243
Sample Standard Deviation:                      37.4887
Sample Minimum:                                 17.8800
Sample Maximum:                                173.4000

H0 Distribution:
Estimate of Scale Parameter:                    81.8783
Estimate of Shape Parameter 1:                   2.1021

Ha Distribution:
Estimate of Scale Parameter:                    63.4628
Estimate of Shape Parameter 1:                   0.5334

Test:
Test Statistic Value:                            1.0243
CDF of Test Statistic:                           0.8752
P-Value:                                         0.1248
Number of Simulations for CV:                     10000
Number of Simulations for Power:                   5000

Percent Points of the Reference Distribution
-----------------------------------
Percent Point               Value
-----------------------------------
50.0    =          0.940
75.0    =          0.990
80.0    =          1.003
90.0    =          1.033
95.0    =          1.062
99.0    =          1.118
99.9    =          1.183

Conclusions (Upper 1-Tailed Test)
-------------------------------------------------
                 Power   Critical
Alpha    CDF   (1-Beta)     Value     Conclusion
-------------------------------------------------
  10%    90%      0.64      1.033     Accept H0
   5%    95%      0.49      1.062     Accept H0
   1%    99%      0.22      1.118     Accept H0



NIST is an agency of the U.S. Commerce Department.

Date created: 01/31/2015
Last updated: 01/31/2015

Please email comments on this WWW page to alan.heckert.gov.