
# DEHAAN

Name:
DEHAAN
Type:
Analysis Command
Purpose:
Estimate the parameters of a generalized Pareto distribution using the de Haan method.
Description:
The generalized Pareto distribution (GPD) is an asymptotic distribution developed by using the fact that exceedances of a sufficiently high threshold are rare events to which the Poisson distribution applies.

The cumulative distribution function of the generalized Pareto distribution is

$$G(y) = 1 - {[1 + (cy/a)]^{-1/c}} \hspace{0.5 in} a > 0, (1 + (cy/a)) > 0$$

Here, c is the shape parameter and a is the scale parameter.
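As an illustration, the distribution function above can be evaluated with a short Python sketch. The function name `gpd_cdf` is hypothetical and not part of Dataplot. The sketch assumes that y is a non-negative excess over the threshold and treats c = 0 as the limiting exponential case.

```python
import math

def gpd_cdf(y, a, c):
    """Evaluate G(y) = 1 - [1 + c*y/a]**(-1/c) for scale a > 0 and shape c."""
    if a <= 0:
        raise ValueError("scale parameter a must be positive")
    if y <= 0:
        # Assumption: y is an excess over the threshold, so it is non-negative.
        return 0.0
    if c == 0:
        # Limit of the formula as c -> 0 (exponential distribution).
        return 1.0 - math.exp(-y / a)
    base = 1.0 + c * y / a
    if base <= 0:
        # Only reachable for c < 0: y is at or beyond the upper endpoint -a/c.
        return 1.0
    return 1.0 - base ** (-1.0 / c)
```

For c < 0 the support is bounded above by -a/c, which is why the sketch returns 1 once the constraint (1 + cy/a) > 0 no longer holds.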

This equation can be used to represent the conditional cumulative distribution of the excess Y = X - u of the variate X over the threshold u, given X > u for u sufficiently large.

The cases c > 0, c = 0, and c < 0 correspond respectively to the extreme value type II (Frechet), extreme value type I (Gumbel), and reverse Weibull domains of attraction.

Given the mean E(Y) and standard deviation $$s_Y$$ of the variate Y, the parameters are

$$a = 0.5 \, E(Y) \{1 + [E(Y)/s_Y]^2\}$$

$$c = 0.5 \{1 - [E(Y)/s_Y]^2\}$$
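The two moment relations can be checked numerically. The Python sketch below uses hypothetical function names. The helper `gpd_mean_sd` is not taken from this page: it relies on the standard mean a/(1 - c) and standard deviation a/[(1 - c) sqrt(1 - 2c)] of the generalized Pareto distribution, which exist only for c < 0.5.

```python
import math

def gpd_moment_estimates(mean_y, sd_y):
    """Return (a, c) from the mean and standard deviation of the excess Y."""
    ratio_sq = (mean_y / sd_y) ** 2
    a = 0.5 * mean_y * (1.0 + ratio_sq)
    c = 0.5 * (1.0 - ratio_sq)
    return a, c

def gpd_mean_sd(a, c):
    """Mean and standard deviation of the GPD (assumes c < 0.5)."""
    if c >= 0.5:
        raise ValueError("the variance is finite only for c < 0.5")
    mean = a / (1.0 - c)
    sd = a / ((1.0 - c) * math.sqrt(1.0 - 2.0 * c))
    return mean, sd

# Round trip: start from known parameters, compute the moments, recover (a, c).
mean_y, sd_y = gpd_mean_sd(2.0, -0.3)
a_hat, c_hat = gpd_moment_estimates(mean_y, sd_y)
```

The round trip recovers the starting parameters up to floating-point error, which is a quick way to confirm that the two relations are mutually consistent.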

Note that when c < 0, $$\gamma = -1/c$$ is the estimate of the shape parameter for the reverse Weibull distribution (the SET MINMAX 2 case in Dataplot).

The de Haan estimates of a and c are determined as follows.

1. Let k equal the number of data points above the threshold, so that u is the (k+1)-th highest data point. We have $$\lambda$$ = k/n, where n is the length of the record (in whatever units are appropriate, e.g., years). The highest, second highest, ..., k-th highest, and (k+1)-th highest variates are denoted by $$X_{n,n}, X_{n-1,n}, \ldots, X_{n-k+1,n}, X_{n-k,n}$$, respectively.

2. Compute the following quantities.
$$M_{n}^{(r)} = \frac{1}{k} \sum_{i=0}^{k-1} [\log(X_{n-i,n}) - \log(X_{n-k,n})]^r, \hspace{0.3 in} r = 1, 2$$

3. The estimates for c and a are as follows.

$$\hat{c} = M_{n}^{(1)} + 1 - \frac{1} {2(1 - [M_{n}^{(1)}]^2/[M_{n}^{(2)}])}$$

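$$\hat{a} = u M_{n}^{(1)}/\rho_1$$

where u is the threshold and

$$\rho_1 = 1 \hspace{0.3 in} \hat{c} \ge 0$$

$$\rho_1 = 1/(1 - \hat{c}) \hspace{0.3 in} \hat{c} < 0$$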

Formulas for the standard deviation of c are given in the paper "Extreme Wind Distribution Tails: A 'Peaks Over Threshold' Approach" (see the Reference section below).
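The three steps above can be sketched in Python as follows. This is an illustrative reading of the formulas, not Dataplot's actual implementation. The function name `de_haan_estimates` is hypothetical. The sketch assumes that the threshold u is the (k+1)-th highest value, that the scale estimate uses u in the numerator, and that $$\rho_1$$ equals 1 for a non-negative shape estimate and 1/(1 - c) for a negative one. The standard deviation of the shape estimate is not computed.

```python
import math

def de_haan_estimates(data, k):
    """Return (a_hat, c_hat, u) using the k highest values of data."""
    if not 0 < k < len(data):
        raise ValueError("k must satisfy 0 < k < len(data)")
    x = sorted(data, reverse=True)   # x[0] is the highest value
    u = x[k]                         # (k+1)-th highest value, used as threshold
    if u <= 0:
        raise ValueError("logarithms require a positive threshold")
    # Log excesses of the k highest values over the threshold.
    logs = [math.log(x[i]) - math.log(u) for i in range(k)]
    m1 = sum(logs) / k
    m2 = sum(d * d for d in logs) / k
    denom = 1.0 - m1 * m1 / m2 if m2 > 0 else 0.0
    if denom == 0.0:
        raise ValueError("log excesses are all equal; estimate is undefined")
    c_hat = m1 + 1.0 - 1.0 / (2.0 * denom)
    # Assumption: rho_1 = 1 for c >= 0 and 1/(1 - c) for c < 0.
    rho1 = 1.0 if c_hat >= 0 else 1.0 / (1.0 - c_hat)
    a_hat = u * m1 / rho1
    return a_hat, c_hat, u

# Tiny artificial data set, chosen so the arithmetic is easy to follow by hand.
a_hat, c_hat, u = de_haan_estimates([math.exp(2), math.exp(1), 1.0, 0.5], k=2)
```

With this toy input the log excesses are 2 and 1, so the two moments are 1.5 and 2.5. Real data sets need far more than two exceedances for the estimates to be meaningful.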

Syntax:
DEHAAN MLE <y> <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
DEHAAN MLE Y
DEHAAN MLE Y SUBSET TAG > 0
Note:
To specify a threshold, enter the following command before the DEHAAN command:

LET THRESHOL = <value>

If no threshold is specified, then the minimum data value is used as the threshold.
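One common way to choose the threshold is to fix the number of exceedances k and take the (k+1)-th highest observation. The Python sketch below illustrates that idea outside of Dataplot. Both function names are hypothetical.

```python
def threshold_for_k_exceedances(data, k):
    """Return the (k+1)-th highest value, so that about k values lie above it."""
    if not 0 < k < len(data):
        raise ValueError("k must satisfy 0 < k < len(data)")
    return sorted(data, reverse=True)[k]

def exceedances(data, u):
    """Return the excesses x - u for the values strictly above the threshold u."""
    return [x - u for x in data if x > u]

sample = [5.0, 1.0, 9.0, 3.0, 7.0]
u = threshold_for_k_exceedances(sample, 2)   # third highest value
excess = exceedances(sample, u)
```

If several observations tie with the threshold value, fewer than k values will lie strictly above it. The sample program at the end of this page follows the same idea: it sets the threshold to the 900th of 977 sorted values, which leaves 77 observations above the threshold.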

Note:
The following internal parameters will be saved.

 GAMMA   = shape parameter for generalized Pareto distribution
 A       = scale parameter for generalized Pareto distribution
 SDGAMMA = standard deviation of GAMMA

If the absolute value of GAMMA is within a user-specified tolerance of zero, then the following are also saved.

 LOC   = location parameter for Gumbel distribution
 SCALE = scale parameter for Gumbel distribution

To specify this tolerance, enter the command

SET PEAKS OVER THRESHOLD TOLERANCE <value>

The default tolerance is 0.05.

If GAMMA is less than zero with an absolute value greater than the above tolerance, then the following are also saved.

 GAMMA2 = shape parameter for reverse Weibull distribution
 LOC    = location parameter for reverse Weibull distribution
 SCALE  = scale parameter for reverse Weibull distribution

These estimates for the reverse Weibull and Gumbel distributions are based on moment estimators. The formulas are given on page 3 of NIST Building Science Series 174 (see the Reference section below). Currently, no estimates for the Frechet case (GAMMA > 0) are saved.
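The tolerance rule described in this note can be summarized with a small Python sketch. The function name `classify_tail` and the returned dictionary are hypothetical, and treating a value exactly equal to the tolerance as the Gumbel case is an assumption. The sketch reports only which family applies and the reverse Weibull shape -1/GAMMA. It does not compute the LOC and SCALE moment estimates.

```python
def classify_tail(gamma, tolerance=0.05):
    """Report which extreme value family the saved parameters refer to."""
    if abs(gamma) <= tolerance:
        # Shape is effectively zero: Gumbel LOC and SCALE would be saved.
        return {"family": "Gumbel"}
    if gamma < 0:
        # Reverse Weibull case: GAMMA2 = -1/GAMMA, plus LOC and SCALE.
        return {"family": "reverse Weibull", "GAMMA2": -1.0 / gamma}
    # GAMMA > 0: Frechet domain, for which no estimates are currently saved.
    return {"family": "Frechet"}

result = classify_tail(-0.35375)
```

Applied to the shape estimate of -0.35375 from the sample output on this page, the sketch selects the reverse Weibull case with a shape of about 2.827, in line with the value Dataplot reports.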

Note:
The May 2005 version added support for generating the output in LaTeX or HTML. Enter

HELP CAPTURE HTML
HELP CAPTURE LATEX

for details. The ASCII output was also modified somewhat. This was a cosmetic change to make the output clearer.

Note:
The PEAKS OVER THRESHOLD PLOT was added in the 5/2005 version. This plot shows how the estimate of the shape parameter changes as the threshold changes.
Default:
None.
Synonyms:
None
Related Commands:
 CME                       = Compute the CME estimates for the generalized Pareto distribution.
 CME PLOT                  = Generate a CME plot.
 GEPPDF                    = Compute the probability density function for the generalized Pareto distribution.
 PEAKS OVER THRESHOLD PLOT = Generate a peaks over threshold plot.
Reference:
"Continuous Univariate Distributions: Volume I", 2nd ed., Johnson, Kotz, and Balakrishnan, John Wiley and Sons, 1994.

"Estimates of Hurricane Wind Speeds by the 'Peaks Over Threshold' Approach", Alan Heckert, Emil Simiu, and Tim Whalen, Journal of Structural Engineering, April, 1998.

"Extreme Wind Distribution Tails: A 'Peaks Over Threshold' Approach", Simiu and Heckert, Journal of Structural Engineering, May, 1996.

"Assessment of 'peak over threshold' Methods for Estimating Extreme Value Distribution Tails", J. A. Lechner, E. Simiu, N. A. Heckert, Structural Safety, 1993.

"Estimates of Hurricane Wind Speeds by the 'Peaks Over Threshold' Method", E. Simiu, N. A. Heckert, T. Whalen, NIST Technical Note 1416, February, 1996.

"Extreme Wind Estimates by the Conditional Mean Exceedance Procedure", J. L. Gross, E. Simiu, N. A. Heckert, J. A. Lechner, NISTIR 5531, April, 1995.

"Extreme Wind Distribution Tails: A 'Peaks Over Threshold' Approach", E. Simiu, N. A. Heckert, NIST Building Science Series 174, March, 1995.

Applications:
Extreme Value Analysis
Implementation Date:
1998/5
2005/5: Modified the appearance of the output.
2005/5: Added support for HTML/Latex format output.
Program:

SKIP 25
LET Y2 = SORT Y
LET THRESHOL = Y2(900)
SET WRITE DECIMALS 5
DEHAAN MLE Y

The following output is generated.
            Generalized Pareto Parameter Estimation (de Haan)
(Maximum Case)

Summary Statistics (Full Data Set):
Number of Observations:                    977
Sample Mean:                               7.81898
Sample Standard Deviation:                 17.76409
Sample Minimum:                            0.00000
Sample Maximum:                            90.04000

Summary Statistics for
Observations Above Threshold:
Threshold:                                 43.36000
Number of Observations Above Threshold:    77
Sample Mean:                               56.76623
Sample Standard Deviation:                 10.39647

de Haan Parameter Estimates:
Location (Threshold) Parameter:            43.36000
Scale Parameter:                           14.92411
Shape Parameter (Gamma):                   -0.35375
Standard Deviation of Gamma:               0.13108

For negative Gamma, the generalized Pareto
is equivalent to a reverse Weibull
(SET MINMAX MAX) with:
Shape Parameter (Gamma):                   2.82687
Location Parameter:                        54.16054
Scale Parameter:                           52.02372



Date created: 6/5/2001
Last updated: 10/13/2015