

DEHAAN

Name:
    DEHAAN
Type:
    Analysis Command
Purpose:
    Estimate the parameters of a generalized Pareto distribution using the de Haan method.
Description:
    The generalized Pareto distribution (GPD) is an asymptotic distribution developed by using the fact that exceedances of a sufficiently high threshold are rare events to which the Poisson distribution applies.

    The cumulative distribution function of the generalized Pareto distribution is

      \( G(y) = 1 - {[1 + (cy/a)]^{-1/c}} \hspace{0.5 in} a > 0, (1 + (cy/a)) > 0 \)

    Here, c is the shape parameter and a is the scale parameter.

    This equation can be used to represent the conditional cumulative distribution of the excess Y = X - u of the variate X over the threshold u, given X > u for u sufficiently large.

    The cases c > 0, c = 0, and c < 0 correspond respectively to the extreme value type II (Frechet), extreme value type I (Gumbel), and reverse Weibull domains of attraction.
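    As an illustration, the cumulative distribution function above can be evaluated with a short Python sketch. The function name gpd_cdf is hypothetical and is not part of Dataplot. The c = 0 branch uses the standard limiting form of the formula (an exponential distribution with scale a), and values of y at or beyond the upper endpoint -a/c (which exists only when c < 0) are assigned probability 1.

```python
import math


def gpd_cdf(y, a, c):
    """Evaluate G(y) = 1 - [1 + (c*y/a)]**(-1/c) for an excess y over the threshold.

    a is the scale parameter (a > 0) and c is the shape parameter.
    """
    if a <= 0:
        raise ValueError("scale parameter a must be positive")
    if y <= 0:
        # The excess Y = X - u is only defined for X > u.
        return 0.0
    if c == 0:
        # Limiting case c -> 0: exponential distribution with scale a.
        return 1.0 - math.exp(-y / a)
    base = 1.0 + c * y / a
    if base <= 0:
        # Only reachable when c < 0: y is at or beyond the endpoint -a/c.
        return 1.0
    return 1.0 - base ** (-1.0 / c)


if __name__ == "__main__":
    for shape in (0.5, 0.0, -0.5):
        print(shape, gpd_cdf(1.0, 2.0, shape))
```

    The three shapes in the loop correspond to the Frechet, Gumbel, and reverse Weibull domains described above.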

    Given the mean E(Y) and standard deviation \( s_Y \) of the variate Y, then

      \( a = 0.5 E(Y) \{1 + [E(Y)/s_Y]^2\} \)
      \( c = 0.5 \{1 - [E(Y)/s_Y]^2\} \)
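    The two moment relations above can be checked numerically. The following Python sketch is illustrative only: the function name gpd_moment_estimates is hypothetical, and the round-trip check relies on the standard mean a/(1 - c) and standard deviation a/[(1 - c) sqrt(1 - 2c)] of the generalized Pareto distribution, which hold for c < 1/2.

```python
import math


def gpd_moment_estimates(mean_excess, sd_excess):
    """Return (a, c) from the mean and standard deviation of the excess Y."""
    if sd_excess <= 0:
        raise ValueError("standard deviation of the excess must be positive")
    ratio_sq = (mean_excess / sd_excess) ** 2
    a = 0.5 * mean_excess * (1.0 + ratio_sq)
    c = 0.5 * (1.0 - ratio_sq)
    return a, c


if __name__ == "__main__":
    # Round trip: start from known parameters, compute the theoretical
    # moments, and recover the parameters from those moments.
    a_true, c_true = 2.0, -0.25
    mean_y = a_true / (1.0 - c_true)
    sd_y = a_true / ((1.0 - c_true) * math.sqrt(1.0 - 2.0 * c_true))
    print(gpd_moment_estimates(mean_y, sd_y))
```

    Note that E(Y) and the standard deviation refer to the excess Y = X - u, so the threshold must be subtracted from the mean of the observations above the threshold before applying these formulas.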

    Note that when c < 0, \( \gamma = -1/c \) is the estimate of the shape parameter for the reverse Weibull distribution (the SET MINMAX 2 case in Dataplot).

    The de Haan estimates of a and c are determined as follows.

    1. Let k equal the number of data points above the threshold, so that u is the (k+1)-th highest data point. We have \( \lambda = k/n \), where n is the length of the record (in whatever units are appropriate, e.g., years). The highest, second highest, ..., k-th highest, and (k+1)-th highest variates are denoted by \( X_{n,n}, X_{n-1,n}, \ldots, X_{n-k+1,n}, X_{n-k,n} \), respectively.

    2. Compute the following quantities.
        \( M_{n}^{(r)} = \frac{1}{k} \sum_{i=0}^{k-1} {[\log(X_{n-i,n}) - \log(X_{n-k,n})]^r}, \hspace{0.3 in} r = 1, 2 \)

    3. The estimates for c and a are as follows.

        \( \hat{c} = M_{n}^{(1)} + 1 - \frac{1} {2(1 - [M_{n}^{(1)}]^2/[M_{n}^{(2)}])} \)

        \( \hat{a} = u M_{n}^{(1)}/\rho_1 \)

      where u is the threshold and

        \( \rho_1 = 1 \) if \( \hat{c} \ge 0 \) and \( \rho_1 = 1/(1 - \hat{c}) \) if \( \hat{c} < 0 \)

      Formulas for the standard deviation of c are given in the paper "Extreme Wind Distribution Tails: A 'Peaks Over Threshold' Approach" (see the Reference section below).
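    The three steps above can be sketched in Python as follows. This is a minimal illustration, not the Dataplot implementation. The function name de_haan_estimates is hypothetical. The sketch assumes that the threshold u plays the role of the (k+1)-th highest value in the log differences and in the scale estimate, that all data above the threshold are positive, and that \( \rho_1 = 1 \) for \( \hat{c} \ge 0 \) and \( \rho_1 = 1/(1 - \hat{c}) \) for \( \hat{c} < 0 \), as in the moment estimator of Dekkers, Einmahl, and de Haan. It does not compute the standard deviation of the shape estimate.

```python
import math


def de_haan_estimates(data, u):
    """Return (c_hat, a_hat) for the observations in data that exceed u."""
    if u <= 0:
        raise ValueError("threshold u must be positive so that log(u) exists")
    exceedances = [x for x in data if x > u]
    k = len(exceedances)
    if k < 2:
        raise ValueError("need at least two observations above the threshold")

    # Step 2: first and second moments of the log excesses over the threshold.
    log_excess = [math.log(x) - math.log(u) for x in exceedances]
    m1 = sum(log_excess) / k
    m2 = sum(d ** 2 for d in log_excess) / k

    denom = 1.0 - m1 ** 2 / m2
    if denom == 0:
        raise ValueError("all exceedances are identical; shape is undefined")

    # Step 3: shape estimate, then scale estimate (rho_1 as assumed above).
    c_hat = m1 + 1.0 - 1.0 / (2.0 * denom)
    rho1 = 1.0 if c_hat >= 0 else 1.0 / (1.0 - c_hat)
    a_hat = u * m1 / rho1
    return c_hat, a_hat


if __name__ == "__main__":
    sample = [1.0, math.e, math.e ** 2]
    print(de_haan_estimates(sample, 1.0))
```

    In the small example, the log excesses are 1 and 2, so the first moment is 1.5 and the second moment is 2.5, which gives a shape estimate near -2.5 and a scale estimate near 5.25.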

Syntax:
    DEHAAN MLE <y> <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
    DEHAAN MLE Y
    DEHAAN MLE Y SUBSET TAG > 0
Note:
    To specify the threshold, enter the following command before the DEHAAN command:

      LET THRESHOL = <value>

    If no threshold is specified, then the minimum data value is used as the threshold.
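    A common way to choose the threshold is to fix the number of exceedances k and take the (k+1)-th highest observation as the threshold, which is what the sample program in the Program section does with Y2(900) for a sorted sample of 977 values. The Python sketch below shows the same index arithmetic. The function name threshold_for_k_exceedances is hypothetical, and the sketch assumes there are no ties at the threshold value (with ties, fewer than k observations lie strictly above it).

```python
def threshold_for_k_exceedances(data, k):
    """Return the (k+1)-th highest value, so that k values lie above it."""
    n = len(data)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(data)")
    ordered = sorted(data)
    # In ascending order the (k+1)-th highest value has 1-based position
    # n - k, which is 0-based index n - k - 1.
    return ordered[n - k - 1]


if __name__ == "__main__":
    values = list(range(1, 978))   # 977 distinct stand-in observations
    u = threshold_for_k_exceedances(values, 77)
    print(u, sum(1 for v in values if v > u))
```

    With 977 distinct values and k = 77, the threshold is the 900th value in ascending order, leaving 77 observations above it.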

Note:
    The following internal parameters will be saved.

      GAMMA = shape parameter for generalized Pareto distribution
      A = scale parameter for generalized Pareto distribution
      SDGAMMA = standard deviation of GAMMA

    If the absolute value of GAMMA is within a user-specified tolerance of zero, then the following are also saved.

      LOC = location parameter for Gumbel distribution.
      SCALE = scale parameter for Gumbel distribution.

    To specify this tolerance, enter the command

      SET PEAKS OVER THRESHOLD TOLERANCE <value>

    The default tolerance is 0.05.

    If GAMMA is less than zero with an absolute value greater than the above tolerance, then the following are also saved.

      GAMMA2 = shape parameter for reverse Weibull distribution.
      LOC = location parameter for reverse Weibull distribution.
      SCALE = scale parameter for reverse Weibull distribution.

    These estimates for the reverse Weibull and Gumbel distributions are based on moment estimators. The formulas are given on page 3 of NIST Building Science Series 174 (see the Reference section below). Currently, no estimates for the Frechet case (GAMMA > 0) are saved.
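    The rules above for which additional parameters are saved can be summarized in a short Python sketch. It is illustrative only: the function name classify_gpd_shape is hypothetical, the comparison at exactly the tolerance value is an assumption (the text does not say whether the boundary is inclusive), and the moment-based LOC and SCALE values are not computed here.

```python
def classify_gpd_shape(gamma, tolerance=0.05):
    """Return (case, gamma2) for a generalized Pareto shape estimate.

    gamma2 is the reverse Weibull shape -1/gamma, or None when it does
    not apply.
    """
    if abs(gamma) <= tolerance:
        # Treated as the Gumbel case: LOC and SCALE are saved.
        return "Gumbel", None
    if gamma < 0:
        # Reverse Weibull case: GAMMA2, LOC, and SCALE are saved.
        return "reverse Weibull", -1.0 / gamma
    # Frechet case: no additional estimates are currently saved.
    return "Frechet", None


if __name__ == "__main__":
    print(classify_gpd_shape(-0.35375))
    print(classify_gpd_shape(0.02))
    print(classify_gpd_shape(0.30))
```

    For the shape estimate -0.35375 shown in the sample output, the sketch returns the reverse Weibull case with a shape of about 2.827, in line with the value reported there.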

Note:
    The May, 2005 version added support for generating the output in LaTeX or HTML. Enter

      HELP CAPTURE HTML
      HELP CAPTURE LATEX

    for details. The ASCII output was also modified somewhat. This was a cosmetic change to make the output clearer.

Note:
    The PEAKS OVER THRESHOLD PLOT was added in the 5/2005 version. This plot shows how the estimate of the shape parameter changes as the threshold changes.
Default:
    None.
Synonyms:
    None
Related Commands:
    CME = Compute the CME estimates for the generalized Pareto distribution.
    CME PLOT = Generate a CME plot.
    GEPPDF = Compute the probability density function for the generalized Pareto distribution.
    PEAKS OVER THRESHOLD PLOT = Generate a peaks over threshold plot.
Reference:
    "Continuous Univariate Distributions: Volume I", 2nd. ed., Johnson, Kotz, and Balakrishnan, John Wiley and Sons, 1994.

    "Estimates of Hurricane Wind Speeds by the 'Peaks Over Threshold' Approach", Alan Heckert, Emil Simiu, and Tim Whalen, Journal of Structural Engineering, April, 1998.

    "Extreme Wind Distribution Tails: A 'Peaks Over Threshold' Approach", Simiu and Heckert, Journal of Structural Engineering, May, 1996.

    "Assessment of 'peak over threshold' Methods for Estimating Extreme Value Distribution Tails", J. A. Lechner, E. Simiu, N. A. Heckert, Structural Safety, 1993.

    "Estimates of Hurricane Wind Speeds by the 'Peaks Over Threshold' Method", E. Simiu, N. A. Heckert, T. Whalen, NIST Technical Note 1416, February, 1996.

    "Extreme Wind Estimates by the Conditional Mean Exceedance Procedure", J. L. Gross, E. Simiu, N. A. Heckert, J. A. Lechner, NISTIR 5531, April, 1995.

    "Extreme Wind Distribution Tails: A 'Peaks Over Threshold' Approach", E. Simiu, N. A. Heckert, NIST Building Science Series 174, March, 1995.

Applications:
    Extreme Value Analysis
Implementation Date:
    1998/5
    2005/5: Modified the appearance of the output.
    2005/5: Added support for HTML/Latex format output.
Program:
     
    SKIP 25
    READ MPOST550.DAT Y
    LET Y2 = SORT Y
    LET THRESHOL = Y2(900)
    SET WRITE DECIMALS 5
    DEHAAN MLE Y
        
    The following output is generated.
                Generalized Pareto Parameter Estimation (de Haan)
                                 (Maximum Case)
     
    Summary Statistics (Full Data Set):
    Number of Observations:                    977
    Sample Mean:                               7.81898
    Sample Standard Deviation:                 17.76409
    Sample Minimum:                            0.00000
    Sample Maximum:                            90.04000
     
    Summary Statistics for
    Observations Above Threshold:
    Threshold:                                 43.36000
    Number of Observations Above Threshold:    77
    Sample Mean:                               56.76623
    Sample Standard Deviation:                 10.39647
     
    de Haan Parameter Estimates:
    Location (Threshold) Parameter:            43.36000
    Scale Parameter:                           14.92411
    Shape Parameter (Gamma):                   -0.35375
    Standard Deviation of Gamma:               0.13108
     
    For negative Gamma, the generalized Pareto
    is equivalent to a reverse Weibull
    (SET MINMAX MAX) with:
    Shape Parameter (Gamma):                   2.82687
    Location Parameter:                        54.16054
    Scale Parameter:                           52.02372
        
Date created: 06/05/2001
Last updated: 12/11/2023

Please email comments on this WWW page to alan.heckert@nist.gov.