
Dataplot Vol 1 Auxiliary Chapter


DEHAAN

Name:
    DEHAAN
Type:
    Analysis Command
Purpose:
    Estimate the parameters of a generalized Pareto distribution using the de Haan method.
Description:
    The generalized Pareto distribution (GPD) is an asymptotic distribution developed by using the fact that exceedances of a sufficiently high threshold are rare events to which the Poisson distribution applies.

    The cumulative distribution function of the generalized Pareto distribution is

      G(y) = 1 - {[1 + (c*y/a)]**(-1/c)}    a > 0, [1 + (c*y/a)] > 0

    Here, c is the shape parameter and a is the scale parameter.
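    The CDF above can be evaluated directly, as in the following minimal Python sketch (illustrative only; it is not Dataplot code). The function name gpd_cdf is hypothetical, and the c = 0 branch uses the exponential limit of the formula as c approaches 0, which is an assumption not spelled out above.

```python
import math


def gpd_cdf(y: float, a: float, c: float) -> float:
    """G(y) = 1 - [1 + c*y/a]**(-1/c) for the excess y, scale a > 0, shape c."""
    if a <= 0:
        raise ValueError("the scale a must be positive")
    if y <= 0:
        return 0.0
    if c == 0:
        # Assumed limiting case as c -> 0: the exponential distribution.
        return 1.0 - math.exp(-y / a)
    t = 1.0 + c * y / a
    if t <= 0:
        # Only possible when c < 0: y is at or past the upper endpoint a/|c|.
        return 1.0
    return 1.0 - t ** (-1.0 / c)


print(gpd_cdf(0.25, a=1.0, c=-1.0))  # 0.25: with c = -1 the excess is uniform on [0, a]
```

    With c = -1 the distribution is uniform on [0, a], which makes a convenient sanity check; for c > 0 the upper tail is unbounded (the Frechet case).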

    This equation can be used to represent the conditional cumulative distribution of the excess Y = X - u of the variate X over the threshold u, given X > u for u sufficiently large.

    The cases c > 0, c = 0, and c < 0 correspond respectively to the extreme value type II (Frechet), extreme value type I (Gumbel), and reverse Weibull domains of attraction.

    Given the mean E(Y) and standard deviation sY of the variate Y, the moment estimates of a and c are

      a = 0.5*E(Y)*{1 + [E(Y)/sY]**2}
      c = 0.5*{1 - [E(Y)/sY]**2}
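    The following Python sketch applies these two formulas. They invert the generalized Pareto mean E(Y) = a/(1-c) and variance a**2/[(1-c)**2*(1-2c)], which exist for c < 1/2. The function names are hypothetical, and using the sample standard deviation (n-1 divisor) in the data helper is an assumption.

```python
import math
import statistics
from typing import Sequence


def gpd_moment_estimates(mean_y: float, sd_y: float) -> tuple[float, float]:
    """Return (a, c) from the mean E(Y) and standard deviation sY."""
    ratio_sq = (mean_y / sd_y) ** 2        # [E(Y)/sY]**2
    a = 0.5 * mean_y * (1.0 + ratio_sq)    # scale parameter
    c = 0.5 * (1.0 - ratio_sq)             # shape parameter
    return a, c


def gpd_moment_estimates_from_data(data: Sequence[float], u: float) -> tuple[float, float]:
    """Hypothetical helper: apply the formulas to the excesses Y = X - u for X > u.

    Uses the sample standard deviation (n - 1 divisor), which is an assumption.
    """
    excesses = [x - u for x in data if x > u]
    return gpd_moment_estimates(statistics.mean(excesses), statistics.stdev(excesses))


# A GPD with a = 3 and c = -0.5 has E(Y) = 2 and sY = sqrt(2),
# so the formulas should give back a = 3 and c = -0.5.
a, c = gpd_moment_estimates(2.0, math.sqrt(2.0))
print(round(a, 6), round(c, 6))  # 3.0 -0.5
```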

    Note that when c < 0, gamma = -1/c is the estimate of the shape parameter for the reverse Weibull distribution (the SET MINMAX 2 case in Dataplot).
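    As a quick check of this conversion against the example output at the end of this page (shape estimate -0.3852721, reverse Weibull shape 2.595568), here is a hypothetical one-line Python helper:

```python
def reverse_weibull_shape(c: float) -> float:
    """gamma = -1/c, defined only for a negative generalized Pareto shape c."""
    if c >= 0:
        raise ValueError("the conversion applies only when c < 0")
    return -1.0 / c


print(round(reverse_weibull_shape(-0.3852721), 6))  # 2.595568
```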

    The de Haan estimates of a and c are determined as follows.

    1. Let k equal the number of data points above the threshold, so that u is the (k+1)-th highest data point. Then lambda = k/n, where n is the length of the record (in whatever units are appropriate, e.g., years). The highest, second highest, ..., k-th highest, and (k+1)-th highest variates are denoted by X(n,n), X(n-1,n), ..., X(n-k+1,n), and X(n-k,n), respectively.

    2. Compute the following quantities.
        M(r,n) = (1/k)*SUM{[log(X(n-i,n)) - log(X(n-k,n))]**r}, r = 1,2, where the summation is from i = 0 to k-1

    3. The estimates for c and a are as follows.

        c = M(1,n) + 1 - 1/[2*{1 - [M(1,n)]**2/[M(2,n)]}]

        a = u*M(1,n)/p1

      where

        p1 = 1, c >= 0
        p1 = 1/(1-c), c < 0

      Formulas for the standard deviation of c are given in the paper "Extreme Wind Distribution Tails: A 'Peaks Over Threshold' Approach" (see the Reference section below).
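    Steps 1 through 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not Dataplot's implementation: the threshold u is taken to be X(n-k,n) (as it is when the threshold is itself a data value, as in the example program below), the data used must be positive because of the logarithms, and the function names are hypothetical. The standard deviation of c is not computed.

```python
import math
from typing import Sequence


def dehaan_moments(data: Sequence[float], k: int) -> tuple[float, float, float]:
    """Steps 1 and 2: return (u, M(1,n), M(2,n)) for the k highest values.

    u is taken as X(n-k,n), the (k+1)-th highest observation.
    """
    x = sorted(data)                       # x[0] <= ... <= x[n-1] = X(n,n)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < number of observations")
    u = x[n - 1 - k]                       # X(n-k,n) in 0-based indexing
    if u <= 0:
        raise ValueError("the logarithms require positive data")
    log_u = math.log(u)
    # log X(n-i,n) - log X(n-k,n) for i = 0, ..., k-1
    diffs = [math.log(x[n - 1 - i]) - log_u for i in range(k)]
    m1 = sum(diffs) / k
    m2 = sum(d * d for d in diffs) / k
    return u, m1, m2


def dehaan_estimate(data: Sequence[float], k: int) -> tuple[float, float]:
    """Step 3: return (c, a), the de Haan shape and scale estimates."""
    u, m1, m2 = dehaan_moments(data, k)
    if m2 == 0 or m1 * m1 == m2:
        raise ValueError("the log-excesses must not all be equal (use k >= 2)")
    c = m1 + 1.0 - 1.0 / (2.0 * (1.0 - m1 * m1 / m2))
    p1 = 1.0 if c >= 0 else 1.0 / (1.0 - c)
    a = u * m1 / p1
    return c, a


# Evenly spaced values j/(n+1) behave like uniform data (reverse Weibull domain).
n = 10_000
uniform = [j / (n + 1) for j in range(1, n + 1)]
print(dehaan_estimate(uniform, 100))
```

    For evenly spaced data like this, the excess over u is roughly uniform on [0, 1-u], so the estimate of c should come out close to -1 and a close to 1-u; for Pareto-like data with tail exponent 2, c should be close to 0.5.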

Syntax:
    DEHAAN <y> <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
    DEHAAN Y
    DEHAAN Y SUBSET TAG > 0
Note:
    To specify the threshold, enter the following command before the DEHAAN command:

      LET THRESHOL = <value>

    If no threshold is specified, then the minimum data value is used as the threshold.
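    The threshold rule, and the Y2(900) choice in the example program below, can be mimicked with a short Python sketch. Treating "above the threshold" as strictly greater than is an assumption, the helper name is hypothetical, and the 977 evenly spaced values stand in for real data.

```python
from typing import Optional, Sequence


def threshold_and_count(data: Sequence[float], threshold: Optional[float] = None) -> tuple[float, int]:
    """Return (threshold, k); k counts observations strictly above the threshold."""
    if threshold is None:
        threshold = min(data)              # default described above: minimum data value
    k = sum(1 for v in data if v > threshold)
    return threshold, k


# Equivalent of LET Y2 = SORT Y; LET THRESHOL = Y2(900) (Dataplot indexes from 1).
y = [float(v) for v in range(1, 978)]      # hypothetical stand-in for 977 distinct values
u = sorted(y)[900 - 1]
print(threshold_and_count(y, u))           # (900.0, 77)
```

    With 977 distinct values, choosing the 900th smallest as the threshold leaves k = 77 exceedances, the count reported in the example output.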

Note:
    The following internal parameters will be saved.

      GAMMA = shape parameter for generalized Pareto distribution
      A = scale parameter for generalized Pareto distribution
      SDGAMMA = standard deviation of GAMMA

    If the absolute value of GAMMA is within a user-specified tolerance of zero, then the following are also saved.

      LOC = location parameter for Gumbel distribution.
      SCALE = scale parameter for Gumbel distribution.

    To specify this tolerance, enter the command

      SET PEAKS OVER THRESHOLD TOLERANCE <value>

    The default tolerance is 0.05.

    If GAMMA is less than zero with an absolute value greater than the above tolerance, then the following are also saved.

      GAMMA2 = shape parameter for reverse Weibull distribution.
      LOC = location parameter for reverse Weibull distribution.
      SCALE = scale parameter for reverse Weibull distribution.

    These estimates for the reverse Weibull and Gumbel distributions are based on moment estimators. The formulas are given on page 3 of NIST Building Science Series 174 (see the Reference section below). Currently, no estimates for the Frechet case (GAMMA > 0) are saved.
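    The rules for which extra internal parameters are saved can be summarized in a small Python sketch. It reproduces only the branching and the GAMMA2 = -1/GAMMA conversion; the LOC and SCALE moment estimators from NIST Building Science Series 174 are not reproduced. Treating a value exactly equal to the tolerance as "within" it is an assumption, and the function name is hypothetical.

```python
from typing import Optional


def saved_family(gamma: float, tolerance: float = 0.05) -> tuple[str, Optional[float]]:
    """Return (family, GAMMA2); GAMMA2 is None unless the reverse Weibull case applies."""
    if abs(gamma) <= tolerance:
        return "Gumbel", None              # LOC and SCALE would also be saved
    if gamma < 0:
        return "reverse Weibull", -1.0 / gamma   # GAMMA2, plus LOC and SCALE
    return "Frechet", None                 # no extra parameters are saved


print(saved_family(-0.3852721))
```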

Note:
    The May 2005 version added support for generating the output in LaTeX or HTML. Enter

      HELP CAPTURE HTML
      HELP CAPTURE LATEX

    for details. The ASCII output was also modified somewhat; this was a cosmetic change to make the output clearer.

Note:
    The PEAKS OVER THRESHOLD PLOT was added in the 5/2005 version. This plot shows how the estimate of the shape parameter changes as the threshold changes.
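    The quantity behind such a plot, the shape estimate as a function of the threshold, can be sketched by recomputing the de Haan estimate of c for several numbers of exceedances k. This Python sketch works under assumptions (each threshold is taken as X(n-k,n), the data are positive, and the function names are hypothetical); it does not describe how Dataplot actually builds the plot.

```python
import math
from typing import Sequence


def dehaan_shape(x_sorted: Sequence[float], k: int) -> float:
    """de Haan estimate of c from the k highest values of ascending, positive data."""
    n = len(x_sorted)
    if not 2 <= k < n:
        raise ValueError("need 2 <= k < number of observations")
    log_u = math.log(x_sorted[n - 1 - k])                      # log X(n-k,n)
    diffs = [math.log(x_sorted[n - 1 - i]) - log_u for i in range(k)]
    m1 = sum(diffs) / k
    m2 = sum(d * d for d in diffs) / k
    return m1 + 1.0 - 1.0 / (2.0 * (1.0 - m1 * m1 / m2))


def shape_by_threshold(data: Sequence[float], ks: Sequence[int]) -> list[tuple[float, float]]:
    """Return (threshold X(n-k,n), estimate of c) for each k in ks."""
    x = sorted(data)
    n = len(x)
    return [(x[n - 1 - k], dehaan_shape(x, k)) for k in ks]


n = 10_000
uniform = [j / (n + 1) for j in range(1, n + 1)]
for u, c in shape_by_threshold(uniform, [100, 200, 400]):
    print(f"threshold {u:.4f}  c {c:.3f}")
```

    Larger k means a lower threshold; a stable region in the estimates across thresholds is what such a plot is meant to reveal.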
Default:
    None.
Synonyms:
    The following are synonyms for DEHAAN Y:

      DEHAAN GENERALIZED PARETO ESTIMATE Y
      DEHAAN GENERALIZED PARETO Y
      DEHAAN ESTIMATE Y
Related Commands:
    CME = Compute the CME estimates for the generalized Pareto distribution.
    CME PLOT = Generate a CME plot.
    GEPPDF = Compute the probability density function for the generalized Pareto distribution.
    PEAKS OVER THRESHOLD PLOT = Generate a peaks over threshold plot.
Reference:
    "Continuous Univariate Distributions: Volume I", 2nd ed., N. L. Johnson, S. Kotz, and N. Balakrishnan, John Wiley and Sons, 1994.

    "Estimates of Hurricane Wind Speeds by the 'Peaks Over Threshold' Approach", N. A. Heckert, E. Simiu, and T. Whalen, Journal of Structural Engineering, April, 1998.

    "Extreme Wind Distribution Tails: A 'Peaks Over Threshold' Approach", E. Simiu and N. A. Heckert, Journal of Structural Engineering, May, 1996.

    "Assessment of 'Peak Over Threshold' Methods for Estimating Extreme Value Distribution Tails", J. A. Lechner, E. Simiu, and N. A. Heckert, Structural Safety, 1993.

    "Estimates of Hurricane Wind Speeds by the 'Peaks Over Threshold' Method", E. Simiu, N. A. Heckert, and T. Whalen, NIST Technical Note 1416, February, 1996.

    "Extreme Wind Estimates by the Conditional Mean Exceedance Procedure", J. L. Gross, E. Simiu, N. A. Heckert, and J. A. Lechner, NISTIR 5531, April, 1995.

    "Extreme Wind Distribution Tails: A 'Peaks Over Threshold' Approach", E. Simiu and N. A. Heckert, NIST Building Science Series 174, March, 1995.

Applications:
    Extreme Value Analysis
Implementation Date:
    1998/5
    2005/5: Modified the appearance of the output.
    2005/5: Added support for HTML/LaTeX format output.
Program:
    SKIP 25
    READ MPOST550.DAT Y
    LET Y2 = SORT Y
    LET THRESHOL = Y2(900)
    ECHO ON
    CAPTURE dehaan.out
    DEHAAN Y
    END OF CAPTURE
        
    The following output is generated.
           ****************
           **  DEHAAN Y  **
           ****************
      
      
               DEHAAN ESTIMATION FOR THE GENERALIZED PARETO DISTRIBUTION
      
     NUMBER OF OBSERVATIONS                     =      977
     THRESHOLD                                  =    43.36000
     NUMBER OF OBSERVATIONS ABOVE THE THRESHOLD =       77
      
     ESTIMATE OF SHAPE PARAMETER GAMMA          =  -0.3852721
     STANDARD DEVIATION OF GAMMA                =   0.1362827
     SCALE PARAMETER A                          =    15.39923
      
      
     FOR NEGATIVE GAMMA, THE GENERALIZED PARETO IS EQUIVALENT TO
     A REVERSE WEIBULL (SET MINMAX MAX) WITH:
     SHAPE PARAMETER GAMMA                    =    2.595568
     LOCATION PARAMETER                       =    81.89209
     SCALE PARAMETER                          =    28.28961
      
      
     GAMMA, SDGAMMA, AND A WILL BE SAVED AS INTERNAL PARAMETERS.
     THE REVERSE WEIBULL PARAMETERS WILL BE SAVED AS
     THE INTERNAL PARAMETERS GAMMA2, LOC, AND SCALE,  RESPECTIVELY.
        

Date created: 6/5/2001
Last updated: 5/16/2005
Please email comments on this WWW page to alan.heckert@nist.gov.