
Extreme Wind Speeds Software: Dataplot

Introduction

Dataplot is a freely downloadable multi-platform (Unix/Linux, Windows 7/8/10, Mac OS X) program for scientific graphics, statistical analysis, and linear/non-linear modeling.

The original version of Dataplot was released by James J. Filliben in 1978 and has been continually enhanced since. The authors are James J. Filliben and Alan Heckert of the Statistical Engineering Division, Information Technology Laboratory, National Institute of Standards and Technology.

Dataplot Capabilities Relevant to the Analysis of Extreme Winds

Dataplot contains a number of features that are relevant for the analysis of extreme winds.
  • Dataplot provides a number of methods for graphing your data. A few specific graphs of interest for extreme value analysis include:

  • Dataplot supports a library of probability functions. This includes cumulative distribution functions, probability density functions, and percent point functions. The library supports 100+ distributions including the five extreme value distributions (Gumbel, Frechet, Weibull/reverse Weibull, generalized Pareto, and generalized extreme value).

  • Dataplot provides a number of different methods for fitting distributional models.

    • Maximum likelihood

      Dataplot supports a number of analytical estimation methods for the extreme value distributions.

      For the Gumbel distribution, moment and maximum likelihood estimates are supported.

      For the 2-parameter Frechet distribution, maximum likelihood estimates are supported. For the 3-parameter Frechet distribution, l-moment and elemental percentile estimates are supported.

      For the 2-parameter Weibull distribution, maximum likelihood estimates are supported. For the 3-parameter Weibull distribution, maximum likelihood, moment, modified moment, l-moment, and elemental percentile estimates are supported.

      For the generalized Pareto distribution, maximum likelihood, moment, l-moment, and elemental percentile estimates are supported. Note that the moment, l-moment, and maximum likelihood estimates work well for restricted ranges of the shape parameter. One advantage of elemental percentile estimates for the generalized Pareto distribution is that they work well over a much broader range of values for the shape parameter.

      For the generalized extreme value distribution, l-moment and elemental percentile estimates are supported.

      In most cases, analytic confidence intervals for distribution parameters and select quantiles will also be generated. When they are not, they can be generated via bootstrapping.

    • The PPCC plot can be used to estimate the shape parameter and a probability plot can be used to estimate the location and scale parameters.

      Confidence intervals for distribution parameters and select quantiles can be generated via bootstrapping.

    • For the generalized Pareto distribution, Dataplot supports the de Haan and CME (Conditional Mean Exceedance) estimation methods.

  • Dataplot provides several methods for assessing the goodness of fit of a distributional model.

  • Dataplot provides a few capabilities that are specifically for the analysis of extreme values.
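To make the probability-function item above concrete, here is a minimal Python sketch (outside Dataplot) of the cumulative distribution, probability density, and percent point functions for the Gumbel distribution in its maximum form. The function names are hypothetical and are not Dataplot's; the location/scale parameterization is an assumption of this sketch.

```python
import math


def gumbel_max_cdf(x, loc=0.0, scale=1.0):
    """Cumulative distribution function: exp(-exp(-(x - loc) / scale))."""
    z = (x - loc) / scale
    return math.exp(-math.exp(-z))


def gumbel_max_pdf(x, loc=0.0, scale=1.0):
    """Probability density function, the derivative of the CDF above."""
    z = (x - loc) / scale
    return math.exp(-z - math.exp(-z)) / scale


def gumbel_max_ppf(p, loc=0.0, scale=1.0):
    """Percent point function (inverse CDF) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return loc - scale * math.log(-math.log(p))


# Round trip: the percent point function undoes the CDF.
x = gumbel_max_ppf(0.9, loc=45.0, scale=7.0)
print(x, gumbel_max_cdf(x, loc=45.0, scale=7.0))
```

The percent point function is the one used most often in extreme wind work, since a percentile of the annual maximum distribution is read directly from it.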

Further Information

Additional information is available at the Dataplot web site.

Example

Maximum Annual Wind Speeds for Washington, DC

The following example analyzes a data set containing the maximum annual wind speeds in Washington, DC from 1945 to 1977.

The purpose of this example is to illustrate how certain tasks are performed in Dataplot (i.e., it is not meant to be a case study, just a demonstration of some of the basic Dataplot commands used in extreme value analysis).

Specifically, the example will demonstrate the following:

  • How to generate preliminary plots of the data
  • How to perform distributional modeling for the extreme value distributions
Read the Data and Generate Preliminary Plots

The first step is to read the data and generate some preliminary plots. This is accomplished with the following Dataplot commands:
    .
    . Step 1: Read the Data, define some default
    . plot control settings
    .
    skip 25
    read washdc.dat y x
    title case asis
    title displacement 2
    label case asis
    .
    . Step 2: Preliminary Plots of the Data
    . a. Run Sequence Plot
    . b. Relative Histogram
    . c. Kernel Density Plot
    .
    title Maximum Annual Wind Speeds for Washington, DC
    y1label Wind Speed (MPH)
    x1label Year
    plot y x
    .
    title Maximum Annual Wind Speeds for Washington, DC
    y1label Relative Frequency
    x1label Wind Speed
    relative histogram y
    .
    title Maximum Annual Wind Speeds for Washington, DC
    y1label Probability
    x1label Wind Speed
    kernel density plot y

The following graphs are generated.

Run Sequence Plot of the Data

Relative Histogram of the Data

Kernel Density Plot of the Data

We can make the following conclusions based on these plots.

  1. The bulk of the annual maximums lie in the range 35 mph to 65 mph.
  2. There is one annual maximum that is above 75 mph.
  3. The mode of the data is around 45 mph.
  4. The data exhibit slight skewing to the right.
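For readers who want to see what a kernel density plot computes, the following Python sketch builds a Gaussian kernel density estimate and locates its mode. The Gaussian kernel and Silverman's rule-of-thumb bandwidth are assumptions of this sketch (the page does not say which kernel or bandwidth Dataplot uses), and the wind speeds are hypothetical values, not the contents of washdc.dat.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates the kernel density estimate at x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical annual maximum wind speeds in mph (illustration only).
speeds = [38, 41, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 57, 61, 66, 78]
h = silverman_bandwidth(speeds)
density = gaussian_kde(speeds, h)

# Evaluate on a grid from 30 to 90 mph and report where the estimate peaks.
grid = [30 + 0.5 * i for i in range(121)]
mode = max(grid, key=density)
print("bandwidth:", h, "approximate mode:", mode)
```

A right-skewed sample shows up in such an estimate as a longer tail above the mode than below it.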
Generalized Pareto Distributional Model

The next step is to develop an appropriate distributional model for the data.

A good starting point is to use the generalized Pareto and/or the generalized extreme value distributions since these contain the Gumbel, Frechet, and reverse Weibull as special cases.

The following shows the typical steps in developing a generalized Pareto distributional model. We use the PPCC/probability plot approach to estimate the shape, location, and scale parameters. This analysis is performed with the following Dataplot commands.

    .
    . Step 3: Generalized Pareto Model
    .
    title Generalized Pareto PPCC Plot
    y1label PPCC Value
    x1label Value of Shape Parameter
    generalized pareto ppcc plot y
    .
    let gamma = shape
    title Generalized Pareto Probability Plot
    x1label Theoretical
    y1label Data
    char x
    line blank
    generalized pareto probability plot y
    justification left
    move 20 85
    text Shape Parameter = ^gamma
    move 20 81
    text Location Parameter = ^ppa0
    move 20 77
    text Scale Parameter = ^ppa1
    move 20 73
    text PPCC Value = ^maxppcc
    .
    let ksloc = ppa0
    let ksscale = ppa1
    generalized pareto kolm smir goodness of fit y

The following graphs and output are generated.

Generalized Pareto PPCC Plot of the Data

Generalized Pareto Probability Plot of the Data

                   KOLMOGOROV-SMIRNOV GOODNESS-OF-FIT TEST
  
 NULL HYPOTHESIS H0:      DISTRIBUTION FITS THE DATA
 ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
 DISTRIBUTION:            GENERALIZED PARETO
    NUMBER OF OBSERVATIONS              =       33
  
 TEST:
 KOLMOGOROV-SMIRNOV TEST STATISTIC      =   0.1334534
  
    ALPHA LEVEL         CUTOFF              CONCLUSION
            10%       0.208               ACCEPT H0
             5%       0.231               ACCEPT H0
             1%       0.277               ACCEPT H0
      
We can make the following conclusions based on these plots.
  1. The estimate of the shape parameter is -0.18. Note that negative values of the shape parameter indicate that a reverse Weibull is the appropriate model while a value of 0 for the shape parameter indicates that a Gumbel distribution is the appropriate model.

    Since the estimated shape parameter is negative, but small, the next step will be to fit both a reverse Weibull and a Gumbel model to the data.

  2. The probability plot and the Kolmogorov-Smirnov goodness of fit test indicate that the generalized Pareto provides an adequate distributional model.
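The PPCC/probability plot approach used above can be sketched in Python as follows. For each candidate shape value, the sorted data are correlated with the theoretical quantiles at a set of plotting positions; the shape with the highest correlation is kept, and the intercept and slope of the corresponding probability plot give the location and scale. This is an illustration under stated assumptions, not Dataplot's implementation: the generalized Pareto percent point function is taken as ((1 - p)^(-shape) - 1)/shape, the plotting positions are Filliben's uniform order statistic medians, and the data are synthetic rather than the Washington, DC values.

```python
import math


def uniform_order_statistic_medians(n):
    """Filliben's approximation to the medians of uniform order statistics."""
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)
    medians[0] = 1.0 - medians[-1]
    return medians


def gpd_ppf(p, shape):
    """Standard generalized Pareto percent point function (assumed form)."""
    if shape == 0.0:
        return -math.log(1.0 - p)  # exponential limit
    return ((1.0 - p) ** (-shape) - 1.0) / shape


def linear_fit(x, y):
    """Least-squares line y = a + b*x; returns (a, b, correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


def ppcc_fit(data, shapes):
    """Return (shape, location, scale, ppcc) for the best candidate shape."""
    ordered = sorted(data)
    probs = uniform_order_statistic_medians(len(ordered))
    best = None
    for shape in shapes:
        quantiles = [gpd_ppf(p, shape) for p in probs]
        loc, scale, r = linear_fit(quantiles, ordered)
        if best is None or r > best[3]:
            best = (shape, loc, scale, r)
    return best


# Synthetic check: 33 exact quantiles with shape -0.2, location 40, scale 10.
probs = uniform_order_statistic_medians(33)
synthetic = [40.0 + 10.0 * gpd_ppf(p, -0.2) for p in probs]
grid = [i / 10 for i in range(-10, 11)]
shape, loc, scale, r = ppcc_fit(synthetic, grid)
print(shape, loc, scale, r)
```

With real data the correlation never reaches 1, and the PPCC plot is simply the curve of correlation against candidate shape.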
Gumbel Model

The following shows the typical steps in developing a Gumbel distributional model.
  1. Perform an Anderson-Darling test to determine if the Gumbel provides an adequate distributional model.

  2. Perform a Gumbel maximum likelihood analysis. This analysis provides estimates for the parameters, confidence intervals for the parameters, and estimates and confidence intervals for select percentiles of the distribution.

  3. Perform a probability plot analysis. The probability plot provides estimates for location and scale. We then use the bootstrap to provide confidence intervals for the location and scale parameters and for select percentiles.
The Anderson-Darling test, maximum likelihood estimation, and probability plot are performed with the following Dataplot commands.
    .
    . Step 4: Gumbel Model
    .
    set maximum likelihood percentiles default
    anderson darling gumbel y
    gumbel mle y
    .
    title Gumbel Probability Plot
    x1label Theoretical
    y1label Data
    char x
    line blank
    gumbel probability plot y
    justification left
    move 20 85
    text Location Parameter = ^ppa0
    move 20 81
    text Scale Parameter = ^ppa1
    move 20 77
    text PPCC Value = ^maxppcc
The following graphs and output are generated.

  
               ANDERSON-DARLING 1-SAMPLE TEST
               THAT THE DATA CAME FROM AN EXTREME VALUE DISTRIBUTION
  
 1. STATISTICS:
       NUMBER OF OBSERVATIONS                =       33
       MEAN                                  =    49.21212
       STANDARD DEVIATION                    =    8.813191
       LOCATION PARAMETER                    =    45.35311
       SCALE PARAMETER                       =    6.331326
  
       ANDERSON-DARLING TEST STATISTIC VALUE =   0.6815453
       ADJUSTED TEST STATISTIC VALUE         =   0.7052736
  
 2. CRITICAL VALUES:
       90         % POINT    =   0.6370000
       95         % POINT    =   0.7570000
       97.5       % POINT    =   0.8770000
       99         % POINT    =    1.038000
  
 3. CONCLUSION (AT THE 5% LEVEL):
       THE DATA DO COME FROM AN EXTREME VALUE DISTRIBUTION.



       GUMBEL MAXIMUM LIKELIHOOD ESTIMATION:
       FULL SAMPLE, MINIMUM EXTREME VALUES CASE
  
       F(X) = (1/s)*EXP((X-U)/S)*EXP(-EXP((X-U)/S))
       U AND S DENOTE THE LOCATION AND SCALE PARAMETERS, RESPECTIVELY
  
       STANDARD ERRORS AND CONFIDENCE INTERVALS BASED ON NO BIAS CORRECTION
       SCALE PARAMETER
  
 NUMBER OF OBSERVATIONS                          =       33
 SAMPLE MINIMUM                                  =    38.00000
 SAMPLE MAXIMUM                                  =    78.00000
 SAMPLE MEAN                                     =    49.21212
 SAMPLE STANDARD DEVIATION                       =    8.813191
  
 MOMENT ESTIMATE OF LOCATION                     =    53.17859
 STANDARD ERROR OF MOMENT ESTIMATE OF LOCATION   =    1.292676
 MOMENT ESTIMATE OF SCALE                        =    6.871645
 STANDARD ERROR OF MOMENT ESTIMATE OF SCALE      =    2.589102
  
 MAXIMUM LIKELIHOOD ESTIMATE OF LOCATION         =    53.96301
 STANDARD ERROR OF LOCATION ESTIMATE             =    1.994741
 MAXIMUM LIKELIHOOD ESTIMATE OF SCALE            =    10.88288
 BIAS CORRECTED ML ESTIMATE OF SCALE             =    11.34340
 STANDARD ERROR OF SCALE ESTIMATE                =    1.477116
 STANDARD ERROR OF COVARIANCE                    =   0.9604377
  
 CONFIDENCE INTERVAL FOR SCALE PARAMETER
                        NORMAL APPROXIMATION
    CONFIDENCE           LOWER         UPPER
    VALUE (%)            LIMIT         LIMIT
 -------------------------------------------
      50.000           9.88658       11.8792
      75.000           9.18368       12.5821
      90.000           8.45324       13.3125
      95.000           7.98779       13.7780
      99.000           7.07808       14.6877
      99.900           6.02241       15.7434
  
 CONFIDENCE INTERVAL FOR LOCATION PARAMETER
                        NORMAL APPROXIMATION
    CONFIDENCE           LOWER         UPPER
    VALUE (%)            LIMIT         LIMIT
 -------------------------------------------
      50.000           52.6176       55.3084
      75.000           51.6684       56.2577
      90.000           50.6820       57.2441
      95.000           50.0534       57.8726
      99.000           48.8249       59.1011
      99.900           47.3993       60.5267
  
 CONFIDENCE LIMITS FOR SELECTED PERCENTILES
 (BASED ON NORMAL APPROXIMATION):
 ALPHA =  0.0500
                  POINT        STANDARD           LOWER           UPPER
 PERCENTILE     ESTIMATE        ERROR     CONFIDENCE LIMIT CONFIDENCE LIMIT
 --------------------------------------------------------------------------
     0.5000   35.81701       2.639861         30.64297         40.99104
     1.0000   37.34290       2.500051         32.44289         42.24290
     5.0000   42.02244       2.140419         37.82729         46.21758
    10.0000   44.88634       1.989480         40.98703         48.78565
    20.0000   48.78401       1.896091         45.06774         52.50028
    30.0000   51.94286       1.926581         48.16683         55.71889
    40.0000   54.91441       2.038860         50.91832         58.91050
    50.0000   57.95173       2.224468         53.59185         62.31161
    60.0000   61.27334       2.490531         56.39199         66.15469
    70.0000   65.18251       2.863541         59.57007         70.79495
    80.0000   70.28668       3.413945         63.59547         76.97789
    90.0000   78.45349       4.379495         69.86983         87.03715
    95.0000   86.28729       5.357913         75.78597         96.78861
    97.5000   93.97121       6.344194         81.53681         106.4056
    99.0000   104.0259       7.657493         89.01752         119.0344
    99.5000   111.5967       8.656844         94.62955         128.5638
      

Gumbel Probability Plot of the Data
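As an arithmetic cross-check on the Gumbel maximum likelihood output above, the Python sketch below recomputes the moment estimates from the printed sample mean and standard deviation, and a few percentile point estimates from the printed maximum likelihood estimates. The function names are hypothetical. The formulas are the standard Gumbel moment relations (scale = sd * sqrt(6)/pi, location = mean minus or plus Euler's constant times scale for the maximum or minimum case) and the maximum-case percent point function. This only checks the printed numbers; it does not describe how Dataplot computes them.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_moment_estimates(mean, sd, minimum_case=False):
    """Moment estimates (location, scale) from a sample mean and sd."""
    scale = sd * math.sqrt(6.0) / math.pi
    shift = EULER_GAMMA * scale
    loc = mean + shift if minimum_case else mean - shift
    return loc, scale


def gumbel_max_percentile(p, loc, scale):
    """Maximum-case percent point function for a probability 0 < p < 1."""
    return loc - scale * math.log(-math.log(p))


# Sample mean and standard deviation as printed in the output.
mean, sd = 49.21212, 8.813191
loc_min, scale_mom = gumbel_moment_estimates(mean, sd, minimum_case=True)
loc_max, _ = gumbel_moment_estimates(mean, sd)
print("moment scale:", scale_mom)
print("moment location, minimum case:", loc_min)
print("moment location, maximum case:", loc_max)

# Maximum likelihood estimates as printed in the output.
ml_loc, ml_scale = 53.96301, 10.88288
for pct in (0.5, 50.0, 99.0):
    print(pct, gumbel_max_percentile(pct / 100.0, ml_loc, ml_scale))
```

Run against the printed values, the minimum-case moment location agrees with the reported 53.17859 (consistent with the "MINIMUM EXTREME VALUES CASE" header), while the percentile point estimates in the table agree with the maximum-case percent point function evaluated at the maximum likelihood estimates. The maximum-case moment location comes out near 45.25.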

The bootstrap analysis is performed with the following Dataplot commands.

    multiplot 2 2
    multiplot corner coordinates 0 0 100 100
    multiplot scale factor 1.7
    y1label Parameter Estimate
    x1label
    x2label Bootstrap Sample
    title Bootstrap Plot
    line color blue red green
    line solid all
    character blank all
    set maximum likelihood percentiles default
    bootstrap gumbel plot y
    line color black all
    .
    skip 0
    read dpst1f.dat aloc ascale appcc
    y1label
    x2label
    x3label displacement 16
    title Location Parameter
    let amed = median aloc
    let amean = mean aloc
    let asd = sd aloc
    x2label Median = ^amed, Mean = ^amean
    x3label Standard Deviation = ^asd
    histogram aloc
    title Scale Parameter
    let amed = median ascale
    let amean = mean ascale
    let asd = sd ascale
    x2label Median = ^amed, Mean = ^amean
    x3label Standard Deviation = ^asd
    histogram ascale
    title PPCC Value
    let amed = median appcc
    let amean = mean appcc
    let asd = sd appcc
    x2label Median = ^amed, Mean = ^amean
    x3label Standard Deviation = ^asd
    histogram appcc
    x3label displacement
    .
    label
    title
    .
    let alpha = 0.05
    let xqlow = alpha/2
    let xqupp = 1 - alpha/2
    .
    write " "
    write "Bootstrap-based Confidence Intervals"
    write "alpha = ^alpha"
    write " "
    .
    let xq = xqlow
    let loc95low = xq quantile aloc
    let xq = xqupp
    let loc95upp = xq quantile aloc
    let xq = xqlow
    let sca95low = xq quantile ascale
    let xq = xqupp
    let sca95upp = xq quantile ascale
    write "Confidence Interval for Location: (^loc95low,^loc95upp)"
    write "Confidence Interval for Scale: (^sca95low,^sca95upp)"
    serial read p
    0.5 1 2.5 5 10 20 30 40 50 60 70 80 90 95 97.5 99 99.5
    end of data
    let nperc = size p
    skip 1
    read matrix dpst4f.dat xqp
    write " "
    loop for k = 1 1 nperc
      let xqptemp = p(k)
      let amed = median xqp^k
      let xqpmed(k) = amed
      let xq = xqlow
      let atemp = xq quantile xqp^k
      let xq95low(k) = atemp
      let xq = xqupp
      let atemp = xq quantile xqp^k
      let xq95upp(k) = atemp
    end of loop
    set write decimals 3
    write "Bootstrap Based Confidence Intervals for Percentiles"
    variable label p Percentile
    variable label xqpmed Point Estimate
    variable label xq95low Lower Confidence Limit
    variable label xq95upp Upper Confidence Limit
    write p xqpmed xq95low xq95upp

The following graphs and output are generated.

Gumbel Bootstrap Plot of the Data

 Bootstrap-based Confidence Intervals
 alpha = 0.05
  
 Confidence Interval for Location: (38.04781,47.46314)
 Confidence Interval for Scale:    (5.046363,11.99362)
  
 Bootstrap Based Confidence Intervals for Percentiles

 VARIABLES--P              XQPMED         XQ95LOW        XQ95UPP 

          0.500         31.574         18.484         36.878
          1.000         32.693         20.124         37.621
          2.500         34.438         22.720         38.798
          5.000         35.985         25.176         39.902
         10.000         37.900         28.324         41.326
         20.000         40.747         32.631         43.716
         30.000         42.765         36.094         46.003
         40.000         44.838         39.026         48.151
         50.000         46.861         42.295         49.990
         60.000         49.119         45.861         52.223
         70.000         51.987         48.823         55.697
         80.000         55.818         51.981         60.479
         90.000         61.794         56.233         68.808
         95.000         67.525         59.952         76.792
         97.500         73.342         63.533         84.925
         99.000         80.621         68.212         95.665
         99.500         86.158         71.735        104.009
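The percentile-interval step in the bootstrap script (taking the alpha/2 and 1 - alpha/2 quantiles of the bootstrap replicates) can be sketched in Python as follows. Two simplifications are assumptions of this sketch: each resample is fitted with Gumbel moment estimates rather than the probability plot fit used by the Dataplot script, and the wind speeds are hypothetical values, not the contents of washdc.dat. The function names are also hypothetical.

```python
import math
import random
import statistics


def gumbel_moment_fit(sample):
    """Maximum-case Gumbel moment estimates (location, scale)."""
    scale = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    loc = statistics.fmean(sample) - 0.5772156649015329 * scale
    return loc, scale


def quantile(values, q):
    """Linear-interpolation sample quantile for 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1.0 - frac) + ordered[upper] * frac


def bootstrap_percentile_intervals(data, n_boot=1000, alpha=0.05, seed=12345):
    """Percentile bootstrap intervals for the location and scale estimates."""
    rng = random.Random(seed)
    locs, scales = [], []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))  # sample with replacement
        loc, scale = gumbel_moment_fit(resample)
        locs.append(loc)
        scales.append(scale)
    low, high = alpha / 2.0, 1.0 - alpha / 2.0
    return {
        "location": (quantile(locs, low), quantile(locs, high)),
        "scale": (quantile(scales, low), quantile(scales, high)),
    }


# Hypothetical annual maximum wind speeds in mph (illustration only).
speeds = [38, 41, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 57, 61, 66, 78]
intervals = bootstrap_percentile_intervals(speeds)
print(intervals)
```

Intervals for percentiles follow the same pattern: compute the percentile from each resample's fitted parameters, then take the same two quantiles of those replicates.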
      
We can make the following conclusions based on this analysis.
  1. The Anderson-Darling test indicates that the Gumbel provides an adequate distributional model for the data, although this is somewhat borderline (the Anderson-Darling test accepts the Gumbel at the 5% level but rejects it at the 10% level).

  2. The maximum likelihood and probability plot estimates of location and scale are somewhat different (53.96 and 10.88 compared to 45.22 and 7.18).
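The borderline Anderson-Darling result can be checked by hand from the printed output. The Python sketch below applies the small-sample adjustment A2 * (1 + 0.2/sqrt(n)), which reproduces the adjusted statistic in the output, and compares it with the printed critical values. The function names are hypothetical, and the adjustment formula is inferred from the printed numbers rather than taken from Dataplot's documentation.

```python
import math


def adjusted_ad_statistic(a2, n):
    """Small-sample adjustment of the Anderson-Darling statistic."""
    return a2 * (1.0 + 0.2 / math.sqrt(n))


def ad_decisions(adjusted, critical_values):
    """Map each significance level to 'accept' or 'reject'."""
    return {
        alpha: "reject" if adjusted > cutoff else "accept"
        for alpha, cutoff in critical_values.items()
    }


# Values as printed in the Anderson-Darling output.
n_obs = 33
raw_statistic = 0.6815453
critical_values = {0.10: 0.637, 0.05: 0.757, 0.025: 0.877, 0.01: 1.038}

adjusted = adjusted_ad_statistic(raw_statistic, n_obs)
decisions = ad_decisions(adjusted, critical_values)
print(adjusted, decisions)
```

The adjusted statistic falls between the 90% and 95% points, which is why the Gumbel is rejected at the 10% level but accepted at the 5% level.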
Reverse Weibull Model

The following shows the typical steps in developing a reverse Weibull distributional model. We use the PPCC/probability plot approach to estimate the shape, location, and scale parameters. Confidence intervals for the parameters and for select quantiles could be obtained with the bootstrap, in the same way as for the Gumbel distribution, but that step is not included here.

This analysis is performed with the following Dataplot commands.

    .
    . Step 5: Reverse Weibull Model
    .
    multiplot 2 2
    multiplot corner coordinates 0 0 100 100
    multiplot scale factor 1
    set minmax 2
    .
    set ipl1na rweippcc.jpg
    device 2 gd jpeg
    title Reverse Weibull PPCC Plot
    y1label PPCC Value
    x1label Value of Shape Parameter
    weibull ppcc plot y
    let gamma1 = shape - 2
    let gamma2 = shape + 2
    if gamma1 <= 0
    let gamma1 = 0.1
    end of if
    .
    weibull ppcc plot y
    .
    let gamma = shape
    title Weibull Probability Plot
    x1label Theoretical
    y1label Data
    char x
    line blank
    weibull probability plot y
    . Generate a null plot so text will go in right spot
    plot
    justification left
    hw 4 2
    move 20 85
    text Shape Parameter = ^gamma
    move 20 75
    text Location Parameter = ^ppa0
    move 20 65
    text Scale Parameter = ^ppa1
    move 20 55
    text PPCC Value = ^maxppcc
    .
    end of multiplot
    device 2 close
    .
    let ksloc = ppa0
    let ksscale = ppa1
    capture wei1.out
    weibull kolm smir goodness of fit y
    end of capture

The following graphs and output are generated.

Reverse Weibull PPCC Plot/Probability Plot of the Data

  
                   KOLMOGOROV-SMIRNOV GOODNESS-OF-FIT TEST
  
 NULL HYPOTHESIS H0:      DISTRIBUTION FITS THE DATA
 ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
 DISTRIBUTION:            WEIBULL
    NUMBER OF OBSERVATIONS              =       33
  
 TEST:
 KOLMOGOROV-SMIRNOV TEST STATISTIC      =   0.1682307
  
    ALPHA LEVEL         CUTOFF              CONCLUSION
            10%       0.208               ACCEPT H0
             5%       0.231               ACCEPT H0
             1%       0.277               ACCEPT H0
  
      
We can make the following conclusions based on these plots.
  1. The Kolmogorov-Smirnov goodness of fit test indicates the reverse Weibull provides an adequate distributional model.

  2. Using the reverse Weibull does not increase the PPCC value (in fact, it actually decreases from 0.983 to 0.980). Based on this, we might be inclined to use the simpler Gumbel model.
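The Kolmogorov-Smirnov statistic reported for the generalized Pareto and Weibull fits is the largest vertical distance between the empirical distribution function of the data and the fitted cumulative distribution function. The Python sketch below computes that distance for any fitted CDF and compares a statistic with the cutoffs printed in the output for 33 observations. The function names are hypothetical, and the small example uses a uniform CDF rather than either fitted model.

```python
def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of data and a fitted cdf."""
    ordered = sorted(data)
    n = len(ordered)
    largest = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # Compare against the empirical CDF just after and just before x.
        largest = max(largest, i / n - f, f - (i - 1) / n)
    return largest


def ks_decisions(statistic, cutoffs):
    """Accept H0 at each level where the statistic is below the cutoff."""
    return {
        alpha: "accept" if statistic < cutoff else "reject"
        for alpha, cutoff in cutoffs.items()
    }


# Cutoffs as printed in the output for n = 33.
cutoffs_n33 = {0.10: 0.208, 0.05: 0.231, 0.01: 0.277}

# Statistics as printed for the two fits.
print(ks_decisions(0.1334534, cutoffs_n33))  # generalized Pareto
print(ks_decisions(0.1682307, cutoffs_n33))  # reverse Weibull

# Small example with a uniform(0, 1) CDF.
print(ks_statistic([0.25, 0.5, 0.75], lambda x: x))
```

Both printed statistics are below the smallest cutoff, so neither fit is rejected at any of the three levels; the smaller statistic for the generalized Pareto indicates the closer fit of the two by this measure.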

Date created: 03/05/2005
Last updated: 04/26/2023
Please email comments on this WWW page to SED_webmaster@nist.gov