September 2002
This is the DATAPLOT News file DPNEWF.TEX. This NEWS file contains a
list of DATAPLOT enhancements over the last few years. This is
typically the only place that the most recent enhancements are
documented.
To get a hardcopy off-line listing of this file, exit DATAPLOT and
enter:
IBM PC: PRINT C:\DATAPLOT\DPNEWF.TEX
UNIX: lpr /usr/local/lib/dataplot/dpnewf.tex
VAX: PRINT DATAPLO$:DPNEWF.TEX (where DATAPLO$ defines the
directory where DATAPLOT auxiliary files are kept)
other: Check with your local DATAPLOT installer;
at NIST: Alan Heckert (301-975-2899)
Jim Filliben (301-975-2855)
Your installation may define the directory where the DATAPLOT
auxiliary files are stored differently than the list above.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
July 2007 - February 2008.
-----------------------------------------------------------------------
1) The following updates were made for probability
distributions.
a) Added the following new continuous distributions.
1) Burr Type 2
BU2CDF(X,R) - cdf function
BU2PDF(X,R) - pdf function
BU2PPF(P,R) - ppf function
2) Burr Type 3
BU3CDF(X,R,K) - cdf function
BU3PDF(X,R,K) - pdf function
BU3PPF(P,R,K) - ppf function
3) Burr Type 4
BU4CDF(X,R,C) - cdf function
BU4PDF(X,R,C) - pdf function
BU4PPF(P,R,C) - ppf function
4) Burr Type 5
BU5CDF(X,R,K) - cdf function
BU5PDF(X,R,K) - pdf function
BU5PPF(P,R,K) - ppf function
5) Burr Type 6
BU6CDF(X,R,K) - cdf function
BU6PDF(X,R,K) - pdf function
BU6PPF(P,R,K) - ppf function
6) Burr Type 7
BU7CDF(X,R) - cdf function
BU7PDF(X,R) - pdf function
BU7PPF(P,R) - ppf function
7) Burr Type 8
BU8CDF(X,R) - cdf function
BU8PDF(X,R) - pdf function
BU8PPF(P,R) - ppf function
8) Burr Type 9
BU9CDF(X,R,K) - cdf function
BU9PDF(X,R,K) - pdf function
BU9PPF(P,R,K) - ppf function
9) Burr Type 10
B10CDF(X,R) - cdf function
B10PDF(X,R) - pdf function
B10PPF(P,R) - ppf function
10) Burr Type 11
B11CDF(X,R) - cdf function
B11PDF(X,R) - pdf function
B11PPF(P,R) - ppf function
11) Burr Type 12
B12CDF(X,C,K) - cdf function
B12PDF(X,C,K) - pdf function
B12PPF(P,C,K) - ppf function
12) DOUBLY PARETO UNIFORM
DPUCDF(X,M,N,ALPHA,BETA) - cdf function
DPUPDF(X,M,N,ALPHA,BETA) - pdf function
DPUPPF(P,M,N,ALPHA,BETA) - ppf function
13) KUMARASWAMY
KUMCDF(X,ALPHA,BETA) - cdf function
KUMPDF(X,ALPHA,BETA) - pdf function
KUMPPF(P,ALPHA,BETA) - ppf function
14) UNEVEN TWO-SIDED POWER
UTSCDF(X,A,B,D,NU1,NU3,ALPHA) - cdf function
UTSPDF(X,A,B,D,NU1,NU3,ALPHA) - pdf function
UTSPPF(P,A,B,D,NU1,NU3,ALPHA) - ppf function
15) SLOPE
SLOCDF(X,ALPHA) - cdf function
SLOPDF(X,ALPHA) - pdf function
SLOPPF(P,ALPHA) - ppf function
16) TWO-SIDED SLOPE
TSSCDF(X,ALPHA,THETA) - cdf function
TSSPDF(X,ALPHA,THETA) - pdf function
TSSPPF(P,ALPHA,THETA) - ppf function
17) OGIVE
OGICDF(X,N) - cdf function
OGIPDF(X,N) - pdf function
OGIPPF(P,N) - ppf function
18) TWO-SIDED OGIVE
TSOCDF(X,N,THETA) - cdf function
TSOPDF(X,N,THETA) - pdf function
TSOPPF(P,N,THETA) - ppf function
19) REFLECTED POWER FUNCTION
RPOCDF(X,C) - cdf function
RPOCHAZ(X,C) - cumulative hazard function
RPOHAZ(X,C) - hazard function
RPOPDF(X,C) - pdf function
RPOPPF(P,C) - ppf function
20) POWER FUNCTION
POWCHAZ(X,C) - cumulative hazard function
POWHAZ(X,C) - hazard function
The cdf, pdf, and ppf functions were already
available.
21) WAKEBY
WAKPDF(X,BETA,GAMMA,DELTA) - pdf function
The cdf and ppf functions were added in a
previous release.
22) Muth
MUTCDF(X,BETA) - cdf function
MUTPDF(X,BETA) - pdf function
MUTPPF(P,BETA) - ppf function
23) Logistic-Exponential
LEXCDF(X,BETA) - cdf function
LEXCHAZ(X,BETA) - cumulative hazard function
LEXHAZ(X,BETA) - hazard function
LEXPDF(X,BETA) - pdf function
LEXPPF(P,BETA) - ppf function
b) The definitions for the exponential power, alpha, and
Maxwell distributions were modified from
PEXCDF(X,ALPHA,BETA,LOC,SCALE)
PEXHAZ(X,ALPHA,BETA,LOC,SCALE)
PEXCHAZ(X,ALPHA,BETA,LOC,SCALE)
PEXPDF(X,ALPHA,BETA,LOC,SCALE)
PEXPPF(P,ALPHA,BETA,LOC,SCALE)
ALPCDF(X,ALPHA,BETA,LOC,SCALE)
ALPHAZ(X,ALPHA,BETA,LOC,SCALE)
ALPCHAZ(X,ALPHA,BETA,LOC,SCALE)
ALPPDF(X,ALPHA,BETA,LOC,SCALE)
ALPPPF(P,ALPHA,BETA,LOC,SCALE)
MAXCDF(X,SIGMA,LOC,SCALE)
MAXPDF(X,SIGMA,LOC,SCALE)
MAXPPF(P,SIGMA,LOC,SCALE)
to
PEXCDF(X,BETA,LOC,SCALE)
PEXHAZ(X,BETA,LOC,SCALE)
PEXCHAZ(X,BETA,LOC,SCALE)
PEXPDF(X,BETA,LOC,SCALE)
PEXPPF(P,BETA,LOC,SCALE)
ALPCDF(X,ALPHA,LOC,SCALE)
ALPHAZ(X,ALPHA,LOC,SCALE)
ALPCHAZ(X,ALPHA,LOC,SCALE)
ALPPDF(X,ALPHA,LOC,SCALE)
ALPPPF(P,ALPHA,LOC,SCALE)
MAXCDF(X,LOC,SCALE)
MAXPDF(X,LOC,SCALE)
MAXPPF(P,LOC,SCALE)
This reflects the fact that the ALPHA parameter for the
exponential power distribution, the BETA parameter for the
alpha distribution, and the SIGMA parameter for the Maxwell
distribution are in fact scale parameters. The random numbers,
probability plots, ppcc/ks plots, and Kolmogorov-Smirnov
and chi-square goodness of fit tests were
updated to reflect this change as well.
c) Added support for maximum likelihood estimation for
the following distributions:
Reflected generalized Topp and Leone
Burr type 10
Wakeby (actually generates L-Moments estimates)
exponential power
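As a rough illustration of why SIGMA acts as a scale parameter for the
Maxwell distribution, the following Python sketch (not Dataplot code)
evaluates the Maxwell cdf both with an explicit sigma and in
location/scale form. The function names and the standardization
(x - loc)/scale are assumptions made for this example; they are not
Dataplot's internal routines.

```python
import math

def maxwell_cdf_standard(x):
    """Cdf of the standard Maxwell distribution (sigma = 1)."""
    if x <= 0.0:
        return 0.0
    return (math.erf(x / math.sqrt(2.0))
            - math.sqrt(2.0 / math.pi) * x * math.exp(-0.5 * x * x))

def maxwell_cdf_sigma(x, sigma):
    """Maxwell cdf written with an explicit sigma parameter."""
    if x <= 0.0:
        return 0.0
    z = x / sigma
    return (math.erf(x / (sigma * math.sqrt(2.0)))
            - math.sqrt(2.0 / math.pi) * z * math.exp(-0.5 * z * z))

def maxwell_cdf(x, loc=0.0, scale=1.0):
    """Location/scale form: standardize x, then use the standard cdf."""
    return maxwell_cdf_standard((x - loc) / scale)
```

With loc = 0, maxwell_cdf(x, 0, s) and maxwell_cdf_sigma(x, s) agree,
which is why a separate SIGMA argument is redundant once SCALE is
available.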
2) Added the following statistics:
LET A = LP LOCATION X
LET A = LP VARIANCE X
LET A = LP SD X
These statistics are supported by the following commands:
PLOT
TABULATE
CROSS TABULATE
CROSS TABULATE PLOT
LET Y = CROSS TABULATE
LET Y = MATRIX M
BOOTSTRAP PLOT
JACKNIFE PLOT
INFLUENCE CURVE
BLOCK PLOT
DEX PLOT
3) Added the following for graphics output devices.
a) Added the following device drivers
AQUA - Aquaterm for Mac OSX systems
Enter HELP AQUA for details.
b) Added the following command
SET POSTSCRIPT CONVERT CONVERT
This is an enhancement to the previously available
command SET POSTSCRIPT CONVERT. The SET POSTSCRIPT CONVERT
command uses the Ghostscript command to automatically
convert Dataplot Postscript output to one of the listed
image formats. One limitation was that the Ghostscript
command did not provide a command line switch to
generate a landscape orientation plot (which most
Dataplot graphs need). The "CONVERT CONVERT" option
uses the "convert" program in ImageMagick instead of
Ghostscript. This option does support landscape
mode.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
March 2007 - July 2007.
-----------------------------------------------------------------------
1) We have made the following updates for categorical data
analysis.
There are two basic types of data that the following
commands address.
a) We have two variables, each with n observations, where
the first can have one of r mutually exclusive values
and the second can have one of c mutually exclusive values.
So each observation will fit into exactly one of the
r levels of variable one and exactly one of the c levels
of variable two.
Your data can be either in raw form (two columns of data
each with n rows) or summary form (an rxc table which
will typically be read into Dataplot as a matrix).
Each entry in the summary table is a count of how many
times that particular combination occurred.
b) If each variable can have exactly two outcomes (typically
coded as 1/0), then we have the 2x2 special case. There
are a number of specialized methods for dealing with
this type of data.
For this type of data, the number of observations for
the two variables need not be equal.
Some examples of this type of data are:
i) We have a diagnostic test to detect a disease.
Variable one specifies whether the patient in
fact has the disease (coded as 1) or not (coded
as 0). Variable two specifies whether the test
detected the disease (coded as 1) or not (coded
as 0).
ii) We are testing instruments to determine whether or
not they can detect a particular substance. Variable
one is the ground truth (coded as 1 when the substance
is present and coded as 0 when it is not). Variable
two denotes whether the instrument detected the
substance (1 for detected, 0 for not detected).
The following capabilities have been added to Dataplot
for analyzing these types of data.
a) The following statistical tests were added:
ODDS RATIO INDEPENDENCE TEST N11 N21 N12 N22
ODDS RATIO INDEPENDENCE TEST Y1 Y2
ODDS RATIO INDEPENDENCE TEST M
CHI-SQUARE INDEPENDENCE TEST N11 N21 N12 N22
CHI-SQUARE INDEPENDENCE TEST Y1 Y2
CHI-SQUARE INDEPENDENCE TEST M
FISHER EXACT TEST N11 N21 N12 N22
FISHER EXACT TEST Y1 Y2
FISHER EXACT TEST M
MCNEMAR TEST N11 N21 N12 N22
MCNEMAR TEST Y1 Y2
MCNEMAR TEST M
ODDS RATIO CHI-SQUARE TEST Y1 Y2
ODDS RATIO CHI-SQUARE TEST Y1 Y2 X
ODDS RATIO CHI-SQUARE TEST Y1 X1 Y2 X2
MANTEL-HAENSZEL TEST Y1 Y2
MANTEL-HAENSZEL TEST Y1 Y2 X
MANTEL-HAENSZEL TEST Y1 X1 Y2 X2
b) Added the following statistics:
LET A = ODDS RATIO X1 X2
LET A = ODDS RATIO STANDARD ERROR X1 X2
LET A = LOG ODDS RATIO X1 X2
LET A = LOG ODDS RATIO STANDARD ERROR X1 X2
LET A = RELATIVE RISK X1 X2
LET A = CRAMER CONTINGENCY COEFFICIENT X1 X2
LET A = MATRIX GRAND CRAMER CONTINGENCY COEFFICIENT M
LET A = PEARSON CONTINGENCY COEFFICIENT X1 X2
LET A = MATRIX GRAND PEARSON CONTINGENCY COEFFICIENT M
LET A = FALSE POSITIVE Y1 Y2
LET A = FALSE NEGATIVE Y1 Y2
LET A = TRUE POSITIVE Y1 Y2
LET A = TRUE NEGATIVE Y1 Y2
LET A = TEST SENSITIVITY Y1 Y2
LET A = TEST SPECIFICITY Y1 Y2
LET A = POSITIVE PREDICTIVE VALUE Y1 Y2
LET A = NEGATIVE PREDICTIVE VALUE Y1 Y2
These statistics are supported by the following commands:
PLOT
TABULATE
CROSS TABULATE
CROSS TABULATE PLOT
BOOTSTRAP PLOT
JACKNIFE PLOT
c) Added the following graphics:
ROC CURVE Y1 Y2 X - generate a ROC curve
ROSE PLOT Y - generate a rose plot (also known as a four-fold plot)
ROSE PLOT Y1 Y2
BINARY TABULATION PLOT Y1 Y2 X1 X2
BINARY PLOT Y1 Y2 X1
where the statistic to be plotted is one of:
CORRECT MATCH
FALSE POSITIVE
FALSE NEGATIVE
TRUE POSITIVE
TRUE NEGATIVE
These "binary" plots are used to generate summary
plots of "1/0" type data across groups.
ASSOCIATION PLOT M - generate an association plot
ASSOCIATION PLOT Y1 Y2
ASSOCIATION PLOT N11 N21 N12 N22
SIEVE PLOT M - generate a sieve plot
SIEVE PLOT Y1 Y2
SIEVE PLOT N11 N21 N12 N22
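For the diagnostic-test example described above (variable one is the
ground truth, variable two is the test result, both coded 1/0), the
2x2 statistics can be sketched in Python as follows. This is only an
illustration using the textbook definitions; it assumes paired
observations of equal length and applies no small-sample corrections.
Whether Dataplot applies such corrections is not stated here, and the
function names are hypothetical.

```python
import math

def two_by_two_counts(truth, test):
    """Count true positives, false negatives, false positives, true negatives."""
    if len(truth) != len(test):
        raise ValueError("paired 1/0 observations are assumed")
    tp = fn = fp = tn = 0
    for t, d in zip(truth, test):
        if t == 1 and d == 1:
            tp += 1
        elif t == 1 and d == 0:
            fn += 1
        elif t == 0 and d == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def diagnostic_summary(truth, test):
    """Textbook 2x2 summary statistics (all four cells assumed non-zero)."""
    tp, fn, fp, tn = two_by_two_counts(truth, test)
    odds_ratio = (tp * tn) / (fn * fp)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "odds_ratio": odds_ratio,
        "log_odds_ratio": math.log(odds_ratio),
        "log_odds_ratio_se": math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn),
    }
```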
2) We have made the following updates for probability
distributions.
a) Maximum likelihood estimates were added for the
following distributions:
Katz (generates moment estimates)
slash
triangular
four parameter beta (generates moment estimates)
log beta
beta normal
The maximum likelihood for the two-sided power distribution
was generalized to include the lower and upper limit
parameters.
The slash and triangular distributions have also been
added to the BOOTSTRAP/JACKNIFE MLE PLOT command:
BOOTSTRAP TRIANGULAR MLE PLOT Y
JACKNIFE TRIANGULAR MLE PLOT Y
BOOTSTRAP SLASH MLE PLOT Y
JACKNIFE SLASH MLE PLOT Y
The maximum likelihood estimation for the
two-sided power distribution was updated from
the standard case (lower and upper limits = 0 and 1)
to the general case (lower and upper limits will be
estimated from the data). Also, the ML procedure for
this distribution only applies if the N shape parameter
is > 1.
b) Added the following commands for binomial confidence
intervals:
LET A = EXACT BINOMIAL LOWER BOUND P N ALPHA
LET A = EXACT BINOMIAL UPPER BOUND P N ALPHA
LET ALOW AUPP = AGRESTI COULL LIMITS P N ALPHA
The BINOMIAL MAXIMUM LIKELIHOOD command can generate
these values for raw data. The above LET commands are
useful when you only have summary data (i.e., the p and n).
c) Added the following plots:
POISSON PLOT Y X
GEOMETRIC PLOT Y X
BINOMIAL PLOT Y X
NEGATIVE BINOMIAL PLOT Y X
LOGARITHMIC SERIES PLOT Y X
These plots are alternatives to the PROBABILITY PLOT
command.
ORD PLOT Y
This plot can help distinguish whether a Poisson,
a negative binomial, or a logarithmic series
distribution provides a more appropriate distributional
model for a set of discrete data.
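For readers who want to see what the Agresti-Coull limits in item 2b
compute from summary data, here is a minimal Python sketch of the
standard Agresti-Coull formula. It assumes that P is the observed
proportion, N the sample size, and ALPHA the significance level (e.g.,
0.05 for a 95% interval); the function name is hypothetical and this is
not Dataplot's implementation.

```python
import math
from statistics import NormalDist

def agresti_coull_limits(p, n, alpha):
    """Return (lower, upper) Agresti-Coull limits for a binomial proportion."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    successes = p * n
    n_adj = n + z * z                            # adjusted sample size
    p_adj = (successes + z * z / 2.0) / n_adj    # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    # Clip to [0, 1] since the limits bound a probability.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)
```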
3) Made the following updates to graphics commands.
a) The HISTOGRAM command now accepts a matrix argument.
b) Added the command
BIVARIATE NORMAL TOLERANCE REGION PLOT Y1 Y2 X
4) Added the following statistics:
LET P1 =
LET P2 =
LET A = TRIMMED STANDARD DEVIATION Y
5) Added the following command
SET FATAL ERROR
If an analysis or graphics command returns an error code,
this command tells Dataplot how to respond:
IGNORE - Dataplot will simply continue processing the
next command. This was the behavior before
this command was added and is the default.
TERMINATE - Dataplot will print a message and terminate
immediately.
PROMPT - Dataplot will prompt whether you want to
continue or terminate.
This command was added primarily as a debugging option.
If you are trying to debug a complex macro, it can be helpful
to have Dataplot terminate (or prompt for termination)
in order to locate where the initial error is occurring.
Note that this command is not active if you are running
the Graphical User Interface (GUI) version.
6) A Windows Vista installation is now available.
7) Fixed a number of miscellaneous bugs.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
May 2006 - February 2007.
-----------------------------------------------------------------------
1) The following updates were made for maximum likelihood estimates
for distributions:
a) The negative binomial was updated to distinguish between
two cases: 1) the case where k is assumed known (p is
estimated) and 2) the case where k is assumed unknown.
For case 1), confidence limits for p were added.
b) Maximum likelihood estimates were added for the
following discrete distributions:
zeta
Borel-Tanner
Lagrange-Poisson
lost games
beta-geometric
Polya-Aeppli
generalized logarithmic series
geeta
Consul
quasi binomial type I
generalized lost games
generalized negative binomial
Topp and Leone
c) The binomial mle was updated in the following ways:
1) For exact intervals, fixed a bug for extreme values
of p and small samples.
2) By default, Dataplot switches from the exact method
to the normal approximation for sample sizes greater
than 30 (Agresti-Coull intervals are always generated).
You can specify the threshold with the command
SET BINOMIAL NORMAL APPROXIMATION THRESHOLD
3) Some analysts prefer to use a continuity correction
(p + 0.5)/(n + 1)
You can specify whether to use the continuity
correction by entering the command
SET BINOMIAL CONTINUITY CORRECTION
The default is OFF.
2) The following distributional updates were made.
a) The YULCDF was updated to use an explicit formula (as
opposed to direct summation).
b) For the KS PLOT, the location and scale parameters are
estimated via the probability plot. For long-tailed
distributions, more accurate estimates may be obtained
by applying a biweight fit of the probability plot.
To specify this option, enter the command
SET PPCC PLOT LOCATION SCALE BIWEIGHT
To restore the use of the regular least squares
estimates of location and scale, enter
SET PPCC PLOT LOCATION SCALE DEFAULT
c) Added the following new continuous distributions.
1) Asymmetric Log-Laplace
ALDCDF(X,ALPHA,BETA) - cdf function
ALDPDF(X,ALPHA,BETA) - pdf function
ALDPPF(P,ALPHA,BETA) - ppf function
2) Log-Beta
LBECDF(X,ALPHA,BETA,C,D) - cdf function
LBEPDF(X,ALPHA,BETA,C,D) - pdf function
LBEPPF(P,ALPHA,BETA,C,D) - ppf function
3) Topp and Leone
TOPCDF(X,BETA) - cdf function
TOPPDF(X,BETA) - pdf function
TOPPPF(P,BETA) - ppf function
4) Generalized Topp and Leone
GTLCDF(X,ALPHA,BETA) - cdf function
GTLPDF(X,ALPHA,BETA) - pdf function
GTLPPF(P,ALPHA,BETA) - ppf function
5) Reflected Generalized Topp and Leone
RGTCDF(X,ALPHA,BETA) - cdf function
RGTPDF(X,ALPHA,BETA) - pdf function
RGTPPF(P,ALPHA,BETA) - ppf function
6) Wakeby:
WAKCDF(X,BETA,GAMMA,DELTA) - cdf function
WAKPPF(P,BETA,GAMMA,DELTA) - ppf function
d) Added the following new discrete distributions.
1) Beta-Geometric (Waring)
BGECDF(X,ALPHA,BETA) - cdf function
BGEPDF(X,ALPHA,BETA) - pdf function
BGEPPF(X,ALPHA,BETA) - ppf function
2) Beta-Negative Binomial (generalized Waring)
BNBCDF(X,ALPHA,BETA,k) - cdf function
BNBPDF(X,ALPHA,BETA,k) - pdf function
BNBPPF(X,ALPHA,BETA,k) - ppf function
3) Zeta
ZETCDF(X,ALPHA) - cdf function
ZETPDF(X,ALPHA) - pdf function
ZETPPF(X,ALPHA) - ppf function
4) Zipf
ZIPCDF(X,ALPHA,N) - cdf function
ZIPPDF(X,ALPHA,N) - pdf function
ZIPPPF(X,ALPHA,N) - ppf function
5) Borel-Tanner
BTACDF(X,LAMBDA,N) - cdf function
BTAPDF(X,LAMBDA,N) - pdf function
BTAPPF(X,LAMBDA,N) - ppf function
6) Lagrange-Poisson
LPOCDF(X,LAMBDA,THETA) - cdf function
LPOPDF(X,LAMBDA,THETA) - pdf function
LPOPPF(X,LAMBDA,THETA) - ppf function
7) Leads in Coin Tossing (Discrete Arcsine)
LCTCDF(X,N) - cdf function
LCTPDF(X,N) - pdf function
LCTPPF(X,N) - ppf function
8) Classical Matching
MATCDF(X,K) - cdf function
MATPDF(X,K) - pdf function
MATPPF(X,K) - ppf function
9) Polya-Aeppli
PAPCDF(X,THETA,P) - cdf function
PAPPDF(X,THETA,P) - pdf function
PAPPPF(X,THETA,P) - ppf function
10) Generalized Logarithmic Series
GLSCDF(X,THETA,BETA) - cdf function
GLSPDF(X,THETA,BETA) - pdf function
GLSPPF(X,THETA,BETA) - ppf function
11) Geeta
GETCDF(X,THETA,BETA) - cdf function
GETPDF(X,THETA,BETA) - pdf function
GETPPF(X,THETA,BETA) - ppf function
This distribution can also be parameterized with
MU and BETA.
12) Quasi Binomial Type 1
QBICDF(X,P,PHI) - cdf function
QBIPDF(X,P,PHI) - pdf function
QBIPPF(X,P,PHI) - ppf function
13) Generalized Negative Binomial
GNBCDF(X,THETA,BETA,M) - cdf function
GNBPDF(X,THETA,BETA,M) - pdf function
GNBPPF(X,THETA,BETA,M) - ppf function
14) Truncated Generalized Negative Binomial
GNTCDF(X,THETA,BETA,M,N) - cdf function
GNTPDF(X,THETA,BETA,M,N) - pdf function
GNTPPF(X,THETA,BETA,M,N) - ppf function
15) Discrete Weibull
DIWCDF(X,Q,BETA) - cdf function
DIWPDF(X,Q,BETA) - pdf function
DIWPPF(X,Q,BETA) - ppf function
DIWHAZ(X,Q,BETA) - hazard function
16) Consul (a generalized geometric)
CONCDF(X,THETA,M) - cdf function
CONPDF(X,THETA,M) - pdf function
CONPPF(X,THETA,M) - ppf function
17) Lost Games
LOSCDF(X,P,R) - cdf function
LOSPDF(X,P,R) - pdf function
LOSPPF(X,P,R) - ppf function
18) Generalized Lost Games
GLGCDF(X,P,J,A) - cdf function
GLGPDF(X,P,J,A) - pdf function
GLGPPF(X,P,J,A) - ppf function
19) Katz
KATCDF(X,ALPHA,BETA) - cdf function
KATPDF(X,ALPHA,BETA) - pdf function
KATPPF(X,ALPHA,BETA) - ppf function
e) The Waring routines (WARCDF, WARPDF, WARPPF)
were re-written to take advantage of their relationship
to the beta-geometric (the Waring is simply a different
parameterization of the beta-geometric). This makes
the Waring routines more computationally efficient and
more accurate.
3) Added the following LET sub-commands.
a) Added the harmonic number and generalized harmonic
number functions:
LET A = HARMNUMB(N)
LET A = HARMNUMB(N,M)
b) For certain types of plots, it can be useful to add a
small bit of random noise to a variable to avoid
overplotting. This is commonly referred to as jittering.
To simplify this, the following command was added:
LET DELTA
LET Y = JITTER X DELTA
The value of DELTA is used to control the magnitude of
the jittering. That is, the value of x(i) will be
changed to a value x(i) + noise where noise is in the
range (-DELTA/2,DELTA/2).
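The two LET sub-commands in this item can be illustrated with a short
Python sketch. It assumes that the jitter noise is drawn uniformly from
the stated range (the distribution of the noise is not specified above)
and that the generalized harmonic number is the usual sum of 1/k**m for
k = 1, ..., n. The function names are hypothetical.

```python
import random

def jitter(x, delta, rng=None):
    """Add noise in the range (-delta/2, delta/2) to each value of x."""
    rng = rng or random.Random()
    return [xi + rng.uniform(-delta / 2.0, delta / 2.0) for xi in x]

def harmonic_number(n, m=1):
    """Harmonic number (m = 1) or generalized harmonic number of order m."""
    return sum(1.0 / k ** m for k in range(1, n + 1))
```

A small DELTA relative to the spacing of the data keeps jittered points
visually attached to their original values.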
4) Made the following updates to the CONSENSUS MEANS command.
a) If a within-lab standard deviation is zero (i.e., the lab
has only a single unique measurement value), that lab
will be omitted from the analysis (it will be included
in the initial summary table). Previously, Dataplot
treated this as an error and would not run the
consensus means analysis.
b) Added the Fairweather method. There are 3 separate
methods for generating 95% confidence intervals for this
method (the original method proposed by Fairweather,
an improvement suggested by Cox, and a method developed
by Rukhin). The output for this method is only printed
if the minimum number of observations for a lab is
greater than 5.
c) Added the Bayesian Consensus Procedure (BCP) method of
Hagwood and Guthrie. This is a refinement of the BOB
method. For this method, the consensus mean and the
standard deviation of the consensus mean are asymptotically
equivalent to the posterior mean and standard deviation of
a fully Bayesian method.
d) Dataplot currently supports 12 methods. Most users will
only be interested in a subset of these methods. You
can now selectively turn individual methods on or off
(all methods are on by default) with the commands:
SET MANDEL PAULE
SET MODIFIED MANDEL PAULE
SET VANGEL RUHKIN
SET BOB
SET SCHILLER EBERHARDT
SET MEAN OF MEANS
SET GRAND MEAN
SET GRAYBILL DEAL
SET GENERALIZED CONFIDENCE INTERVAL
SET DERSIMONIAN LAIRD
SET FAIRWEATHER
SET BAYESIAN CONSENSUS PROCEDURE
5) The following updates and enhancements were made to
the graphics commands.
a) Added the command:
SET 4-PLOT DISTRIBUTION
The 4-plot by default consists of a run sequence plot,
a lag plot, a histogram, and a normal probability plot.
The above command allows us to replace the normal
probability plot with an exponential probability plot.
This is useful when checking the assumptions for a
Homogeneous Poisson Process (HPP) where we assume the
interarrival times follow an exponential distribution.
b) Added the command:
REPAIR PLOT Y X CENSOR
This is used to plot repair data where we may have
multiple systems and each system may have a single
censoring time (i.e., the time between the last repair
and the end of the test). Enter HELP REPAIR PLOT
for details.
c) Added the command:
MEAN REPAIR FUNCTION PLOT Y X CENSOR
d) Added the command
TRILINEAR PLOT Y1 Y2 Y3
This is used for plots where the rows of Y1, Y2, and
Y3 are mixtures (i.e., each row sums to 1, or to 100
if you are using percentage units).
6) Updated the RELIABILITY TREND TEST in the following
ways.
a) Fixed a bug in the reverse arrangements test.
b) Modified the output format for better clarity.
c) Added support for multiple systems. For multiple systems,
the tests will be applied to each individual system and
then composite tests will be performed.
d) Added support for HTML, Latex, and RTF format.
7) The following bug fixes were made:
a) The 2 variable case for the chi-square goodness of fit
test for discrete distributions had a bug. This has
been fixed. For older versions, a work around is
SET MINSIZE = 1
LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2
POISSON CHI-SQUARE GOODNESS OF FIT Y3 XLOW XHIGH
b) Some bugs with LET subcommands and SUBSETTING were
corrected.
c) A bug involving IF statements within nested loops was
corrected.
d) A few other miscellaneous bug fixes were made.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
September 2005 - April 2006.
-----------------------------------------------------------------------
1) For many one-factor plots, it is useful to sort the horizontal
axis based on the value of some statistic (most commonly a
location statistic such as the mean, median, minimum, or
maximum). The following command was added to help generate
these sorted plots:
LET XSORT INDX = SORT BY X GROUPID
For example, to generate a sorted mean plot for variables
Y and X, you would do something like
LET X2 INDX = SORT BY MEAN Y X
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT INDX
MEAN PLOT Y X2
This can be used with the following types of plots
i) stat PLOT Y X
where stat is the name of a desired statistic (e.g., MEAN or
SD).
ii) BOX PLOT Y X
iii) PLOT Y X GROUP
For details, enter HELP SORT BY STATISTIC.
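The idea behind SORT BY MEAN can be sketched in Python as follows. This
is an illustration only: it assumes that the first output recodes each
group as its rank position (1 for the group with the smallest mean) and
that the second output lists the original group identifiers in sorted
order. The exact contents Dataplot stores in X2 and INDX should be
checked against HELP SORT BY STATISTIC.

```python
from collections import defaultdict

def sort_by_mean(y, x):
    """Recode group ids in x by the rank of their group mean of y."""
    groups = defaultdict(list)
    for yi, xi in zip(y, x):
        groups[xi].append(yi)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Original group ids ordered by increasing mean (ties broken by id).
    order = sorted(means, key=lambda g: (means[g], g))
    position = {g: i + 1 for i, g in enumerate(order)}
    x_sorted = [position[xi] for xi in x]
    return x_sorted, order
```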
These plots often have alphabetic tick mark labels. The
following enhancements were made to simplify the use
of alphabetic tick mark labels with sorted plots.
a) The TIC MARK LABEL FORMAT and TIC MARK LABEL CONTENT
commands were previously augmented to allow numeric
variables, group label variables, or the row label
variable as the contents for the tick mark labels.
Specifically,
LET LAB = DATA 50 40 30 20 10 0
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT LAB
LET IG = GROUP LABELS A B C D E
X1TIC MARK LABEL FORMAT GROUP LABEL
X1TIC MARK LABEL CONTENT IG
X1TIC MARK LABEL FORMAT ROW LABELS
This has been enhanced to allow an index variable to
be specified on the above TIC MARK LABEL CONTENT
commands (the index variable is typically generated by
a SORT BY command). The index variable specifies
the order in which the tic mark labels will be generated.
So the above examples can be augmented by
LET X2 INDX = SORT BY MEAN Y X
LET LAB = DATA 50 40 30 20 10 0
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT LAB INDX
LET X2 INDX = SORT BY MEAN Y X
LET IG = GROUP LABELS A B C D E
X1TIC MARK LABEL FORMAT GROUP LABEL
X1TIC MARK LABEL CONTENT IG INDX
LET X2 INDX = SORT BY MEAN Y X
X1TIC MARK LABEL FORMAT ROW LABELS
X1TIC MARK LABEL CONTENT INDX
b) The LET ... = GROUP LABEL .... command was augmented in
the following two ways.
i) You can specify literal strings for group labels.
For example,
LET IG = GROUP LABEL BATCHSP()1 BATCHSP()2 ...
BATCHSP()3 BATCHSP()4
The strings are separated by spaces. If you need to
include a space in a particular string, use the
SP() as in the above example.
ii) Pre-defined strings can be used to define a group
label variable. For example,
LET IG = GROUP LABEL ST1 TO ST10
where ST1, ST2, ...., ST10 are previously defined
strings. The TO syntax is useful in this context
when the number of strings is large.
Dataplot's algorithm for parsing the GROUP LABEL command
is:
i) Dataplot first checks the character variables file
(HELP SET CONVERT CHARACTER for details). If the
first name listed is found, Dataplot uses this
character variable to define the group labels.
ii) If a character variable is not found, Dataplot
checks all the listed names to see if they are
previously defined strings. If they are, then
Dataplot substitutes the values of these strings.
iii) If one or more of the names is not a previously
defined string, then Dataplot treats all of the
names as literal text strings.
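The three-step parsing order above can be summarized with a small
Python sketch. The two dictionaries stand in for Dataplot's character
variables file and its table of defined strings, and the function name
is hypothetical. Expansion of the TO syntax is omitted.

```python
def resolve_group_labels(names, character_variables, strings):
    """Resolve the names given on a GROUP LABEL command to label text."""
    # Step i: the first name matches a character variable.
    if names and names[0] in character_variables:
        return list(character_variables[names[0]])
    # Step ii: every name is a previously defined string.
    if names and all(name in strings for name in names):
        return [strings[name] for name in names]
    # Step iii: otherwise treat all names as literal text, with SP()
    # standing for an embedded space.
    return [name.replace("SP()", " ") for name in names]
```

Note that a single undefined name in step ii sends every name to step
iii, so a mix of defined strings and literals is treated as all
literals.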
2) You can now pass arguments to macros.
To pass arguments to a macro, do something like
CALL SAMPLE.DP arg1 arg2 arg3
Up to 10 arguments may be passed (although limits on command
line lengths still apply). Arguments containing spaces or
hyphens should be enclosed in quotes. The character limit for
a single argument is 40 characters.
In the SAMPLE.DP macro, if a $1 is encountered, it will be
replaced with "arg1", if a $2 is encountered, it will be
replaced with "arg2", and so on. A $0 will be replaced with the
number of arguments given on the CALL command.
This substitution will only occur if a command line is contained
within a macro (i.e., if no macro is active, the "$" will not
signal any substitution and it will remain in the command line
as given).
Dataplot currently only supports one level of argument
substitution for macros. That is, the values of the macro
arguments (i.e., the $1, $2, etc.) will contain the values
given by the most recent CALL command that specified at least
one argument. If you need to nest CALL commands with macro
arguments, the recommended work around is to have the
higher level macro extract any macro arguments passed to it
into temporary variables or strings before calling any other
macros. For example, suppose SAMPLE.DP needs to call
SAMPLE2.DP with arguments. You could do something like
the following in SAMPLE.DP:
. Start of SAMPLE.DP macro
let string zzzzs1 = $1
let string zzzzs2 = $2
let string zzzzs3 = $3
...
call sample2.dp newarg1 newarg2
The default character for argument substitution is the
"$". To use a different character, enter the command
MACRO SUBSTITUTION CHARACTER
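The substitution rule for macro arguments can be sketched in Python as
follows. This only illustrates the behavior described above. How
Dataplot treats a reference to an argument that was not supplied
(e.g., $5 when only three arguments were given) is not stated here, so
the sketch leaves such references unchanged as an assumption. The
function name is hypothetical.

```python
import re

def substitute_macro_arguments(line, args, marker="$"):
    """Replace $1, $2, ... with CALL arguments and $0 with their count."""
    if len(args) > 10:
        raise ValueError("at most 10 macro arguments are supported")
    pattern = re.compile(re.escape(marker) + r"(\d+)")

    def replace(match):
        index = int(match.group(1))
        if index == 0:
            return str(len(args))
        if 1 <= index <= len(args):
            return args[index - 1]
        return match.group(0)  # assumption: leave unknown references as-is

    return pattern.sub(replace, line)
```

For example, with arguments arg1, arg2, and arg3, the line
"let string zzzzs1 = $1" becomes "let string zzzzs1 = arg1".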
3) The following enhancements were made to the CAPTURE
command (the CAPTURE command re-directs alphanumeric output
to a file rather than displaying it on the screen).
a) Sometimes it may be useful to have the output sent to
both the screen and to a file. You can do this by
entering the command
CAPTURE SCREEN ON
To restore CAPTURE output only being sent to the
CAPTURE file, enter the command
CAPTURE SCREEN OFF
b) Sometimes it may be useful to selectively send output to
the CAPTURE file. You can do this with the following
commands:
CAPTURE SUSPEND
CAPTURE RESUME
where SUSPEND specifies that output will be sent to the
screen rather than the CAPTURE file (note that the CAPTURE
file remains open) and RESUME will send the output to
the currently open CAPTURE file. You can enter as many
CAPTURE SUSPEND/CAPTURE RESUME sequences as you like
between a CAPTURE/END OF CAPTURE session.
Note that OFF is a synonym for SUSPEND and ON is a
synonym for RESUME.
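The routing rules for CAPTURE SCREEN and CAPTURE SUSPEND/RESUME can be
summarized with a small Python sketch. The class and attribute names
are hypothetical, and in-memory text buffers stand in for the screen
and the CAPTURE file. It is a model of the behavior described above,
not Dataplot's implementation.

```python
import io

class CaptureRouter:
    """Route output lines while a CAPTURE file is open."""

    def __init__(self, screen, capture_file):
        self.screen = screen
        self.capture_file = capture_file
        self.screen_on = False   # CAPTURE SCREEN ON/OFF
        self.suspended = False   # CAPTURE SUSPEND/RESUME

    def write(self, text):
        if self.suspended:
            # Suspended: output goes to the screen; the file stays open.
            self.screen.write(text)
            return
        self.capture_file.write(text)
        if self.screen_on:
            self.screen.write(text)

def demo():
    """Return (screen text, file text) for a short command sequence."""
    screen, capture_file = io.StringIO(), io.StringIO()
    router = CaptureRouter(screen, capture_file)
    router.write("a\n")          # file only
    router.screen_on = True
    router.write("b\n")          # file and screen
    router.suspended = True
    router.write("c\n")          # screen only
    router.suspended = False
    router.screen_on = False
    router.write("d\n")          # file only
    return screen.getvalue(), capture_file.getvalue()
```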
4) Made the following probability distribution updates:
a) Added confidence intervals for the maximum likelihood
estimates for the geometric distribution.
b) Added confidence intervals for the maximum likelihood
estimates for the Poisson distribution.
c) Added support for the following new probability
distributions:
1) Added the type 2 generalized logistic distribution.
Enter HELP GL2PDF for details.
2) Added the type 3 generalized logistic distribution.
Enter HELP GL3PDF for details.
3) Added the type 4 generalized logistic distribution.
Enter HELP GL4PDF for details.
4) Added the Hosking parameterization of the generalized
logistic distribution. Enter HELP GL5PDF for details.
5) Added the generalized Tukey-Lambda distribution. Enter
HELP GLDPDF for details.
6) Added the beta-normal distribution. Enter HELP BNOPDF
for details.
7) Added the asymmetric log double exponential (Laplace)
distribution. Enter HELP ALDPDF for details.
5) Added or modified the following analysis commands.
a) The Durbin test for identical effects in a two-way
table for balanced incomplete block designs is supported
with the command
DURBIN TEST Y BLOCK TREATMENT
Enter
HELP DURBIN TEST
for details.
b) The TOLERANCE LIMITS command generates both normal tolerance
limits and non-parametric tolerance limits. You can now
specify only one of these with the commands
NORMAL TOLERANCE LIMITS
NONPARAMETRIC TOLERANCE LIMITS
c) The GRUBBS TEST for outlier detection was previously augmented
to generate three distinct tests:
i) a test for both the minimum and maximum points as
outliers.
ii) a test for the minimum point as an outlier.
iii) a test for the maximum point as an outlier.
This has now been modified into three distinct commands:
GRUBBS TEST Y
GRUBBS MINIMUM TEST Y
GRUBBS MAXIMUM TEST Y
This was done so that the internally saved parameters
(e.g., STATVAL, STATCDF, etc.) will now be correct for
the appropriate test.
d) The CONSENSUS MEANS command was modified in a number of
ways. Specifically,
1) The output format was modified to make it more
consistent and to provide better clarity. In
particular, a clearer distinction is made between
standard uncertainty (the standard error of the
consensus mean), expanded uncertainty (2*standard
error) and expanded uncertainty based on a
normal or t percent point value.
2) Modified the summary tables. There are now 4 summary
tables generated:
i) A summary table of the original data.
ii) A summary table of the 95% confidence limits
generated by each method
iii) A summary table of the standard uncertainties
generated by each method (i.e., the standard
error of the consensus mean estimate)
iv) A summary table of the expanded uncertainties
generated by each method (i.e., 2 times
the standard error of the consensus mean estimate)
3) Added the following new methods:
i) The Graybill-Deal method now generates confidence
limits using a method proposed by Andrew Rukhin.
It also generates 4 distinct estimates of the
variance of the consensus mean (the Sinha method,
the naive method, and 2 methods proposed by
Nien-Fan Zhang). The commonly used naive method
is known to seriously underestimate the variance
for small sample sizes.
ii) Added the generalized confidence interval method
proposed by Hari Iyer and Jack Wang.
iii) Added the DerSimonian-Laird method.
4) Previous versions of Dataplot allowed you to create
the CONSENSUS MEANS output in HTML format
(CAPTURE HTML FILE.HTM) or Latex format
(CAPTURE LATEX file.tex). This was extended to
include Rich Text Format (RTF). The RTF option
is used for creating output that can be read into
Microsoft Word (RTF is a protocol Microsoft created
for transporting word processing files between
different word processing programs). For example,
CAPTURE RTF FILE.RTF
CONSENSUS MEAN Y X
END OF CAPTURE
You can then import FILE.RTF into Word. Note that
although RTF is supposed to be a portable format,
our experience is that non-Word word processors do a
poor job of importing the Dataplot RTF files (tables
tend to be problematic for non-Word software and
Dataplot is creating most of its RTF output as tables).
6) The following updates were made to graphics output devices.
a) The GD library, used to generate JPEG and PNG format
graphs, was updated from version 1.84 to 2.033. The
primary consequence of this is that we can now generate
GIF format files as well. To generate GIF files, enter
SET IPL1NA PLOT.GIF
DEVICE 2 GD GIF
b) Dataplot can now generate graphs in Latex format.
The primary motivation for using this format is
to generate publication quality graphs. There are
some unique features to this device driver that are
described in detail in the HELP LATEX command.
7) The following statistic command was added.
LET A = RATIO Y1 Y2
This statistic is the sum of Y1 divided by the sum of Y2.
The following additional commands are supported:
TABULATE RATIO Y1 Y2 X
CROSS TABULATE RATIO Y1 Y2 X1 X2
RATIO PLOT Y1 Y2 X
RATIO CROSS TABULATE PLOT Y1 Y2 X1 X2
BOOTSTRAP RATIO PLOT Y1 Y2
JACKNIFE RATIO PLOT Y1 Y2
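As a rough illustration of the RATIO statistic described above (the sum
of Y1 divided by the sum of Y2), the following Python sketch computes it
for a whole sample and by group. This is not Dataplot code. The function
names are hypothetical, and the requirement that Y1 and Y2 be paired
(equal length) is an assumption.

```python
def ratio_statistic(y1, y2):
    """Return sum(y1) / sum(y2), the statistic described for LET A = RATIO Y1 Y2."""
    if len(y1) != len(y2):
        # Assumption: the two response variables are paired row by row.
        raise ValueError("y1 and y2 must have the same length")
    denominator = sum(y2)
    if denominator == 0:
        raise ZeroDivisionError("sum of y2 is zero")
    return sum(y1) / denominator


def tabulate_ratio(y1, y2, x):
    """Compute the ratio statistic separately for each distinct value of x."""
    totals = {}
    for a, b, group in zip(y1, y2, x):
        numerator, denominator = totals.get(group, (0.0, 0.0))
        totals[group] = (numerator + a, denominator + b)
    return {group: num / den for group, (num, den) in sorted(totals.items())}
```

Note that this is the ratio of sums, not the mean of the row-by-row
ratios, so rows with larger Y2 values carry more weight.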
8) The following special function library functions were added:
I0INT - integral of the modified Bessel function of the
first kind and order 0
J0INT - integral of the Bessel function of the first kind
and order 0
K0INT - integral of the modified Bessel function of the
third kind and order 0
Y0INT - integral of the Bessel function of the second kind
and order 0
I0ML0 - difference of the modified Bessel function of the
first kind of order 0 and the modified Struve function
of order 0
I1ML1 - difference of the modified Bessel function of the first
kind of order 1 and the modified Struve function of
order 1
AIRINT - integral of the Airy function Ai
BIRINT - integral of the Airy function Bi
AIRYGI - modified Airy function Gi
AIRYHI - modified Airy function Hi
ATNINT - integral of the inverse-tangent function
9) Added the following LET subcommands:
a) LET Y2 = REPLACE GROUPID GROUP2 Y1
This command does the following:
1) It matches the values in GROUP2 against GROUPID and
returns the indices of the matching rows for the GROUPID
array.
2) The indices are used to access the corresponding value
in the Y1 array.
3) The corresponding row of Y2 is replaced with the Y1
value.
The abbreviated syntax
LET Y2 = REPLACE GROUPID GROUP
simply assigns a value of 1 in the corresponding row of Y2.
Enter HELP REPLACE for details.
b) LET Y2 X2 = MATRIX BIN M
This command is used to generate a frequency table for
the elements in a matrix. This can be used to generate
a histogram of the elements in a matrix. For example,
LET Y2 X2 = MATRIX BIN M
HISTOGRAM Y2 X2
Enter HELP MATRIX BIN for details.
c) LET M = MATRIX TRUNCATION M IVALUE
LET M = MATRIX LOWER TRUNCATION M IVALUE
Set all values in the matrix M that are less than
IVALUE to IVALUE. This command can be used in conjunction
with the MATRIX SUBTRACT command to remove background
values from a matrix. For example, if the background
value is 5, do something like
LET IBACK = 5
LET IZERO = 0
LET M = MATRIX SUBTRACT M IBACK
LET M = MATRIX TRUNCATION M IZERO
Likewise, you can use the following command to perform
an upper truncation:
LET M = MATRIX UPPER TRUNCATION M IVALUE
That is, any values in M greater than IVALUE are set to
IVALUE.
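The background-removal recipe above (subtract a constant, then truncate
from below at zero) can be sketched in Python as follows. This is only an
illustration of the arithmetic, not Dataplot's implementation. The
function names and the sample matrix are hypothetical.

```python
def matrix_subtract(m, value):
    """Subtract a scalar from every element of a matrix (list of rows)."""
    return [[x - value for x in row] for row in m]


def matrix_lower_truncation(m, limit):
    """Set every element smaller than limit to limit."""
    return [[max(x, limit) for x in row] for row in m]


def matrix_upper_truncation(m, limit):
    """Set every element larger than limit to limit."""
    return [[min(x, limit) for x in row] for row in m]


# Hypothetical data with a background level of 5.
image = [[4, 5, 9],
         [7, 3, 12]]
cleaned = matrix_lower_truncation(matrix_subtract(image, 5), 0)
```

Elements at or below the background level end up as 0, and the remaining
elements keep their excess over the background.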
10) The SET HISTOGRAM CLASS WIDTH was previously implemented to
specify different default class width algorithms for
histograms. This command was extended to apply to the
following additional commands:
LET Y2 X2 = BINNED Y
LET Y2 X2 = MATRIX BIN Y
NORMAL MIXTURE MAXIMUM LIKELIHOOD Y
CHI-SQUARE GOODNESS OF FIT Y
2 SAMPLE CHI-SQUARE GOODNESS OF FIT Y
11) Added the following command
PROCESS ID
This command will print the process id and save this
process id in the internal parameter PID.
12) Made the following bug fixes.
a) Previously, if all elements of a response variable were
equal, the HISTOGRAM command would print an error message
and not generate the histogram. Dataplot will now
print a warning message, but will generate a histogram
with one non-zero class (it will generate one class above
and one class below with zero count as well).
b) In the TABULATE command, if all elements in the response
variable are identical, change from an error message to a
warning message and perform the tabulation anyway.
c) Corrected a bug in Friedman's test. The previous version
was correct only if the original data were the ranks within
a block. The corrected version does not require that the
data already be ranked.
d) The WILK SHAPIRO command was not returning the p-value in
the saved parameter PVALUE correctly. This was corrected.
e) For the command
LET Z2 = BIVARIATE INTERPOLATION Z Y X Y2 X2
the Y and X arguments were in the wrong order (i.e., the
command was interpreting Y X as X Y). This was corrected.
f) Fixed bugs in the
LET X = CHARACTER CODE IX1
LET X = ALPHABETIC CHARACTER CODE IX1
commands.
g) The command
LET Y2 XLOW XUPP = COMBINE FREQUENCY TABLE Y X
is used to combine low frequency bins. The original
implementation simply worked from left to right to
combine the bins. Since low frequency bins typically
occur in the left and right tails, the algorithm was
modified to move from the left tail to the center and
then from the right tail to the center.
h) Fixed a bug where the ORIENTATION command could cause
Dataplot to hang on subsequent plots if no DEVICE 2
command was defined and a software font was used to
draw text.
i) Dataplot creates and uses a number of temporary files
in the current directory.
If you have multiple sessions running from the current
directory, this can create a problem for these temporary
files. In most cases, a conflict does not occur because
Dataplot will open the file, read or write to the file,
and then close the file immediately. However, a few
files, such as the plot files dppl1f.dat and dppl2f.dat,
typically remain open. The effect of different Dataplot
sessions trying to access these files is system dependent.
1. On Unix and Windows 98/NT4 platforms, the file will
contain whatever was most recently written to it.
2. On Windows 2000/XP platforms, the Dataplot session
that opens the file first has a "lock" on the file.
This causes any subsequent Dataplot session that tries
to access the file to hang.
This is particularly a problem with the GUI version
on Windows 2000/XP. Specifically, if the Dataplot GUI
does not shut down cleanly, the underlying Dataplot
executable does not get killed. This then causes any
future attempt to open the GUI to hang since the "dead"
Dataplot executable has a lock on the file. You have to
use "Ctrl-Alt-Del" to bring up the Task Manager, select
"Processes", and then manually kill any "DPLAHEY.EXE"
processes in order to clear the dead process.
In particular, if you close the GUI by clicking the
"x" in the upper right hand corner (rather than clicking
the EXIT menu), this does not kill the underlying
DPLAHEY.EXE process.
As a partial solution to this problem, Dataplot should
now trap this condition. It will print a message
indicating how to clear the "dead" DPLAHEY.EXE process.
In addition, it will do one of two things in the current
Dataplot process:
a. It will attach the process id to the temporary file
name and then re-open the file.
b. It will simply ignore the file (so if dppl2f.dat is locked,
Dataplot will not write the current plot to dppl2f.dat
in the current Dataplot session).
You can specify which option Dataplot will use by entering
one of the following commands in your startup file
(c:\Program Files\NIST\DATAPLOT\DPLOGF.TEX):
SET TEMPORARY FILE PID
SET TEMPORARY FILE IGNORE
The default is PID.
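The two fallback behaviors described above can be sketched in Python as
follows. This is a hypothetical illustration only: the function name, the
is_locked callback, and the exact way the process id is attached to the
file name are assumptions, since the text does not state the naming
scheme Dataplot uses.

```python
import os


def resolve_temp_file(name, is_locked, mode="PID"):
    """Return the file name to use, or None if the file should be skipped.

    name      -- default temporary file name, e.g. "dppl2f.dat"
    is_locked -- callable reporting whether another session holds the file
    mode      -- "PID" or "IGNORE", mirroring SET TEMPORARY FILE
    """
    if not is_locked(name):
        return name
    if mode == "PID":
        # Assumption: the process id is inserted before the extension.
        stem, extension = os.path.splitext(name)
        return f"{stem}_{os.getpid()}{extension}"
    if mode == "IGNORE":
        # The caller skips writing to this file for the current session.
        return None
    raise ValueError("mode must be 'PID' or 'IGNORE'")
```

With the PID choice each session gets its own file, so nothing is lost.
With the IGNORE choice the session keeps running but the affected file
is not written.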
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT June - August 2005.
-----------------------------------------------------------------------
1) The following matrix commands were added.
a. The sum of all elements in a matrix can be computed with
the following command
LET A = MATRIX SUM M
b. Previous versions of Dataplot allowed you to compute
various column or row statistics
(HELP MATRIX COLUMN STATISTIC or HELP MATRIX ROW STATISTIC
for details). This capability has been extended to the
case of computing the statistics for the entire matrix
with the command
LET A = MATRIX GRAND <stat> M
where <stat> denotes the desired statistic (the list
of supported statistics is the same as for the
MATRIX COLUMN STATISTIC and MATRIX ROW STATISTIC commands).
c. Previous versions of Dataplot allowed you to compute
various column or row statistics
(HELP MATRIX COLUMN STATISTIC or HELP MATRIX ROW STATISTIC
for details). This capability has been extended to the
case where the matrix is divided into equal partitions
with the command
LET MOUT = MATRIX PARTITION M NROW NCOL
with M, NROW, and NCOL denoting the input matrix, the number
of rows in each sub-matrix, and the number of columns in
each sub-matrix, respectively. Note that this command
returns a matrix (MOUT) of values.
That is, the original matrix is divided into sub-matrices
containing NROW rows and NCOL columns each. The partition
starts at row 1 and column 1. The number of rows in MOUT
is determined by dividing the number of rows in M by NROW.
Likewise, the number of columns is determined by dividing
the number of columns in M by NCOL. If this division
does not result in an integer value (e.g., 23 columns
in M and NCOL = 5 results in 3 columns left over), then the
last column, or row, of MOUT will be based on whatever
columns are left over.
In addition, the MATRIX PARTITION command has been extended
to accommodate unequal partitions where the partitions need
not be contiguous.
The syntax in this case is
LET MOUT = MATRIX PARTITION M TAGROW TAGCOL
with M denoting the input matrix. In this case, TAGROW and
TAGCOL are vectors with TAGROW having the same number of rows
as M and TAGCOL having the same number of columns as M.
The elements of TAGROW and TAGCOL identify which partition
each element of M belongs to. The output matrix will be
dimensioned based on the number of distinct values in
TAGROW and TAGCOL.
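The partitioning rules above can be sketched in Python as follows. This
is an illustration under stated assumptions, not Dataplot's code: the
function names are hypothetical, the statistic is passed as a Python
callable (sum by default), and the output rows and columns are assumed to
follow the sorted order of the distinct tag values. The equal-partition
case is treated as a special case of the tagged one, with any leftover
rows or columns forming the last partition.

```python
def partition_by_tags(m, tagrow, tagcol, stat=sum):
    """Apply stat to each (row tag, column tag) block of the matrix m."""
    row_groups = sorted(set(tagrow))
    col_groups = sorted(set(tagcol))
    out = []
    for row_group in row_groups:
        out_row = []
        for col_group in col_groups:
            # Collect every element whose row and column carry these tags;
            # the rows and columns of a block need not be contiguous.
            block = [m[i][j]
                     for i in range(len(m)) if tagrow[i] == row_group
                     for j in range(len(m[0])) if tagcol[j] == col_group]
            out_row.append(stat(block))
        out.append(out_row)
    return out


def equal_tags(length, size):
    """Tag positions 0..length-1 in consecutive runs of the given size."""
    return [i // size + 1 for i in range(length)]


def partition_equal(m, nrow, ncol, stat=sum):
    """Equal partitions starting at row 1, column 1; leftovers go last."""
    return partition_by_tags(m, equal_tags(len(m), nrow),
                             equal_tags(len(m[0]), ncol), stat)
```

For a matrix with 23 columns and NCOL = 5, equal_tags yields five
distinct column tags, the last of which covers the 3 leftover columns.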
2) The following commands were added to compute probability
weighted moments and L-moments.
LET P = PROBABILITY WEIGHTED MOMENTS Y
LET L = L MOMENTS Y
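For readers unfamiliar with these quantities, the following Python sketch
computes the first three sample probability weighted moments and the
corresponding L-moments. It assumes the unbiased b_r estimators and the
standard relations l1 = b0, l2 = 2*b1 - b0, l3 = 6*b2 - 6*b1 + b0. Which
estimator and how many moments the Dataplot commands return is not stated
here, so treat the function names and the choice of three moments as
assumptions.

```python
def probability_weighted_moments(y, count=3):
    """Return the sample moments b_0 .. b_(count-1) of the data y."""
    x = sorted(y)
    n = len(x)
    if n < count:
        raise ValueError("need at least as many observations as moments")
    moments = []
    for r in range(count):
        total = 0.0
        for j, value in enumerate(x, start=1):
            # weight = (j-1)(j-2)...(j-r) / ((n-1)(n-2)...(n-r)); 1 when r = 0
            weight = 1.0
            for k in range(1, r + 1):
                weight *= (j - k) / (n - k)
            total += weight * value
        moments.append(total / n)
    return moments


def l_moments(y):
    """Return the first three sample L-moments [l1, l2, l3]."""
    b0, b1, b2 = probability_weighted_moments(y, 3)
    return [b0, 2.0 * b1 - b0, 6.0 * b2 - 6.0 * b1 + b0]
```

The first L-moment is the sample mean, the second measures scale, and
the third is zero for a sample that is symmetric about its mean.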
3) The following distributional updates were made.
a. Made the following enhancements to the generalized Pareto
maximum likelihood command.
1. L-moment and elemental percentile estimates are now
included. The L-moment estimators are a refinement of
probability weighted moments. The elemental percentile
method is described in Castillo, Hadi, Balakrishnan, and
Sarabia, "Extreme Value and Related Models with
Applications in Engineering and Science", Wiley, 2005.
One advantage of the elemental percentile approach is that
it does not have the restricted domain for the shape
parameter that the moment and maximum likelihood estimators
have.
2. The elemental percentile estimate is now used as the
starting value for the maximum likelihood. This seems
to improve the convergence of the ML method.
3. The methods used (moments, L-moments, elemental percentiles,
and maximum likelihood) do not estimate a location
parameter.
By default, these methods will now use the minimum data
value (minus an epsilon fudge factor) as the estimate of
location. This value is subtracted from the data before
the estimation procedures are applied.
If you would like to provide your own location estimate,
enter the command
LET THRESHOL = <value>
Any data values less than the value specified for
THRESHOL will be omitted from the estimation. Note that
the generalized Pareto is often used in the context of
modeling the distribution of "points above a threshold",
so specifying a threshold greater than some of the data
points is fairly common.
4. The maximum likelihood estimates now include the normal
approximation confidence intervals for the scale and
shape parameters and, optionally, for select percentiles
of the data.
To specify percentile estimates, enter the command
SET MAXIMUM LIKELIHOOD PERCENTILES <varname>
where <varname> specifies the name of a variable containing
the desired percentiles. You can specify DEFAULT to
use a default set of values.
Be aware that for the generalized Pareto maximum
likelihood estimation, a relatively large sample size
may be required for the asymptotic normal approximations
to become reasonably accurate. Some studies have
indicated sample sizes of at least 500 may be required.
b. Added support for the maximum likelihood estimation for
the inverted Weibull distribution:
INVERTED WEIBULL MLE Y
INVERTED WEIBULL MLE Y X
The first syntax supports the full sample case. It will
return confidence intervals for the shape and scale
parameters for various values of alpha (based on the
normal approximations) and will return confidence intervals
for selected percentiles if you have entered a
SET MAXIMUM LIKELIHOOD PERCENTILES DEFAULT command.
The second syntax supports the censored case. This case
currently only returns point estimates.
c. The BINOMIAL MLE now returns improved confidence intervals.
d. We have modified the output from a number of the maximum
likelihood commands to make the output more consistent.
4) Made a number of bug fixes. In particular,
a. Fixed a bug where the following form of the DERIVATIVE command
wasn't being recognized:
LET FUNCTION D = DERIVATIVE F WRT X
This syntax should now work.
b. Fixed the DIFFERENCE OF MEANS CONFIDENCE INTERVALS command
(in adding support for the HTML/LATEX output, we had shut
off the standard ASCII output). Fixed the HTML output
for this command.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January - May 2005.
-----------------------------------------------------------------------
1) Distributional Modeling Updates
a. Dataplot provides extensive distributional modeling
capabilities via probability plots and PPCC/KS plots. One
limitation of these methods is that they do not provide
estimates for the uncertainty of the parameter estimates
and for the distribution quantiles.
The BOOTSTRAP ... PLOT command was enhanced to support
distributional modeling for a number of distributions.
This can be used to obtain confidence intervals for the
distribution parameters, for selected percentiles of the
distribution, and for the value of the PPCC (or K-S
statistic).
For details, enter
HELP DISTRIBUTIONAL BOOTSTRAP
b. For the case of one shape parameter, the PPCC plot was
enhanced to support a group option (where group means
multiple batches of data as opposed to binned data).
In this case, a separate curve is drawn for each batch
of the data. This can be used to check for a common
shape parameter across multiple batches of data. For
details, enter
HELP PPCC PLOT
c. The PPCC PLOT and PROBABILITY PLOT commands support binned
data. Previously, the binning consisted of two variables:
the first contained the bin frequencies and the second
contained the mid-point of the bins. This form assumes
the bins are of equal width.
Some binned data may contain bins of unequal width. The
most common reason for this is to combine bins in the
tails which have low frequencies.
The PPCC PLOT and PROBABILITY PLOT commands were updated
to handle this case. In this case, the syntax is
PPCC PLOT Y XLOW XHIGH
PROBABILITY PLOT Y XLOW XHIGH
with Y, XLOW, and XHIGH denoting the frequency variable,
the lower class boundary, and the upper class boundary,
respectively. For details, enter
HELP PPCC PLOT
HELP PROBABILITY PLOT
d. The following enhancements were made to the maximum
likelihood estimation.
1. Added confidence intervals for the location and scale
parameters for the double exponential case
(DOUBLE EXPONENTIAL MAXIMUM LIKELIHOOD Y).
2. Added a weighted order statistics method to the Cauchy
maximum likelihood estimation (CAUCHY MLE Y). This method
was added because it is the method recommended for the
Cauchy Anderson-Darling test (see D'Agostino and Stephens,
"Goodness-Of-Fit Techniques", Marcel Dekker, 1986, p. 164).
3. Added support for the maximum case of the 2-parameter
extreme value type 2 (Frechet) distribution. This includes
confidence intervals for the estimated parameters and
for select percentiles (see
SET MAXIMUM LIKELIHOOD PERCENTILES).
e. The Anderson-Darling test now supports the extreme value
type 2 (Frechet) for the maximum case and the Cauchy
distribution.
f. Added support for the minimum case for the generalized
extreme value distribution. Added the GEVHAZ and GEVCHAZ
functions to compute the hazard and cumulative hazard
functions for the generalized extreme value distribution.
g. A number of distributions (Weibull, Gumbel, Frechet,
and generalized extreme value) support both a minimum and
a maximum case. The command
SET MINMAX <1/2>
is used to specify which case (1 = minimum, 2 = maximum).
If no MINMAX command is entered, previous versions used
the value 1 as the default (this was chosen since the
minimum case is what is typically used for the Weibull
distribution).
However, for the other distributions, the maximum case
is generally the one most used. For this reason, we
added the value 0 to indicate the default where the default
is now specific to each distribution. For the Weibull, the
default is the minimum and for the Gumbel, Frechet, and
generalized extreme value the default is the maximum.
2) Interlaboratory Analysis Updates
Dataplot added the following commands to perform an
interlaboratory analysis as documented in
"Standard Practice for Conducting an Interlaboratory Study
to Determine the Precision of a Test Method", ASTM
International, 100 Barr Harbor Drive, PO BOX C700,
West Conshohocken, PA 19428-2959, USA. This document is
in support of ASTM Standard E 691 - 99.
The specific commands added are:
LET A = REPEATABILITY STANDARD DEVIATION Y LABID
LET A = REPRODUCABILITY STANDARD DEVIATION Y LABID
LET H = H CONSISTENCY STATISTIC Y LABID
LET K = K CONSISTENCY STATISTIC Y LABID
LET H TAG = H CONSISTENCY STATISTIC Y LABID MATID
LET K TAG = K CONSISTENCY STATISTIC Y LABID MATID
E691 INTERLAB Y LABID MATID
The E691 INTERLAB command generates four tables documented
in the above document. The other commands are useful in
generating the plots described in this standard.
In addition, a number of built-in macros were added to
generate the various graphs demonstrated in the standard.
For more information, enter
HELP E691 INTERLAB
3) The following command can be useful in converting data in a
two-way table to a format required by certain Dataplot
commands
LET Y MATID LABID = REPLICATED STACK X1 ... XK LAB
The resulting output has the form
X1(1) 1 LAB(1)
. . .
X1(n) 1 LAB(n)
X2(1) 2 LAB(1)
. . .
X2(n) 2 LAB(n)
...
Xk(1) k LAB(1)
. . .
Xk(n) k LAB(n)
This is a variation of the STACK command. The distinction is
that the last variable entered is interpreted as a labid
variable that is replicated for each of the response variables.
For details, enter
HELP REPLICATED STACK
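The output layout shown above can be reproduced with a short Python
sketch. This is an illustration of the stacking rule only, not Dataplot
code. The function name is hypothetical, and the check that every
response variable has the same length as LAB is an assumption.

```python
def replicated_stack(responses, lab):
    """Stack response columns X1..XK into (Y, MATID, LABID).

    responses -- list of K columns, each a list of n values
    lab       -- list of n lab identifiers, repeated for every column
    """
    y, matid, labid = [], [], []
    for index, column in enumerate(responses, start=1):
        if len(column) != len(lab):
            # Assumption: each response variable has one value per lab row.
            raise ValueError("each response column must match lab in length")
        y.extend(column)                      # X_index(1) .. X_index(n)
        matid.extend([index] * len(column))   # column number 1 .. K
        labid.extend(lab)                     # LAB(1) .. LAB(n), replicated
    return y, matid, labid
```

For example, two response columns of two rows each yield four stacked
rows, with MATID running 1, 1, 2, 2 and LABID repeating LAB twice.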
4) Extreme Value Analysis
a. Enhancements were made to the CME and DEHAAN commands (these
estimate the parameters for a generalized Pareto distribution).
b. Added the following command
PEAKS OVER THRESHOLD PLOT Y
For details, enter HELP PEAKS OVER THRESHOLD PLOT.
5) Platform Specific Issues
a) We have separated the Windows installation files into two
distinct cases:
a) Windows 2000/XP platforms
b) Windows 95/98/NT4/ME platforms
This was required for compiler compatibility reasons. The
Lahey LF90 and Compaq Visual Fortran compilers were starting
to show some problems under Windows XP (specifically with
Service Pack 2).
For Windows 2000/XP, we have upgraded to the Intel 8.1
Fortran compiler. However, this compiler does not support
Windows 98 and earlier platforms. So the
Windows 95/98/NT4/ME version is still built using the
Lahey (for the GUI) and Compaq compilers.
b) We have updated the Mac OSX installation. There is now a
single file that you download that includes the executable,
the auxiliary files, the source, the needed Tcl/Tk files,
and the g77 compiler. This simplifies the installation
(e.g., you do not have to install Tcl/Tk yourself).
6) We have started overhauling some of the menus for the graphical
interface (GUI). This will not be radically different, just an
effort to provide better organization and clarity to the menus.
This updating will occur over several releases. The initial
update has re-arranged the top level menus. We have added
a "Getting Started" menu to help new users. The Reliability
and Extreme Values menus have been reorganized.
7) Dataplot uses the "." for the decimal point when reading data.
Some countries use the "," for this purpose.
We have added the command
SET DECIMAL POINT <char>
with <char> denoting the character to be used as the decimal
point.
Note that the use of this is currently fairly limited. It is
used in free-format reads only. It is provided to allow
international users the ability to read their data files
without editing them. Note that it does not apply if you
use the SET READ FORMAT command to define a format for the
data. It is also not used for writing data nor for the
output from Dataplot commands.
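The effect of an alternate decimal point on a free-format read can be
pictured with the following Python sketch. It is not Dataplot's reader.
The function name is hypothetical, and it assumes that fields are
separated by whitespace, which is the natural choice when the comma
serves as the decimal point.

```python
def parse_free_format(line, decimal_point="."):
    """Split a whitespace-delimited line and convert each field to a float."""
    values = []
    for field in line.split():
        # Map the user's decimal character to "." before conversion.
        values.append(float(field.replace(decimal_point, ".")))
    return values
```

A line such as "1,5 2,25 3" read with decimal_point="," gives the values
1.5, 2.25, and 3.0. Only the reading step changes; any output would still
use ".".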
8) Fixed a number of bugs.
a. Fixed the COLUMN LIMITS where the specified limits are
arrays (as opposed to single scalar values) to work in
the case where columns are of unequal length.
b. Internally, Dataplot treats strings and functions
interchangeably. The one distinction is that strings
preserve case. However, when strings are operating as
functions, we want them to be converted to upper case.
Dataplot was updated so that when a string is used as a
function, it is converted to upper case. This also
required some updates in the "^" and "&" string operators
to handle case conversions appropriately.
c. Fixed a bug in the Wilcoxon signed rank test when it was
used for a 1-sample test.
d. For the generalized Pareto percent point function, the scale
parameter was ignored. This was corrected.
e. Fixed a bug in the HFLPPF library function.
f. The GRUBBS TEST checks for both the maximum and minimum
values as outliers (relative to the normal distribution).
This is actually two tests: one for the minimum value and
one for the maximum value. When testing for both, the
value of alpha needs to be divided by 2.
The fix was to have the Grubbs test generate output for
3 tests:
1) Test both the minimum and the maximum value (with the
value of alpha adjusted appropriately).
2) Test the minimum value only.
3) Test the maximum value only.
To suppress the one-sided tests, enter the command
SET GRUBBS ONE SIDED OFF
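The three Grubbs statistics referred to above can be sketched in Python
as follows. This is a textbook-style illustration, not Dataplot's code,
and the function names are hypothetical. The second function returns the
upper-tail probability used for the t percent point in the usual Grubbs
critical value, alpha/(2n) when both extremes are tested and alpha/n for
a one-sided test, which is where the division of alpha by 2 enters.

```python
import statistics


def grubbs_statistics(y):
    """Return the Grubbs statistics for the minimum, maximum, and both."""
    mean = statistics.mean(y)
    sd = statistics.stdev(y)          # sample standard deviation (n - 1)
    g_min = (mean - min(y)) / sd      # tests the minimum as an outlier
    g_max = (max(y) - mean) / sd      # tests the maximum as an outlier
    return {"both": max(g_min, g_max), "minimum": g_min, "maximum": g_max}


def t_tail_probability(alpha, n, two_sided):
    """Upper-tail probability for the t value in the Grubbs critical value."""
    return alpha / (2 * n) if two_sided else alpha / n
```

Computing the critical value itself needs a t percent point function,
which is omitted here to keep the sketch free of third-party libraries.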
g. Fixed a bug in the discrete uniform random number generator.
The algorithm was generating random numbers on the interval
[1,N]. This was corrected to generate random numbers on the
interval [0,N].
h. If the PRINTING switch was set to OFF, the YATES command
was not writing information to files "dpst1f.dat" and
"dpst2f.dat". This was corrected so that these files are
printed regardless of the setting of the PRINTING switch.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT June - December 2004.
-----------------------------------------------------------------------
1) The following updates were made for probability distributions.
A. The following enhancements were made to maximum likelihood
estimation.
1. The maximum likelihood output was rewritten for the
normal, lognormal, exponential, Weibull, gamma, beta,
Gumbel, and Pareto distributions.
Support was added for the following:
a. Improved confidence intervals for the distributional
parameters.
b. Support for censored data was added for the normal,
lognormal, exponential, Weibull, and gamma distributions.
c. Confidence intervals for selected percentiles were added
for the normal, lognormal, exponential, Weibull, gamma,
beta, and Gumbel distributions.
2. Added support for the Rayleigh, Maxwell, asymmetric
Laplace, generalized Pareto, and normal mixture
distributions:
RAYLEIGH MAXIMUM LIKELIHOOD Y
MAXWELL MAXIMUM LIKELIHOOD Y
ASYMMETRIC LAPLACE MAXIMUM LIKELIHOOD Y
GENERALIZED PARETO MAXIMUM LIKELIHOOD Y
LET NCOMP = <value>
NORMAL MIXTURE MAXIMUM LIKELIHOOD Y
The NCOMP parameter is used to specify how many normal
distributions to mix (it defaults to 2 if a value is not
specified for NCOMP).
The online help for the maximum likelihood was also rewritten.
Enter
HELP MAXIMUM LIKELIHOOD
for details.
B. Support was added for the following new distributions.
Skew-Laplace (Skew Double Exponential) distribution:
LET A = SDECDF(X,LAMBDA) - cdf of skew-Laplace distribution
LET A = SDEPDF(X,LAMBDA) - pdf of skew-Laplace distribution
LET A = SDEPPF(X,LAMBDA) - ppf of skew-Laplace distribution
Asymmetric Laplace (Asymmetric Double Exponential) distribution:
LET A = ADECDF(X,LAMBDA) - cdf of asymmetric Laplace
distribution
LET A = ADEPDF(X,LAMBDA) - pdf of asymmetric Laplace
distribution
LET A = ADEPPF(X,LAMBDA) - ppf of asymmetric Laplace
distribution
Maxwell-Boltzmann distribution:
LET A = MAXCDF(X,SIGMA) - cdf of Maxwell-Boltzmann
LET A = MAXPDF(X,SIGMA) - pdf of Maxwell-Boltzmann
LET A = MAXPPF(X,SIGMA) - ppf of Maxwell-Boltzmann
Rayleigh distribution:
LET A = RAYCDF(X) - cdf of Rayleigh
LET A = RAYPDF(X) - pdf of Rayleigh
LET A = RAYPPF(X) - ppf of Rayleigh
Generalized Inverse Gaussian distribution:
LET A = GIGCDF(X,CHI,LAMBDA,THETA) - cdf of generalized inverse
gaussian distribution
LET A = GIGPDF(X,CHI,LAMBDA,THETA) - pdf of generalized inverse
gaussian distribution
LET A = GIGPPF(X,CHI,LAMBDA,THETA) - ppf of generalized inverse
gaussian distribution
Generalized Asymmetric Laplace distribution:
LET A = GALCDF(X,KAPPA,TAU) - cdf of generalized asymmetric
Laplace distribution
LET A = GALPDF(X,KAPPA,TAU) - pdf of generalized asymmetric
Laplace distribution
LET A = GALPPF(X,KAPPA,TAU) - ppf of generalized asymmetric
Laplace distribution
Bessel I Function distribution:
LET A = BEICDF(X,S1SQ,S2SQ,NU) - cdf of Bessel I function
distribution
LET A = BEIPDF(X,S1SQ,S2SQ,NU) - pdf of Bessel I function
distribution
LET A = BEIPPF(X,S1SQ,S2SQ,NU) - ppf of Bessel I function
distribution
McLeish (related to Bessel K function) distribution:
LET A = MCLCDF(X,ALPHA) - cdf of McLeish distribution
LET A = MCLPDF(X,ALPHA) - pdf of McLeish distribution
LET A = MCLPPF(X,ALPHA) - ppf of McLeish distribution
Generalized McLeish (related to Bessel K function) distribution:
LET A = GMCCDF(X,ALPHA,A) - cdf of generalized McLeish distribution
LET A = GMCPDF(X,ALPHA,A) - pdf of generalized McLeish distribution
LET A = GMCPPF(X,ALPHA,A) - ppf of generalized McLeish distribution
C. The following random number generators, plots, and commands
were added:
LET LAMBDA = <value>
LET Y = SKEW LAPLACE RANDOM NUMBERS FOR I = 1 1 N
SKEW LAPLACE PROBABILITY PLOT Y
SKEW LAPLACE KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
SKEW LAPLACE CHI-SQUARE GOODNESS OF FIT Y
SKEW LAPLACE PPCC PLOT Y
SKEW LAPLACE KS PLOT Y
LET LAMBDA = <value>
LET Y = ASYMMETRIC LAPLACE RANDOM NUMBERS FOR I = 1 1 N
ASYMMETRIC LAPLACE PROBABILITY PLOT Y
ASYMMETRIC LAPLACE KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
ASYMMETRIC LAPLACE CHI-SQUARE GOODNESS OF FIT Y
ASYMMETRIC LAPLACE PPCC PLOT Y
ASYMMETRIC LAPLACE KS PLOT Y
LET Y = MAXWELL RANDOM NUMBERS FOR I = 1 1 N
MAXWELL PROBABILITY PLOT Y
MAXWELL KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
MAXWELL CHI-SQUARE GOODNESS OF FIT Y
LET Y = RAYLEIGH RANDOM NUMBERS FOR I = 1 1 N
RAYLEIGH PROBABILITY PLOT Y
RAYLEIGH KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
RAYLEIGH CHI-SQUARE GOODNESS OF FIT Y
LET CHI = <value>
LET LAMBDA = <value>
LET THETA = <value>
LET Y = GENERALIZED INVERSE GAUSSIAN RANDOM NUMBERS ...
FOR I = 1 1 N
GENERALIZED INVERSE GAUSSIAN PROBABILITY PLOT Y
GENERALIZED INVERSE GAUSSIAN KOLMOGOROV SMIRNOV ...
GOODNESS OF FIT Y
GENERALIZED INVERSE GAUSSIAN CHI-SQUARE ...
GOODNESS OF FIT Y
LET KAPPA = <value>
LET TAU = <value>
LET Y = GENERALIZED ASYMMETRIC LAPLACE RANDOM NUMBERS ...
FOR I = 1 1 N
GENERALIZED ASYMMETRIC LAPLACE PROBABILITY PLOT Y
GENERALIZED ASYMMETRIC LAPLACE KOLMOGOROV SMIRNOV ...
GOODNESS OF FIT Y
GENERALIZED ASYMMETRIC LAPLACE CHI-SQUARE ...
GOODNESS OF FIT Y
LET S1SQ = <value>
LET S2SQ = <value>
LET NU = <value>
LET Y = BESSEL I FUNCTION RANDOM NUMBERS FOR I = 1 1 N
BESSEL I FUNCTION PROBABILITY PLOT Y
BESSEL I FUNCTION KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
BESSEL I FUNCTION CHI-SQUARE GOODNESS OF FIT Y
LET ALPHA = <value>
LET Y = MCLEISH RANDOM NUMBERS FOR I = 1 1 N
MCLEISH PROBABILITY PLOT Y
MCLEISH KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
MCLEISH CHI-SQUARE GOODNESS OF FIT Y
MCLEISH PPCC PLOT Y
MCLEISH KS PLOT Y
LET ALPHA = <value>
LET A = <value>
LET Y = GENERALIZED MCLEISH RANDOM NUMBERS FOR I = 1 1 N
GENERALIZED MCLEISH PROBABILITY PLOT Y
GENERALIZED MCLEISH KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
GENERALIZED MCLEISH CHI-SQUARE GOODNESS OF FIT Y
GENERALIZED MCLEISH PPCC PLOT Y
GENERALIZED MCLEISH KS PLOT Y
D. Dataplot uses the following definition for the generalized
Pareto probability density function:
f(x,gamma) = (1+gamma*x)**(-(1/gamma)-1)
However, many sources (e.g., Johnson, Kotz, and Balakrishnan)
define the generalized Pareto as:
f(x,gamma) = (1-gamma*x)**((1/gamma)-1)
That is, the sign of gamma is reversed. The following
command was added:
SET GENERALIZED PARETO DEFINITION <value>
A value of JOHNSON or KOTZ for this command
will use the second definition given. Any other value
will use the first (default) definition.
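The relationship between the two definitions can be checked numerically
with the following Python sketch, which codes the two density formulas
exactly as written above (standard form, no location or scale). The
function names are hypothetical. Returning 0 outside the support and
rejecting gamma = 0 are choices made for this sketch and are not taken
from the Dataplot source.

```python
def gp_pdf_dataplot(x, gamma):
    """f(x,gamma) = (1+gamma*x)**(-(1/gamma)-1), the default definition."""
    if gamma == 0:
        raise ValueError("gamma must be non-zero in this sketch")
    base = 1.0 + gamma * x
    if x < 0 or base <= 0:
        return 0.0                     # outside the support
    return base ** (-(1.0 / gamma) - 1.0)


def gp_pdf_johnson_kotz(x, gamma):
    """f(x,gamma) = (1-gamma*x)**((1/gamma)-1), the alternate definition."""
    if gamma == 0:
        raise ValueError("gamma must be non-zero in this sketch")
    base = 1.0 - gamma * x
    if x < 0 or base <= 0:
        return 0.0                     # outside the support
    return base ** ((1.0 / gamma) - 1.0)
```

Evaluating the first function at gamma and the second at -gamma gives
the same density, so a shape estimate reported under one definition
converts to the other by changing its sign.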
E. For the Pareto and Pareto type 2 distributions, what is
typically referred to as the location parameter (the A
parameter) is not a location parameter in the technical
sense, since the relation
f(x;gamma,loc) = f((x-loc);gamma,0)
does not hold (it is a location parameter in the sense
that it defines a lower bound for the Pareto, but not the
Pareto type 2, distribution).
For this reason, we modified the Dataplot definition to
treat A as a second shape parameter. For example, the
Pareto PDF function is
PARPDF(x,gamma,a,loc,scale)
The A, LOC, and SCALE parameters are optional (A will
default to 1 if not given).
F. The following enhancements were made to the probability
plot and ppcc/ks plots.
Note that both the probability plot and the ppcc plot
ultimately depend on computing the percent point function
for the specified distribution. If the percent point function
is fast to compute (e.g., if it exists as a simple, closed
formula), then these plots can be generated rapidly even if the
number of data points is large. On the other hand, some percent
point functions can require a good deal of computation. For
example, some distributions compute the cumulative distribution
function via numerical integration and then compute the percent
point function by inverting the cumulative distribution
function. In these cases, the ppcc/ks plots can take too long
to generate to be practical (this tends to be less of an issue
with probability plots).
1. The following commands can be used to control how many
points are used to generate probability and ppcc/ks
plots, respectively:
SET PROBABILITY PLOT DATA POINTS <value>
SET PPCC PLOT DATA POINTS <value>
The algorithm is to compute equally spaced
percentiles of the full data set and then use these
percentiles in generating the probability and
ppcc/ks plot.
Using this command involves a trade-off between speed
and accuracy. For distributions with simple, closed
formulas or fast approximations for the percent point
function, there is little reason not to use the full data
set. However, for many distributions, the ppcc plot or
ks plot can become impractical as the number of data points
increases.
The minimum number of points is 20. The number of
points is typically set between 50 and 100. You may
want to use fewer than 50 points for a few distributions
with particularly expensive percent point functions.
For distributions with only moderately expensive percent
point functions, you may want to go as high as 100 or
200.
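The data-reduction idea described above, replacing the full data set by
a fixed number of equally spaced percentiles, can be sketched in Python
as follows. This is an illustration only. The function name is
hypothetical, and the plotting positions (i - 0.5)/npoints and the linear
interpolation between order statistics are assumptions, since the text
does not state which percentile definition Dataplot uses.

```python
def equally_spaced_percentiles(y, npoints):
    """Reduce y to npoints equally spaced percentiles of the sorted data."""
    x = sorted(y)
    n = len(x)
    if npoints >= n:
        return x                       # nothing to gain; keep the full data
    reduced = []
    for i in range(1, npoints + 1):
        p = (i - 0.5) / npoints        # assumed plotting position in (0, 1)
        position = p * (n - 1)         # fractional index into sorted data
        lower = int(position)
        upper = min(lower + 1, n - 1)
        fraction = position - lower
        # Linear interpolation between adjacent order statistics.
        reduced.append(x[lower] + fraction * (x[upper] - x[lower]))
    return reduced
```

The reduced sample is then used in place of the original data, so the
number of percent point function evaluations per probability plot drops
from n to npoints.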
2. For the ppcc (or ks) plot, each point on the plot
represents one underlying probability plot (which in
turn requires n computations of the percent point
function, where n is the sample size). For distributions with
one shape parameter, Dataplot typically uses 50 points
(i.e., there are 50 underlying probability plots
computed). For two shape parameters, Dataplot typically
uses between 20 and 50 values for each shape parameter.
It decreases the number of values used when the percent
point function is expensive to compute.
The following command allows you to explicitly specify
how many probability plots are generated by the ppcc plot:
SET PPCC PLOT AXIS POINTS <n1> <n2>
with <n1> and <n2> denoting the number of values
to use for the first and second shape parameters,
respectively. Specifying <n2> is optional.
Set these values to 0 in order to revert to the Dataplot
default.
There are actually two reasons for using this command.
If the percent point function is fast to compute (e.g.,
the Weibull distribution), you may want to increase the
number of points in order to generate a finer grid. On
the other hand, if the percent point function is
expensive to compute, you may want to decrease the
number of points to speed up the generation of the plot.
3. If the ppcc (or ks) plot has two shape parameters, then
the default graphical format is to plot the ppcc (or
ks) value on the y-axis. Each curve on the plot
represents one value of one shape parameter while the
value of the x-axis coordinate represents the value of
the other shape parameter. To reverse the roles of the
shape parameters, enter the command
SET PPCC PLOT AXIS ORDER REVERSE
To restore the default, enter
SET PPCC PLOT AXIS ORDER DEFAULT
4. The PPCC PLOT will write the following to the file
dpst2f.dat (in the current directory):
PPCC LOCATION SCALE SHAPE1 SHAPE2
VALUE PARAMETER PARAMETER PARAMETER PARAMETER
This can be useful for plotting how the estimates of location
and scale change as the shape parameter changes. In some
cases, a less optimal value of the shape parameters may
be preferred if it generates more realistic estimates for
location and scale.
5. The PROBABILITY PLOT and PPCC PLOT were updated to support
multiply censored data.
The syntax is
CENSORED PROBABILITY PLOT Y X
CENSORED PPCC PLOT Y X
The X variable identifies which points represent failure
and which represent censoring times. Specifically,
X = 1 implies a failure time and X = 0 represents a
censoring time. The word CENSORED is required to
distinguish this syntax from the syntax for binned
data. Censored probability plots and censored ppcc
plots do not apply to binned data.
Dataplot supports two algorithms for determining plot
coordinates for a censored probability plot.
i. The uniform order statistic medians are generated
based on the full sample size. However, only
values that represent a failure time are actually
plotted.
ii. Instead of uniform order statistic medians, the
plotting positions for the failure times are
computed using the Kaplan-Meier product limit
estimate:
U(i) = ((n+0.7)/(n+0.4))*
PRODUCT[q=1 to i][(n-q+0.7)/(n-q+1.7)]
with n denoting the full sample size and q denoting
failure times only. The theoretical quantile is then
the percent point function of U(i).
The censored ppcc plot is then based on the correlation
coefficient of the censored probability plot.
To specify which censoring algorithm to use, enter the
commands
SET CENSORED PROBABILITY PLOT
SET CENSORED PPCC PLOT
The default is to use the uniform order statistic medians
algorithm.
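The product in the Kaplan-Meier expression above can be evaluated
with a short Python sketch. The function name and the input layout
(censoring flags already sorted by ascending time) are assumptions
made for illustration. The sketch evaluates the expression exactly as
printed, taking q over the ranks of the failure times only. With no
censoring the product telescopes to (n - i + 0.7)/(n + 0.4), and its
complement is (i - 0.3)/(n + 0.4). Whether Dataplot passes the product
or its complement to the percent point function is not stated here.

```python
def km_product(censor_flags):
    """Evaluate the printed product at each failure time.

    censor_flags: 1 = failure time, 0 = censoring time, ordered by
    ascending time. Returns (rank, product) pairs for failures only.
    """
    n = len(censor_flags)
    running = (n + 0.7) / (n + 0.4)
    result = []
    for q, flag in enumerate(censor_flags, start=1):
        if flag == 1:
            # Only failure ranks contribute a factor to the product.
            running *= (n - q + 0.7) / (n - q + 1.7)
            result.append((q, running))
    return result


uncensored = km_product([1, 1, 1, 1, 1])
censored = km_product([1, 0, 1, 1, 0])
```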
G. The following enhancements were made to the
Kolmogorov-Smirnov goodness of fit command and the KS PLOT.
1. The KS PLOT for the binned case (KS PLOT Y X) now
automatically plots the chi-square goodness of fit
statistic rather than the Kolmogorov-Smirnov goodness of
fit statistic. This is done since the chi-square goodness
of fit is explicitly based on binned data. Note that
bins with fewer than 5 observations are automatically
combined so that every bin contains at least 5.
2. The KS PLOT will write the following to the file
dpst2f.dat (in the current directory):
PPCC LOCATION SCALE SHAPE1 SHAPE2
VALUE PARAMETER PARAMETER PARAMETER PARAMETER
This can be useful for plotting how the estimates of location
and scale change as the shape parameter changes. In some
cases, a less optimal value of the shape parameters may
be preferred if it generates more realistic estimates for
location and scale.
2) The following graphics commands were added.
a. Univariate average shifted histograms can be generated with
the command:
ASH HISTOGRAM Y
3) The following analysis commands were added.
a. Cochran's test can be performed with the command
COCHRAN TEST Y X
where Y is a response variable and X is a group identifier
variable. Cochran's test is an alternative to the
Kruskal-Wallis test when the response variable is dichotomous
(i.e., only 2 possible values).
b. The Kruskal-Wallis test was enhanced to write the pairwise
multiple comparisons to the file dpst1f.dat.
c. Van Der Waerden's test can be performed with the command
VAN DER WAERDEN TEST Y X
where Y is a response variable and X is a group identifier
variable. Van Der Waerden's test is an alternative to
KRUSKAL WALLIS that is based on normal scores of the ranks.
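The phrase "normal scores of the ranks" can be illustrated with a
small Python sketch. It uses the textbook form of the test: each rank
R out of N observations is replaced by the standard normal percent
point of R/(N+1), and the statistic is the sum over groups of
n_j*(mean score in group j)**2 divided by the variance of the scores.
The function name is hypothetical, ties are assumed absent, and the
details of Dataplot's implementation may differ.

```python
from statistics import NormalDist


def van_der_waerden(y, groups):
    """Return (normal scores, test statistic); assumes no tied values."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))
    # Variance of the scores (their mean is zero by symmetry).
    s2 = sum(a * a for a in scores) / (n - 1)
    sums, counts = {}, {}
    for a, g in zip(scores, groups):
        sums[g] = sums.get(g, 0.0) + a
        counts[g] = counts.get(g, 0) + 1
    stat = sum(counts[g] * (sums[g] / counts[g]) ** 2 for g in sums) / s2
    return scores, stat


scores, stat = van_der_waerden(
    [1.2, 3.4, 2.2, 5.1, 4.8, 0.7], [1, 1, 2, 2, 3, 3]
)
```

Under the null hypothesis the statistic is compared against a
chi-square distribution with (number of groups - 1) degrees of freedom.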
4) The following statistics and LET subcommands were added.
a. Kendall's tau can be computed with the command
LET A = KENDELL TAU Y1 Y2
b. For the chi-square goodness of fit, it is generally advisable
to combine bins with small counts (typically, 5 is recommended
as a minimum bin size). To convert equal width bins to
variable width bins with a minimum bin count, enter the
commands
LET MINSIZE =
LET Y2 XLOW XUPPER = COMBINE FREQUENCY TABLE Y X
c. The commands
LET Y2 X2 = ASH BINNED Y
LET Y2 X2 = COUNTS ASH BINNED Y
generate frequency tables based on the average shifted
histogram (see ASH HISTOGRAM above). The first syntax returns
the relative frequency while the second syntax returns a
count.
5) The following enhancements were made to the READ command.
a. In previous versions of Dataplot, if your data set contained
rows with an unequal number of columns, Dataplot would only
read the number of variables corresponding to the row
with the minimum number of columns.
If you would like Dataplot to pad missing columns with a
missing value, enter the command
SET READ PAD MISSING COLUMNS ON
For example, if you enter the command
READ FILE.DAT X1 X2 X3 X4 X5
then rows with fewer than five columns will have the missing
columns set to a missing value. To set the numeric value that
represents a missing value, enter
SET READ MISSING VALUE <value>
where <value> denotes the desired numeric value.
To reset the default behavior, enter the command
SET READ PAD MISSING COLUMNS OFF
In some cases, missing columns would be indicative of an
error in the data file.
b. The SUBSET/EXCEPT/FOR clause on a READ command was ambiguous.
The ambiguity arises from the fact that it is not clear whether
the SUBSET/EXCEPT/FOR clause refers to the lines in the
data file being read or to the output variables that are
created by the READ command. We address this with the
following command:
SET READ SUBSET <PACK/DISPERSE> <PACK/DISPERSE>
In this command, PACK means the SUBSET/EXCEPT/FOR clause
does not apply while DISPERSE means that it does. The
first setting applies to the input file while the second
setting applies to the created data variables.
This is demonstrated with the following example (note that
P-D means the data file is set to PACK and the output
variable is set to DISPERSE). The first column is the
data in the file while the remaining columns show what
the resulting data variable should look like.
READ FILE.DAT X FOR I = 1 2 10
X P-D P-P D-P D-D
===========================================
1 1 1 1 1
2 0 2 3 0
3 2 3 5 3
4 0 4 7 0
5 3 5 9 5
6 0 6 - 0
7 4 7 - 7
8 0 8 - 0
9 5 9 - 9
10 - 10 - -
The default setting is PACK-DISPERSE (this is the default
because this is the behavior of previous versions of Dataplot).
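The four combinations in the table above can be reproduced with a
short Python sketch. This is an illustration of the described
behavior rather than Dataplot code; the function name, the argument
layout, and the use of 0 for unfilled rows are assumptions taken from
the table.

```python
def read_with_subset(file_values, rows, file_mode, var_mode, fill=0):
    """Simulate READ ... FOR I = ... under a PACK/DISPERSE setting.

    rows: 1-based row numbers selected by the FOR clause.
    file_mode / var_mode: "PACK" (clause ignored) or "DISPERSE"
    (clause applied) for the input file and the output variable.
    """
    if file_mode == "DISPERSE":
        # The clause selects which lines of the file are read.
        values = [file_values[r - 1] for r in rows if r <= len(file_values)]
    else:
        values = list(file_values)
    if var_mode == "PACK":
        # Values are stored in consecutive rows of the variable.
        return values
    # The clause selects which rows of the variable receive values.
    out = [fill] * max(rows)
    for r, v in zip(rows, values):
        out[r - 1] = v
    return out


data = list(range(1, 11))
rows = list(range(1, 11, 2))  # FOR I = 1 2 10
p_d = read_with_subset(data, rows, "PACK", "DISPERSE")
d_p = read_with_subset(data, rows, "DISPERSE", "PACK")
```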
6) Miscellaneous Updates
a. Added the command
SET POSTSCRIPT DEFAULT COLOR
Postscript devices can be either black and white or color.
Dataplot assumes black and white by default. After the
DEVICE <2/3> POSTSCRIPT command, you can enter
DEVICE <2/3> COLOR ON
Although this works fine for DEVICE 2, it presents
complications for DEVICE 3 (this is the device used by the
PP command to print the current graph to a Postscript
printer). Dataplot opens/closes this device as needed
without the user entering any commands. It can be
difficult to determine when to insert a DEVICE 3 COLOR ON
command.
If you enter
SET POSTSCRIPT DEFAULT COLOR ON
then Dataplot will assume Postscript devices are color
(this applies to both DEVICE 2 and DEVICE 3, although it
is primarily motivated for DEVICE 3 output).
b. The default algorithm for class width in Dataplot is to
use 0.3*s where s is the sample standard deviation.
A number of different algorithms have been proposed to
obtain "optimal" class widths. The command
SET HISTOGRAM CLASS WIDTH
can be used to specify the default class width that Dataplot
will use for the HISTOGRAM and ASH HISTOGRAM commands.
Additional choices may be added in future releases.
The current choices are:
DEFAULT - use 0.3*s
SD - use 0.3*s
NORMAL - use 2.5*s/n**(1/3)
NORMAL CORRECTED - start with 2.5*s/n**(1/3). If the
skewness is between 0 and 3, multiply
this by the correction factor:
1/(1 - 0.006*skew + 0.27*skew**2 -
0.0069*skew**3).
If the kurtosis - 3 is between 0 and 6,
multiply by the correction factor:
1 - 0.2*(1 - EXP(-0.7*(kurt - 3)))
IQ - use 2.603*IQ/n**(1/3) where IQ is the
interquartile range
The NORMAL width is an optimal choice (in the sense of
minimizing the integrated mean square error of the histogram)
if the data is in fact normal. The NORMAL CORRECTED provides
correction factors for moderate skewness and kurtosis. The
IQ replaces s with a robust estimate of scale (the
interquartile range) and should provide a reasonable bin width
for a wide range of underlying distributions.
Since the "optimal" choice of bin width is dependent on
the underlying distribution of the data, it is difficult
to provide a default bin width that will work well in all
cases (we are typically using the histogram to help determine
what that underlying distribution actually is).
An explicit CLASS WIDTH command will override the default
class width algorithm.
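The class width formulas listed above can be written out as a small
Python sketch. The function name is hypothetical. Two details are
assumptions not given in the text: skewness and kurtosis are computed
as the moment ratios m3/m2**1.5 and m4/m2**2, and the quartiles use
the default method of Python's statistics.quantiles. Dataplot may
define these differently.

```python
import math
import statistics


def class_width(data, choice="DEFAULT"):
    """Class width for the choices listed in the text."""
    n = len(data)
    s = statistics.stdev(data)
    if choice in ("DEFAULT", "SD"):
        return 0.3 * s
    if choice == "IQ":
        q1, _, q3 = statistics.quantiles(data, n=4)
        return 2.603 * (q3 - q1) / n ** (1 / 3)
    width = 2.5 * s / n ** (1 / 3)
    if choice == "NORMAL":
        return width
    if choice == "NORMAL CORRECTED":
        mean = statistics.fmean(data)
        m2 = sum((v - mean) ** 2 for v in data) / n
        skew = sum((v - mean) ** 3 for v in data) / n / m2 ** 1.5
        kurt = sum((v - mean) ** 4 for v in data) / n / m2 ** 2
        if 0 <= skew <= 3:
            # Skewness correction factor from the text.
            width *= 1 / (1 - 0.006 * skew + 0.27 * skew ** 2
                          - 0.0069 * skew ** 3)
        if 0 <= kurt - 3 <= 6:
            # Kurtosis correction factor from the text.
            width *= 1 - 0.2 * (1 - math.exp(-0.7 * (kurt - 3)))
        return width
    raise ValueError(f"unknown choice: {choice}")


sample = list(range(1, 28))  # n = 27, so n**(1/3) is 3
widths = {c: class_width(sample, c)
          for c in ("SD", "NORMAL", "NORMAL CORRECTED", "IQ")}
```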
c. For the chi-square goodness of fit test, it is usually
recommended that classes with fewer than 5 observations be
combined in order to obtain a reasonably accurate
approximation. Given data that is binned into equal size
bins, you can automatically combine bins with small
frequencies with the commands
LET MINSIZE =
LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2
The variables XLOW and XHIGH will contain the lower and upper
boundary values for the classes (since bins will no longer be
of equal length), respectively. The value for MINSIZE defines
the minimum frequency for a class (it defaults to 5).
You can then generate a chi-square goodness of fit test
with the command
CHISQUARE GOODNESS OF FIT Y3 XLOW XHIGH
A typical sequence of commands for generating a chi-square
goodness of fit test for a discrete distribution, starting
from raw data, is
LET AMIN = MINIMUM Y
LET AMAX = MAXIMUM Y
CLASS LOWER AMIN
CLASS UPPER AMAX
CLASS WIDTH 1
LET Y2 X2 = BINNED Y
LET MINSIZE = 5
LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2
CHISQUARE GOODNESS OF FIT Y3 XLOW XHIGH
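The idea behind combining small bins can be sketched in Python. The
merge rule shown here (accumulate adjacent bins from left to right
until the minimum count is reached, then fold any leftover tail into
the last combined bin) is an assumption chosen for illustration; the
rule Dataplot actually applies is not described in this file, and the
function name is hypothetical.

```python
def combine_frequency_table(counts, lower, upper, minsize=5):
    """Merge adjacent bins until each holds at least minsize counts."""
    out_c, out_lo, out_hi = [], [], []
    acc, acc_lo = 0, None
    for c, lo, hi in zip(counts, lower, upper):
        if acc_lo is None:
            acc_lo = lo  # start a new combined bin
        acc += c
        if acc >= minsize:
            out_c.append(acc)
            out_lo.append(acc_lo)
            out_hi.append(hi)
            acc, acc_lo = 0, None
    if acc_lo is not None:
        # Leftover bins at the upper end never reached minsize.
        if out_c:
            out_c[-1] += acc
            out_hi[-1] = upper[-1]
        else:
            out_c.append(acc)
            out_lo.append(acc_lo)
            out_hi.append(upper[-1])
    return out_c, out_lo, out_hi


y3, xlow, xhigh = combine_frequency_table(
    [1, 2, 7, 9, 4, 1], [0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6]
)
```

The returned lower and upper boundaries play the role of XLOW and
XHIGH above, since the combined classes no longer have equal width.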
d. The CORRELATION MATRIX and COVARIANCE MATRIX compute the
correlation and covariance matrices, respectively, of the
columns of a matrix. If you would like these to be
generated from the rows of the matrix, you can enter the
commands
SET CORRELATION MATRIX DIRECTION ROW
SET COVARIANCE MATRIX DIRECTION ROW
To reset to the columns, enter
SET CORRELATION MATRIX DIRECTION COLUMN
SET COVARIANCE MATRIX DIRECTION COLUMN
7) Bug Fixes:
a. There was a bug reading numbers of the form
-.23
In this case, the minus sign was being lost. You can
work around this by entering the number as
-0.23
This bug is fixed in the current version.
NOTE: This bug was introduced in the 1/2004 version.
b. There was a bug reading rows containing a single character.
This has been fixed. If you encounter this bug, you can
work around it by inserting a leading space in the data
file.
NOTE: This bug was introduced in the 1/2004 version.
c. The SET commands that accepted file names as arguments did
not support quoting. Enclosing the file name in quotes is
required when the file name contains spaces or hyphens.
This has been corrected.
d. There was a bug in the SUMMARY command where in some cases
it did not extract the correct data. This has been fixed.
e. There was a bug in the KAPLAN MEIER PLOT command that caused
the censoring variable to not be recognized. This has been
corrected.
-------------------------------------------------------------------------
The following enhancements were made to DATAPLOT February - May 2004.
-------------------------------------------------------------------------
1) The following updates were made for probability distributions.
a. Support was added for the following new distributions.
Log-skew-normal distribution:
LET A = LSNCDF(X,LAMBDA,SD) - cdf of log-skew-normal
distribution
LET A = LSNPDF(X,LAMBDA,SD) - pdf of log-skew-normal
distribution
LET A = LSNPPF(P,LAMBDA,SD) - ppf of log-skew-normal
distribution
Log-skew-t distribution:
LET A = LSTCDF(X,NU,LAMBDA,SD) - cdf of log-skew-t
distribution
LET A = LSTPDF(X,NU,LAMBDA,SD) - pdf of log-skew-t
distribution
LET A = LSTPPF(P,NU,LAMBDA,SD) - ppf of log-skew-t
distribution
G-and-H distribution:
LET A = GHCDF(X,G,H) - cdf of g-and-h distribution
LET A = GHPDF(X,G,H) - pdf of g-and-h distribution
Note that the ppf function was added in a previous update.
Hermite distribution:
LET A = HERCDF(X,A,B) - cdf of Hermite distribution
LET A = HERPDF(X,A,B) - pdf of Hermite distribution
LET A = HERPPF(P,A,B) - ppf of Hermite distribution
Yule distribution:
LET A = YULCDF(X,P) - cdf of Yule distribution
LET A = YULPDF(X,P) - pdf of Yule distribution
LET A = YULPPF(P,P) - ppf of Yule distribution
b. The following pdf functions were added (these distributions
previously supported the cdf and ppf functions).
LET A = NCTPDF(X,NU,LAMBDA) - pdf of non-central t
LET A = DNTPDF(X,NU,L1,L2) - pdf of doubly non-central t
LET A = NCCPDF(X,NU,LAMBDA) - pdf of non-central chi-square
LET A = NCFPDF(X,NU1,NU2,L1) - pdf of non-central F
LET A = DNFPDF(X,NU1,NU2,L1,L2) - pdf of doubly non-central F
LET A = NCBPDF(X,A,B,LAMBDA) - pdf of non-central Beta
These pdf functions are computed by taking the numerical
derivative of the corresponding cdf function. You may
at times get warning messages that the derivative has not
converged with sufficient accuracy (this occurs most frequently
with the non-central Beta distribution).
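The idea of obtaining a pdf as the numerical derivative of a cdf can
be shown with a minimal Python sketch. It uses a central difference
with a fixed step on the standard normal cdf, where the exact pdf is
known for comparison. The step size and the central-difference scheme
are assumptions for illustration; the differentiation method and
convergence test used by Dataplot are not described here.

```python
import math


def normal_cdf(x):
    """Standard normal cdf, used as a stand-in with a known pdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pdf_from_cdf(cdf, x, h=1e-5):
    """Approximate the pdf at x by a central difference of the cdf."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)


approx = pdf_from_cdf(normal_cdf, 0.5)
exact = math.exp(-0.5 * 0.5 ** 2) / math.sqrt(2.0 * math.pi)
```

When the cdf itself is only computed approximately, the difference
quotient amplifies its error, which is one way convergence warnings
of the kind mentioned above can arise.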
c. The following enhancements were made to maximum likelihood
estimation.
1. The binomial case now generates lower and upper confidence
limits based on the Agresti and Coull approximation.
2. The lognormal case now generates confidence limits for
the shape and scale parameters.
3. Support was added for the following distributions:
LOGARITHMIC SERIES MAXIMUM LIKELIHOOD Y
GEOMETRIC MAXIMUM LIKELIHOOD Y
BETA BINOMIAL MAXIMUM LIKELIHOOD Y
NEGATIVE BINOMIAL MAXIMUM LIKELIHOOD Y
HYPERGEOMETRIC MAXIMUM LIKELIHOOD Y
HERMITE MAXIMUM LIKELIHOOD Y
YULE MAXIMUM LIKELIHOOD Y
FATIGUE LIFE MAXIMUM LIKELIHOOD Y
GEOMETRIC EXTREME EXPONENTIAL MAXIMUM LIKELIHOOD Y
FOLDED NORMAL MAXIMUM LIKELIHOOD Y
CAUCHY MAXIMUM LIKELIHOOD Y
4. For the Johnson SU/SB distribution, a percentile
estimator is now available (a method of moments
estimator was previously available):
JOHNSON PERCENTILE Y
Note that this estimator will automatically determine
whether a SB or SU estimator is appropriate. Also, you
can define a constant Z used by this estimator by
entering the command (before the JOHNSON PERCENTILE
command):
LET Z =
This value is typically set between 0.5 and 1 with a
default value of 0.54. As the sample size gets larger,
then values of Z closer to 1 are appropriate (e.g.,
for a sample of size 1,000, a value of 0.8 works well).
5. Support for Latex and HTML output was added to most
supported distributions.
d. The following random number generators were added:
LET NU =
LET LAMBDA =
LET Y = NONCENTRAL T RANDOM NUMBERS FOR I = 1 1 N
LET NU =
LET LAMBDA1 =
LET LAMBDA2 =
LET Y = DOUBLY NONCENTRAL T RANDOM NUMBERS FOR I = 1 1 N
LET NU =
LET LAMBDA =
LET Y = NONCENTRAL BETA RANDOM NUMBERS FOR I = 1 1 N
LET GAMMA =
LET Y = GENERALIZED LOGISTIC RANDOM NUMBERS FOR I = 1 1 N
LET GAMMA =
LET Y = GENERALIZED HALF-LOGISTIC RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA =
LET BETA =
LET Y = HERMITE RANDOM NUMBERS FOR I = 1 1 N
LET P =
LET Y = YULE RANDOM NUMBERS FOR I = 1 1 N
LET A =
LET C =
LET Y = WARING RANDOM NUMBERS FOR I = 1 1 N
LET A =
LET B =
LET C =
LET Y = GENERALIZED WARING RANDOM NUMBERS FOR I = 1 1 N
The t, F, and chi-square random number generators were
updated to accept non-integer values for the degrees of
freedom parameters.
e. The following additions were made to the probability plot,
Kolmogorov-Smirnov goodness of fit, chi-square goodness of
fit, and ppcc plot commands:
LET LAMBDA =
LET SD =
LOG SKEW NORMAL PROBABILITY PLOT Y
LOG SKEW NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT Y
LOG SKEW NORMAL CHI-SQUARE GOODNESS OF FIT Y
LOG SKEW NORMAL PPCC PLOT Y
LET LAMBDA =
LET SD =
LET NU =
LOG SKEW T PROBABILITY PLOT Y
LOG SKEW T KOLMOGOROV-SMIRNOV GOODNESS OF FIT Y
LOG SKEW T CHI-SQUARE GOODNESS OF FIT Y
LET G =
LET H =
G AND H PROBABILITY PLOT Y
G AND H KOLMOGOROV-SMIRNOV GOODNESS OF FIT Y
G AND H CHI-SQUARE GOODNESS OF FIT Y
G AND H PPCC PLOT Y
LET ALPHA =
LET BETA =
HERMITE PROBABILITY PLOT Y
HERMITE CHI-SQUARE GOODNESS OF FIT Y
HERMITE PPCC PLOT Y
LET P =
YULE PROBABILITY PLOT Y
YULE CHI-SQUARE GOODNESS OF FIT Y
YULE PPCC PLOT Y
f. The Anderson Darling test was updated to support the
generalized Pareto distribution:
ANDERSON-DARLING GENERALIZED PARETO TEST Y
The maximum likelihood estimation for the generalized
Pareto is still undergoing algorithmic development, so
you should specify the shape and scale parameter for
the generalized Pareto (before invoking the Anderson-Darling
test) as follows:
LET GAMMA =
LET A =
g. An optional definition was added for the geometric
distribution.
The default definition for the geometric distribution is the
number of failures before the first success is obtained in
a sequence of Bernoulli trials. The alternate definition
is the number of trials up to and including the first
success in a series of Bernoulli trials. This definition
simply shifts the geometric distribution to start at X = 1
rather than X = 0.
To specify the alternate definition, enter the command
SET GEOMETRIC DEFINITION DLMF
To restore the default definition, enter the command
SET GEOMETRIC DEFINITION JOHNSON AND KOTZ
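The relationship between the two definitions can be made concrete
with a short Python sketch of the two probability mass functions.
The function names are hypothetical and the code is only an
illustration of the shift described above.

```python
def geometric_pmf_failures(x, p):
    """Default definition: x failures before the first success (x >= 0)."""
    return p * (1 - p) ** x if x >= 0 else 0.0


def geometric_pmf_trials(x, p):
    """Alternate definition: x trials up to and including the first
    success (x >= 1)."""
    return p * (1 - p) ** (x - 1) if x >= 1 else 0.0


p = 0.3
shifted_match = all(
    geometric_pmf_trials(x + 1, p) == geometric_pmf_failures(x, p)
    for x in range(20)
)
```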
h. The negative binomial was updated to support non-integer
arguments for the number of failures shape parameter
(i.e., k).
i. A number of bug fixes and algorithmic improvements were made
for the ppcc plots with two shape parameters and the random
number generation for a few distributions.
2. The following enhancements were made to the PPCC PLOT and
PROBABILITY PLOT commands.
a. For some long tailed distributions, there can be large
variability in the tails. This can distort the estimates
of location, PPA0, and scale, PPA1, of the line fitted
to the probability plot. To address this, Dataplot now
also returns PPA0BW and PPA1BW. These are the estimates
obtained by performing two iterations of biweight
weighting of the residuals.
In most cases, the use of PPA0 and PPA1 is preferred.
However, if the probability plot indicates the presence
of extreme outliers in the tails, PPA0BW and PPA1BW may
provide better estimates for the location and scale
parameters.
b. The following command was added as a variant of the
ppcc plot:
<dist> KS PLOT Y
where <dist> is any of the distributions supported by
the PPCC PLOT command.
This plot uses a similar concept to the ppcc plot.
However, it uses the value of the Kolmogorov-Smirnov
goodness of fit statistic rather than the correlation
coefficient of the probability plot as the measure
of distributional fit. In this case, the goal is to minimize
the Kolmogorov-Smirnov goodness of fit statistic.
Although we are still developing experience with this
plot, a few preliminary recommendations are:
1. For most continuous distributions with one shape
parameter, the PPCC PLOT and KS PLOT generate similar
estimates for the shape parameter.
2. The KS PLOT seems to perform better for at least some
distributions with two shape parameters.
3. The KS PLOT generates a smoother plot for discrete
distributions.
For additional information, enter
HELP KS PLOT
c. For the PPCC PLOT and KS PLOT, the following command
allows you to specify the desired format for the
plot when there are two shape parameters:
SET PPCC FORMAT <TRACE/3D>
For the default setting, TRACE, these plots are generated
as a multi-trace 2D plot. That is, the Y axis will
represent the correlation (or value of the
Kolmogorov-Smirnov statistic), the X axis will represent
the value of the second shape parameter, and each trace
will represent one of the values for the first shape
parameter.
If this value is set to 3D, the plot is represented as
a 3D surface plot.
3. Sometimes data may only be available in the form of a frequency
table. However, some Dataplot commands may expect the data
in a "raw" format. The following command was added to convert
frequency data to raw data:
LET Y = FREQUENCY TO RAW X FREQ
For example,
X FREQ
--------
0 3
1 2
2 4
would be converted to
0
0
0
1
1
2
2
2
2
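The conversion shown above amounts to repeating each value as many
times as its frequency. A minimal Python sketch (the function name is
hypothetical) is:

```python
def frequency_to_raw(x, freq):
    """Expand a frequency table into raw data, one row per observation."""
    raw = []
    for value, count in zip(x, freq):
        raw.extend([value] * count)
    return raw


y = frequency_to_raw([0, 1, 2], [3, 2, 4])
```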
-------------------------------------------------------------------------
The following enhancements were made to DATAPLOT June 2003-January 2004.
-------------------------------------------------------------------------
1) The following enhancements were made to the Dataplot I/O
capabilities.
a) Previously, the Dataplot READ command was updated to
handle the syntax
READ FILE.DAT
In this case, Dataplot simply assigns the names X1, X2,
and so on to the variables. Many packages accept data
files where the first line contains the variable names.
To support this in Dataplot, do the following:
SET READ VARIABLE LABEL ON
READ FILE.DAT
In this case, Dataplot will interpret the first line
read as the variable names in the file.
b) Dataplot has previously not supported reading character
variables in data files (with the one exception of READ ROW
LABELS). If encountered, Dataplot would generate an error
message and not read the data file correctly. To address
this, we have added the command
SET CONVERT CHARACTER <ON/IGNORE/ERROR>
Setting this to ERROR will continue the current Dataplot
action of reporting an error. This is recommended for the
case when a file is supposed to contain only numeric data
and the presence of character data is in fact indicative
of an error in the data file. Setting this to IGNORE will
instruct Dataplot to simply ignore any fields containing
character data. Setting this to ON will read character fields
and write them to the file "dpzchf.dat".
There are some restrictions on when Dataplot will try to
read character data:
1) This only applies to the variable read case. That
is, READ PARAMETER and READ MATRIX will ignore
character fields or treat them as an error.
2) Dataplot will only try to read character data from
a file. When reading from the keyboard (i.e., when
READ is specified with no file name), character data
will be ignored even if SET CONVERT CHARACTER ON has
been specified.
3) This capability is not supported for the SERIAL READ
case.
4) The SET READ FORMAT command does not accept the
"A" format specification for reading character
fields.
Some of these restrictions may be addressed in subsequent
releases of Dataplot.
Enter HELP CONVERT CHARACTER for details.
c) The COLUMN LIMITS command has been updated to accept
variable arguments. For example,
COLUMN LIMITS LOWER UPPER
with LOWER and UPPER denoting variables (as opposed to
parameters) each with N elements. Dataplot will parse
the data file assuming that field one of the data is in
columns LOWER(1) to UPPER(1), field two of the data is
in LOWER(2) to UPPER(2) and so on. Note that only one
numeric or character variable will be read in each field.
Many programs, Excel for example, will write data to ASCII
files with the data values either left or right justified
to a given column. If the ASCII file is written so that
the decimal point is in a fixed column, then using the
SET READ FORMAT is typically recommended rather than
the COLUMN LIMITS with variable arguments.
If the data file contains columns of equal length, then
using this form of the COLUMN LIMITS command is not
necessary. However, there are two cases where it is useful:
1) If you only want to read selected fields in the data
file, then this form of the COLUMN LIMITS command
easily allows you to do this.
2) If the data columns are of unequal length, as ASCII
files created from Excel often are, then this form
of the COLUMN LIMITS allows these data files to be
read correctly. If a given field is empty, Dataplot
interprets it as a missing value.
By default, Dataplot will set the missing value to 0.
If you would like to specify a value other than zero,
then enter the command
SET READ MISSING VALUE <value>
where <value> is the desired value.
Enter HELP COLUMN LIMITS for details.
d) If Excel writes a comma delimited ASCII file (.CSV), then
missing values are denoted with ",,". In order to interpret
these files correctly, you can enter the command
SET READ DELIMITER <character>
where <character> specifies the desired delimiter. The default
delimiter is a comma.
If Dataplot encounters the delimiter before any valid data
has been found, it interprets this as a missing value.
Missing values are set to 0 unless a SET READ MISSING VALUE
command has been entered (see above).
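The handling of empty fields described above can be illustrated with
a small Python sketch. The function name and defaults are
hypothetical; it simply treats any empty field between delimiters as
missing and substitutes the missing value (0 unless another value is
supplied), which mirrors the behavior described rather than Dataplot's
actual parser.

```python
def parse_delimited_line(line, delimiter=",", missing=0.0):
    """Split one line on the delimiter; empty fields become `missing`."""
    return [float(field) if field.strip() else missing
            for field in line.split(delimiter)]


row = parse_delimited_line("1.5,,3")
row_custom = parse_delimited_line(",2,3", missing=-999.0)
```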
We have added a section in the online help files that provides
general guidance on reading ASCII data files in Dataplot.
This consolidates information documented under a number of
different commands. For details, enter
HELP ASCII FILES
2) The SET CONVERT CHARACTER ON command allows you to read
character variables. We have added the following commands
that operate on these character variables.
a) Many character variables are in fact group-id variables.
In order to allow you to use these group-id variables
in a numeric context, the following two commands were added:
LET Y = CHARACTER CODE IX
LET Y = ALPHABETIC CHARACTER CODE IX
with IX denoting the name of a character variable that
has been read into Dataplot and Y denoting the name of a
numeric variable that will be created by this command.
Both of these commands identify the unique rows in the
character variable (Dataplot checks for exact matches; it
does not try to guess if a typo has occurred, etc.). If
there are K unique rows, Dataplot will generate coded values
as the integer values from 1 to K. The distinction is that
CHARACTER CODE will perform the coding in the order that the
unique rows are encountered in the file while ALPHABETIC
CHARACTER CODE will sort the unique character rows and
code based on the alphabetic order.
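The difference between the two codings can be seen in a short Python
sketch. The function names are hypothetical and the code only mirrors
the behavior described above (exact string matches, codes 1 to K).

```python
def character_code(labels):
    """Code unique labels 1..K in order of first appearance."""
    codes = {}
    for label in labels:
        codes.setdefault(label, len(codes) + 1)
    return [codes[label] for label in labels]


def alphabetic_character_code(labels):
    """Code unique labels 1..K in sorted (alphabetic) order."""
    codes = {label: k
             for k, label in enumerate(sorted(set(labels)), start=1)}
    return [codes[label] for label in labels]


months = ["MAR", "JAN", "MAR", "FEB"]
by_appearance = character_code(months)
by_alphabet = alphabetic_character_code(months)
```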
b) Character variables are frequently used as group-id
variables (e.g., Male and Female to identify sex). The
following command creates a group-id variable from a
character variable:
LET IG = GROUP LABELS MONTH
with MONTH denoting the name of a character variable.
The name IG will be used to denote a group-id variable.
The number of rows in IG will be equal to the number of
unique rows in MONTH. Up to 5 group-id variables can be
created and the maximum number of rows for a group-id
variable is the maximum number of rows for a numeric
variable divided by 100.
c) You can create a row label variable with the READ ROW LABEL
command. Alternatively, you can now enter the command
LET ROWLABEL = MONTH
with MONTH denoting the name of a character variable.
Note that the variable name on the left hand side of the
"=" must be ROWLABEL for this command to work.
d) The TIC MARK LABEL FORMAT and TIC MARK LABEL CONTENT commands
have been updated to support the following:
TIC MARK LABEL FORMAT GROUP LABEL
TIC MARK LABEL CONTENT IG
TIC MARK LABEL FORMAT ROW LABEL
TIC MARK LABEL FORMAT VARIABLE
TIC MARK LABEL CONTENT YVAR
Setting the tic mark label format to GROUP LABEL instructs
Dataplot to use a group label variable for the contents
of the tic mark label. The TIC MARK LABEL CONTENT command
is then used to specify the name of the group label variable
to use.
Setting the tic mark label format to VARIABLE is similar to
the GROUP LABEL case. However, in this case a numeric
variable is specified rather than a group label variable.
This allows you to place your own numeric tic mark labels.
For example, you can use this to generate a "reverse" axis.
Setting the tic mark label format to ROW LABEL allows you
to use the row labels as the content for the tic mark labels.
For example, this can be useful for labeling a bar chart.
3) Support for the following univariate distributions was added:
LET A = TRACDF(X,A,B,C,D) - cdf of trapezoid distribution
LET A = TRAPDF(X,A,B,C,D) - pdf of trapezoid distribution
LET A = TRAPPF(P,A,B,C,D) - ppf of trapezoid distribution
LET A = GTRCDF(X,A,B,C,D,NU1,NU3,ALPHA) - cdf of generalized
trapezoid distribution
LET A = GTRPDF(X,A,B,C,D,NU1,NU3,ALPHA) - pdf of generalized
trapezoid distribution
LET A = GTRPPF(P,A,B,C,D,NU1,NU3,ALPHA) - ppf of generalized
trapezoid distribution
LET A = FTCDF(X,NU) - cdf of folded t distribution
LET A = FTPDF(X,NU) - pdf of folded t distribution
LET A = FTPPF(P,NU) - ppf of folded t distribution
LET A = SNCDF(X,ALPHA) - cdf of skew normal distribution
LET A = SNPDF(X,ALPHA) - pdf of skew normal distribution
LET A = SNPPF(P,ALPHA) - ppf of skew normal distribution
LET A = STCDF(X,NU,ALPHA) - cdf of skew t distribution
LET A = STPDF(X,NU,ALPHA) - pdf of skew t distribution
LET A = STPPF(P,NU,ALPHA) - ppf of skew t distribution
LET A = SLACDF(X) - cdf of slash distribution
LET A = SLAPPF(P) - ppf of slash distribution
LET A = IBCDF(X,ALPHA,BETA) - cdf of inverted beta distribution
LET A = IBPPF(P,ALPHA,BETA) - ppf of inverted beta distribution
LET A = GHCDF(X,G,H) - cdf of g-and-h distribution
LET A = GHPPF(P,G,H) - ppf of g-and-h distribution
LET A = MAKCDF(X,XI,L,T) - cdf of Gompertz-Makeham distribution
LET A = MAKPDF(X,XI,L,T) - pdf of Gompertz-Makeham distribution
LET A = MAKPPF(P,XI,L,T) - ppf of Gompertz-Makeham distribution
LET A = ZIPPDF(X,ALPHA) - pdf of Zipf distribution
Note that the IBPDF and SLAPDF functions were implemented
previously. The GHPDF function is still under development.
You can generate random numbers for these distributions
with the commands
LET A =
LET B =
LET C =
LET D =
LET Y = TRAPEZOID RANDOM NUMBERS FOR I = 1 1 N
LET A =
LET B =
LET C =
LET D =
LET NU1 =
LET NU3 =
LET ALPHA =
LET Y = GENERALIZED TRAPEZOID RANDOM NUMBERS FOR I = 1 1 N
LET NU =
LET Y = FOLDED T RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA =
LET Y = SKEWED NORMAL RANDOM NUMBERS FOR I = 1 1 N
LET NU =
LET ALPHA =
LET Y = SKEWED T RANDOM NUMBERS FOR I = 1 1 N
LET G =
LET H =
LET Y = G AND H RANDOM NUMBERS FOR I = 1 1 N
LET XI =
LET LAMBDA =
LET THETA =
LET Y = GOMPERTZ-MAKEHAM RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA =
LET Y = ZIPF RANDOM NUMBERS FOR I = 1 1 N
Random numbers for the slash and inverted beta distributions
were added previously.
You can generate the following probability plots and goodness
of fit tests
LET A = <value>
LET B = <value>
LET C = <value>
LET D = <value>
TRAPEZOID PROBABILITY PLOT Y
TRAPEZOID KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
TRAPEZOID CHI-SQUARE GOODNESS OF FIT TEST Y
LET A = <value>
LET B = <value>
LET C = <value>
LET D = <value>
LET NU1 = <value>
LET NU3 = <value>
LET ALPHA = <value>
GENERALIZED TRAPEZOID PROBABILITY PLOT Y
GENERALIZED TRAPEZOID KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
GENERALIZED TRAPEZOID CHI-SQUARE GOODNESS OF FIT TEST Y
LET NU = <value>
FOLDED T PROBABILITY PLOT Y
FOLDED T KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
FOLDED T CHI-SQUARE GOODNESS OF FIT TEST Y
FOLDED T PPCC PLOT Y
LET NU = <value>
LET LAMBDA = <value>
SKEW T PROBABILITY PLOT Y
SKEW T KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
SKEW T CHI-SQUARE GOODNESS OF FIT TEST Y
SKEW T PPCC PLOT Y
LET LAMBDA = <value>
SKEW NORMAL PROBABILITY PLOT Y
SKEW NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
SKEW NORMAL CHI-SQUARE GOODNESS OF FIT TEST Y
SKEW NORMAL PPCC PLOT Y
LET G = <value>
LET H = <value>
G AND H PROBABILITY PLOT Y
G AND H KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
G AND H CHI-SQUARE GOODNESS OF FIT TEST Y
G AND H PPCC PLOT Y
LET XI = <value>
LET LAMBDA = <value>
LET THETA = <value>
GOMPERTZ-MAKEHAM PROBABILITY PLOT Y
GOMPERTZ-MAKEHAM KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
GOMPERTZ-MAKEHAM CHI-SQUARE GOODNESS OF FIT TEST Y
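As an illustration of how these commands fit together, the
following sketch generates g-and-h random numbers and then checks
the fit against the same distribution. The sample size and the
values of G and H are arbitrary choices made for this example, not
recommended settings.
LET N = 200
LET G = 0.5
LET H = 0.1
LET Y = G AND H RANDOM NUMBERS FOR I = 1 1 N
G AND H PROBABILITY PLOT Y
G AND H KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
The shape parameters are set with LET before both the random
number command and the plot or test commands.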
c) Added the following commands
JOHNSON SU MOMENTS Y
JOHNSON SB MOMENTS Y
to compute method of moments estimates for the Johnson SU
and Johnson SB distributions.
d) The GUMBEL MAXIMUM LIKELIHOOD command was extended to support
both the minimum and maximum cases (the previous version was
restricted to the maximum case). Before the GUMBEL MAXIMUM
LIKELIHOOD command, enter the command
SET MINMAX 1
to specify the minimum case and
SET MINMAX 2
to specify the maximum case.
e) Enter the following command to generate Dirichlet random numbers:
LET M = DIRICHLET RANDOM NUMBERS ALPHA N
with ALPHA denoting a vector containing the shape parameters of
the Dirichlet distribution and N denoting a scalar that specifies
the number of rows to generate. M will be a matrix with N rows
and k columns (where k is the number of elements in the ALPHA
vector).
You can also compute the Dirichlet probability density or the
log of the Dirichlet probability density with the commands
LET M = DIRICHLET PDF X ALPHA
LET M = DIRICHLET LOG PDF X ALPHA
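As a sketch of the random number command above, assuming three
shape parameters (the values 2, 3, and 5 and the row count are
arbitrary choices for this example):
LET ALPHA = DATA 2 3 5
LET N = 100
LET M = DIRICHLET RANDOM NUMBERS ALPHA N
With these settings M should contain 100 rows and 3 columns.
Because each row is a draw from a Dirichlet distribution, the
values in a row lie between 0 and 1 and sum to 1.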
f) Enter the following command to generate correlated uniform
random numbers:
LET U = MULTIVARIATE UNIFORM RANDOM NUMBERS SIGMA N
with SIGMA denoting the variance-covariance matrix of
a multivariate normal distribution and N denoting the number
of rows to generate.
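A minimal sketch for two correlated uniform variables follows.
It assumes that SIGMA is entered with the READ MATRIX command (any
other way of creating the matrix would work equally well), and the
matrix entries and row count are arbitrary choices for this
example.
READ MATRIX SIGMA
1.0 0.6
0.6 1.0
END OF DATA
LET N = 200
LET U = MULTIVARIATE UNIFORM RANDOM NUMBERS SIGMA N
U is then expected to have N rows, with one column for each
row of SIGMA.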
g) The Anderson-Darling goodness of fit test was enhanced to
include the following distributions:
ANDERSON-DARLING LOGISTIC TEST Y
ANDERSON-DARLING DOUBLE EXPONENTIAL TEST Y
ANDERSON-DARLING UNIFORM TEST Y
The uniform case is for the uniform distribution on the
(0,1) interval. This can also be used for fully specified
distributions (i.e., the shape, location, and scale
parameters are not estimated from the data). Simply
calculate the appropriate CDF function with the specified
shape, location, and scale parameters (this converts the
data to the (0,1) interval) and apply the test for a
uniform distribution.
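For example, the following sketch tests whether the data in Y are
consistent with a fully specified normal distribution. It assumes
that Y has already been read; the location of 10 and scale of 2
are arbitrary choices for this example, and the variable names MU,
SD, W, and Z are likewise only illustrative.
LET MU = 10
LET SD = 2
LET W = (Y - MU)/SD
LET Z = NORCDF(W)
ANDERSON-DARLING UNIFORM TEST Z
Here Z holds the data transformed to the (0,1) interval by the
hypothesized CDF, so the uniform test is applied to Z and not
to the original Y.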
h) The following maximum likelihood estimation commands were
added:
LOGISTIC MAXIMUM LIKELIHOOD Y
UNIFORM MAXIMUM LIKELIHOOD Y
BETA MAXIMUM LIKELIHOOD Y
The BETA and UNIFORM cases generate both method of moments and
maximum likelihood estimates.
The beta case estimates the lower and upper limits of the
data from the minimum and maximum data values, respectively,
and then computes the maximum likelihood estimates for the
alpha and beta shape parameters.
i) Support was added for the following random number
generators:
1) FIBONACCI CONGRUENTIAL - a mixture of the Fibonacci generator
with a congruential generator
2) MERSENNE TWISTER - Fortran 90 implementation of the
Mersenne twister generator (may not be
available in versions of DATAPLOT built
with a Fortran 77 compiler)
Enter HELP RANDOM NUMBER GENERATOR for details.
j) Fixed the inverse Gaussian and reciprocal inverse Gaussian
probability functions. The MU parameter was treated as a
location parameter in the original implementation. However, it
is really a shape parameter. So IGPDF and RIGPDF can now be
called via
IGPDF(X,GAMMA,MU,LOC,SCALE)
RIGPDF(X,GAMMA,MU,LOC,SCALE)
The MU parameter is treated as an optional parameter (LOC and
SCALE are also optional). MU is set to 1 if it is omitted.
The MU parameter can also be specified for random numbers
and probability plots. If the MU parameter is not set, it
will automatically be set to 1 (no error message is printed).
The PPCC plot for these two distributions is now generated for
both the gamma and mu parameters (i.e., a 3D plot is generated).
If you want the PPCC pl |