
Name: ZETPDF
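The zeta distribution has the probability mass function

$$p(x; \alpha) = \frac{1}{\zeta(\alpha)\, x^{\alpha}}, \qquad x = 1, 2, \ldots; \quad \alpha > 1$$

with $\alpha$ denoting the shape parameter and $\zeta$ denoting the Riemann zeta function.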
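Some sources parameterize this distribution with $s = \alpha - 1$ (so that the distribution is defined for $s > 0$). The zeta distribution becomes more long-tailed as the value of $\alpha$ gets closer to 1. The mean and variance of the zeta distribution are

$$\mu = \frac{\zeta(\alpha - 1)}{\zeta(\alpha)}, \qquad \alpha > 2$$

$$\sigma^2 = \frac{\zeta(\alpha - 2)}{\zeta(\alpha)} - \left( \frac{\zeta(\alpha - 1)}{\zeta(\alpha)} \right)^2, \qquad \alpha > 3$$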
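As a cross-check outside Dataplot, the probability function and the moments can be evaluated with a few lines of Python. This is a minimal sketch, not Dataplot's implementation: the helper names (`zeta`, `zeta_pmf`, `zeta_mean`, `zeta_variance`) are hypothetical, and the zeta function is approximated by a partial sum plus an Euler-Maclaurin tail correction, which is an assumption made here to keep the example dependency-free.

```python
import math

def zeta(s, n=30):
    """Riemann zeta for real s > 1: partial sum plus Euler-Maclaurin tail."""
    if s <= 1:
        raise ValueError("this sketch only handles s > 1")
    head = sum(k ** -s for k in range(1, n))
    tail = (n ** (1 - s) / (s - 1) + 0.5 * n ** -s
            + s * n ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * n ** (-s - 3) / 720)
    return head + tail

def zeta_pmf(x, alpha):
    """P(X = x) for x = 1, 2, ... and alpha > 1."""
    return x ** -alpha / zeta(alpha)

def zeta_mean(alpha):
    """Mean; finite only for alpha > 2."""
    return zeta(alpha - 1) / zeta(alpha)

def zeta_variance(alpha):
    """Variance; finite only for alpha > 3."""
    m = zeta_mean(alpha)
    return zeta(alpha - 2) / zeta(alpha) - m * m
```

Calling `zeta_pmf(1, 2.3)` gives the probability of the value 1 for a shape parameter of 2.3; `zeta_mean` and `zeta_variance` raise an error when the corresponding moment does not exist, because the helper refuses arguments at or below 1.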
The development of the zeta distribution was motivated by Zipf's law (from the linguistics community). Zipf's law states that the frequency of occurrence of any word is approximately inversely proportional to its rank in the frequency table. When Zipf's law is applicable, plotting the frequency table on a log-log scale (i.e., log(frequency) versus log(rank order)) should show a linear pattern. Note that Zipf's law is an empirical (as opposed to a theoretical) law. However, Zipf's law has served as a useful model for many different kinds of phenomena (not just word counts).
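The linear pattern on the log-log scale can be illustrated numerically. The sketch below assumes idealized frequencies that follow an exact power law in the rank (real frequency tables only do so approximately) and uses a hypothetical exponent of 2.3; the least-squares slope of log(frequency) against log(rank) then recovers the negative of the exponent.

```python
import math

def loglog_slope(ranks, freqs):
    """Least-squares slope of log(freq) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

alpha = 2.3                               # hypothetical exponent
ranks = list(range(1, 51))
freqs = [r ** -alpha for r in ranks]      # idealized, unnormalized frequencies
slope = loglog_slope(ranks, freqs)        # equals -alpha up to rounding
```

The normalizing constant only shifts the line vertically, so it is omitted here; the slope is unaffected.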
    LET <y> = ZETPDF(<x>,<alpha>)    <SUBSET/EXCEPT/FOR qualification>

where <x> is a positive integer variable, number, or parameter;
      <alpha> is a number or parameter greater than 1 that specifies the shape parameter;
      <y> is a variable or a parameter where the computed zeta pdf value is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
    LET Y = ZETPDF(X1,2.3)
    PLOT ZETPDF(X,2.3) FOR X = 1 1 50
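    LET ALPHA = <value>
    LET Y = ZETA RANDOM NUMBERS FOR I = 1 1 N
    ZETA PROBABILITY PLOT Y
    ZETA PROBABILITY PLOT Y2 X2
    ZETA PROBABILITY PLOT Y3 XLOW XHIGH

To obtain the maximum likelihood estimate of $\alpha$, enter one of the commands (Y denotes raw data, Y2 denotes frequencies, and X2 denotes the class midpoints):

    ZETA MAXIMUM LIKELIHOOD Y
    ZETA MAXIMUM LIKELIHOOD Y2 X2

The ZETA MAXIMUM LIKELIHOOD command will actually generate the following three numerical estimates of $\alpha$:

    1. the estimate based on the first two frequencies (saved as ALPHAFR)
    2. the estimate based on the first moment (saved as ALPHAMOM)
    3. the maximum likelihood estimate (saved as ALPHAML)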
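The three kinds of estimate reported in the sample output (first two frequencies, first moment, maximum likelihood) can be sketched in Python from the textbook relations for the zeta distribution. This is not Dataplot's code: the function names are hypothetical, the zeta function is approximated with an Euler-Maclaurin tail, the roots are found by plain bisection, and the derivative of log zeta is taken by a central difference, all of which are assumptions of this sketch.

```python
import math

def zeta(s, n=30):
    """Riemann zeta for real s > 1: partial sum plus Euler-Maclaurin tail."""
    if s <= 1:
        raise ValueError("this sketch only handles s > 1")
    head = sum(k ** -s for k in range(1, n))
    tail = (n ** (1 - s) / (s - 1) + 0.5 * n ** -s
            + s * n ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * n ** (-s - 3) / 720)
    return head + tail

def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def alpha_from_first_two(f1, f2):
    """Uses p(1)/p(2) = 2**alpha, with f1, f2 the first two relative frequencies."""
    return math.log(f1 / f2) / math.log(2.0)

def alpha_from_mean(mean):
    """Solves zeta(alpha - 1)/zeta(alpha) = mean; needs mean > 1 (root is > 2)."""
    return bisect(lambda a: zeta(a - 1) / zeta(a) - mean, 2.0 + 1e-6, 50.0)

def alpha_ml(mean_log, h=1e-5):
    """Solves d/d(alpha) log zeta(alpha) = -mean(log x); needs mean_log > 0."""
    def score(a):
        dlogz = (math.log(zeta(a + h)) - math.log(zeta(a - h))) / (2 * h)
        return dlogz + mean_log
    return bisect(score, 1.0 + 1e-3, 50.0)
```

With the first two sample frequencies from the output in this document, `alpha_from_first_two(0.676, 0.16)` evaluates to about 2.0790, in agreement with the reported estimate by the first two frequencies. For raw data `y`, the argument of `alpha_ml` would be the average of `math.log(v)` over the observations.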
You can also generate an estimate of $\alpha$ based on the maximum ppcc value or the minimum chi-square goodness of fit with the commands
    LET ALPHA1 = <value>
    LET ALPHA2 = <value>
    ZETA KS PLOT Y
    ZETA KS PLOT Y2 X2
    ZETA KS PLOT Y3 XLOW XHIGH
    ZETA PPCC PLOT Y
    ZETA PPCC PLOT Y2 X2
    ZETA PPCC PLOT Y3 XLOW XHIGH

The default values of ALPHA1 and ALPHA2 are 1.5 and 5, respectively. Due to the discrete nature of the percent point function for discrete distributions, the ppcc plot will not be smooth. For that reason, if the sample size is sufficient, the KS PLOT (i.e., the minimum chi-square value) is typically preferred. Also, since the data are integer values, one of the binned forms is preferred for these commands.

To generate a chi-square goodness of fit test, enter the commands
    ZETA CHISQUARE GOODNESS OF FIT Y2 X2
    ZETA CHISQUARE GOODNESS OF FIT Y3 XLOW XHIGH
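To make the quantity behind the chi-square goodness of fit test concrete, the following Python sketch computes a Pearson chi-square statistic for integer counts against a zeta distribution with a given shape parameter. The pooling rule (merge the upper tail into one open-ended cell once the expected count beyond a value drops below 5) is an assumption of this sketch and may differ from how Dataplot forms its cells; the function names are hypothetical. The degrees of freedom are taken as the number of cells minus one, minus one estimated parameter, which matches the 9 cells and 7 degrees of freedom shown in the sample output.

```python
def zeta(s, n=30):
    """Riemann zeta for real s > 1: partial sum plus Euler-Maclaurin tail."""
    if s <= 1:
        raise ValueError("this sketch only handles s > 1")
    head = sum(k ** -s for k in range(1, n))
    tail = (n ** (1 - s) / (s - 1) + 0.5 * n ** -s
            + s * n ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * n ** (-s - 3) / 720)
    return head + tail

def chi_square_fit(counts, alpha, min_expected=5.0):
    """counts[i] is the observed frequency of the value i + 1.

    Returns (statistic, degrees of freedom) with one fitted parameter.
    """
    n = sum(counts)
    if n <= 0:
        raise ValueError("need at least one observation")
    z = zeta(alpha)
    cells = []          # (observed, expected) pairs
    cum_p = 0.0         # model probability already assigned to cells
    cum_obs = 0         # observations already assigned to cells
    for i, obs in enumerate(counts):
        x = i + 1
        p = x ** -alpha / z
        tail_p = 1.0 - cum_p                    # P(X >= x)
        if n * (tail_p - p) < min_expected:     # tail beyond x is too thin
            break
        cells.append((obs, n * p))
        cum_p += p
        cum_obs += obs
    # Final open-ended cell collects everything not yet assigned.
    cells.append((n - cum_obs, n * (1.0 - cum_p)))
    stat = sum((o - e) ** 2 / e for o, e in cells)
    dof = len(cells) - 2
    return stat, dof
```

The statistic would then be compared with a chi-square critical value for the returned degrees of freedom, as in the cutoff table of the sample output.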
Devroye (1986), "Non-Uniform Random Variate Generation", Springer-Verlag, New York.
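Devroye's book gives a rejection method for generating zeta random numbers. The Python sketch below follows that method as it is commonly stated; whether the ZETA RANDOM NUMBERS command uses the same algorithm is not stated here and should be treated as an assumption. The function name `zeta_random` and the cap on very large proposals (which avoids floating-point overflow when alpha is close to 1) are choices made for this illustration.

```python
import math
import random

def zeta_random(alpha, rng=random):
    """Draw one zeta(alpha) variate, alpha > 1, by Devroye-style rejection."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    am1 = alpha - 1.0
    b = 2.0 ** am1
    while True:
        u = 1.0 - rng.random()          # uniform on (0, 1], avoids log(0)
        v = rng.random()
        log_x = -math.log(u) / am1      # log of u ** (-1 / (alpha - 1))
        if log_x > 40.0:                # proposal beyond about 2e17: skip it
            continue
        x = math.floor(math.exp(log_x))
        if x < 1:                       # guard against rounding just below 1
            continue
        t = (1.0 + 1.0 / x) ** am1
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return x
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, for example `zeta_random(2.3, random.Random(1))`.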
    let alpha = 2.3
    let y = zeta random numbers for i = 1 1 500
    .
    let y3 xlow xhigh = integer frequency table y
    class lower 0.5
    class width 1
    let amax = maximum y
    let amax2 = amax + 0.5
    class upper amax2
    let y2 x2 = binned y
    .
    zeta mle y
    let alpha = alphaml
    zeta chisquare goodness of fit y3 xlow xhigh
    relative histogram y2 x2
    limits freeze
    pre-erase off
    line color blue
    title Histogram with Overlaid Zeta cr() ...
    Alpha = ^alphaml, Minimum Chi-Square = ^statval
    plot zetpdf(x,alphaml) for x = 1 1 amax
    limits
    pre-erase on
    line color black
    .
    label case asis
    x1label Alpha
    y1label Minimum Chi-Square
    title Minimum Chi-Square Plot
    zeta ks plot y3 xlow xhigh
    let alpha = shape
    case asis
    justification center
    move 50 92
    text Alpha = ^alpha, Minimum Chi-Square = ^minks
    zeta chisquare goodness of fit y3 xlow xhigh
    ZETA PARAMETER ESTIMATION:

    SUMMARY STATISTICS:
       NUMBER OF OBSERVATIONS        =      500
       SAMPLE MEAN                   =   1.992000
       SAMPLE STANDARD DEVIATION     =   2.833371
       SAMPLE MINIMUM                =   1.000000
       SAMPLE MAXIMUM                =   30.00000
       SAMPLE FIRST FREQUENCY        =   0.6760000
       SAMPLE SECOND FREQUENCY       =   0.1600000

    ESTIMATION BY FIRST TWO FREQUENCIES:
       ESTIMATE OF ALPHA             =   2.078951
       APPROXIMATE VARIANCE          =   0.1379520E-01

    ESTIMATION BY FIRST MOMENT:
       ESTIMATE OF ALPHA             =   2.481861

    MAXIMUM LIKELIHOOD ESTIMATION:
       ESTIMATE OF ALPHA             =   1.739179
       APPROXIMATE VARIANCE          =   0.1392758E-02

    ALPHAFR, ALPHAMOM, AND ALPHAML WILL BE SAVED AS INTERNAL PARAMETERS.


    CHI-SQUARED GOODNESS-OF-FIT TEST

    NULL HYPOTHESIS H0:      DISTRIBUTION FITS THE DATA
    ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
    DISTRIBUTION:            ZETA

    SAMPLE:
       NUMBER OF OBSERVATIONS        =      500
       NUMBER OF NON-EMPTY CELLS     =        9
       NUMBER OF PARAMETERS USED     =        1

    TEST:
       CHI-SQUARED TEST STATISTIC    =   65.86520
       DEGREES OF FREEDOM            =        7
       CHI-SQUARED CDF VALUE         =   1.000000

       ALPHA LEVEL         CUTOFF              CONCLUSION
               10%       12.01704               REJECT H0
                5%       14.06714               REJECT H0
                1%       18.47531               REJECT H0

    CELL NUMBER, LOWER BIN POINT, UPPER BIN POINT, OBSERVED FREQUENCY,
    AND EXPECTED FREQUENCY WRITTEN TO FILE DPST1F.DAT


    CHI-SQUARED GOODNESS-OF-FIT TEST

    NULL HYPOTHESIS H0:      DISTRIBUTION FITS THE DATA
    ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
    DISTRIBUTION:            ZETA

    SAMPLE:
       NUMBER OF OBSERVATIONS        =      500
       NUMBER OF NON-EMPTY CELLS     =        9
       NUMBER OF PARAMETERS USED     =        1

    TEST:
       CHI-SQUARED TEST STATISTIC    =   5.979143
       DEGREES OF FREEDOM            =        7
       CHI-SQUARED CDF VALUE         =   0.457813

       ALPHA LEVEL         CUTOFF              CONCLUSION
               10%       12.01704               ACCEPT H0
                5%       14.06714               ACCEPT H0
                1%       18.47531               ACCEPT H0

    CELL NUMBER, LOWER BIN POINT, UPPER BIN POINT, OBSERVED FREQUENCY,
    AND EXPECTED FREQUENCY WRITTEN TO FILE DPST1F.DAT
Date created: 6/5/2006 