
ZIPPDF

Name:
    ZIPPDF
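Description:
The Zipf distribution has the probability mass function

    p(x; alpha, n) = x^(-alpha) / SUM[i=1 to n] i^(-alpha)
                     x = 1, 2, ..., n;  alpha > 1

with alpha and n denoting the shape parameters. Some sources parameterize this distribution with s = alpha - 1 (so that the distribution is defined for s > 0). The mean of the Zipf distribution is

    mean = SUM[i=1 to n] i^(-(alpha - 1)) / SUM[i=1 to n] i^(-alpha)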
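As a rough cross-check outside Dataplot, the probability mass function and the mean can be sketched in Python. This assumes the mass function is x^(-alpha) normalized over x = 1, ..., n; the names zipf_pmf and zipf_mean are hypothetical and are not Dataplot functions.

```python
def zipf_pmf(x, alpha, n):
    """Assumed Zipf pmf: x^(-alpha) normalized over the support 1, 2, ..., n."""
    if not (1 <= x <= n):
        return 0.0
    norm = sum(i ** -alpha for i in range(1, n + 1))
    return x ** -alpha / norm


def zipf_mean(alpha, n):
    """Mean as a ratio of two finite sums (generalized harmonic numbers)."""
    num = sum(i ** -(alpha - 1) for i in range(1, n + 1))
    den = sum(i ** -alpha for i in range(1, n + 1))
    return num / den


# Illustrative parameter values only.
alpha, n = 2.3, 100
probs = [zipf_pmf(x, alpha, n) for x in range(1, n + 1)]
total = sum(probs)  # should be 1 up to rounding
mean_direct = sum(x * p for x, p in zip(range(1, n + 1), probs))
```

Summing x * p(x) directly and calling zipf_mean should agree up to floating-point rounding, which is a quick sanity check on both formulas.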
The development of the Zipf distribution was motivated by Zipf's law (from the linguistics community). Zipf's law states that the frequency of occurrence of any word is approximately inversely proportional to its rank in the frequency table. When Zipf's law is applicable, plotting the frequency table on a log-log scale (i.e., log(frequency) versus log(rank order)) will typically show a linear pattern. Note that Zipf's law is an empirical (as opposed to a theoretical) law. However, Zipf's law has served as a useful model for many different kinds of phenomena.
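The linear pattern on a log-log scale can be illustrated with a small Python sketch. It assumes idealized frequencies exactly proportional to rank^(-alpha), so the fitted slope is -alpha; real frequency tables will only follow this approximately. The function name loglog_slope and the values used are hypothetical.

```python
import math


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    freqs must be in rank order (largest frequency first).
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


# Idealized frequency table: frequency proportional to rank^(-alpha).
alpha = 1.7
freqs = [1000.0 * rank ** -alpha for rank in range(1, 51)]
slope = loglog_slope(freqs)  # equals -alpha up to rounding for exact power-law data
```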
Syntax:
    LET <y> = ZIPPDF(<x>,<alpha>,<n>)    <SUBSET/EXCEPT/FOR qualification>
    where <x> is a positive integer variable, number, or parameter;
          <alpha> is a number or parameter greater than 1 that specifies the first shape parameter;
          <n> is a number or parameter that is a positive integer that specifies the second shape parameter;
          <y> is a variable or a parameter where the computed Zipf pdf value is stored;
          and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
    LET Y = ZIPPDF(X1,2.3,1000)
    PLOT ZIPPDF(X,2.3,100) FOR X = 1 1 100
To generate Zipf random numbers and probability plots, enter the commands

    LET N = <value>
    LET ALPHA = <value>
    LET Y = ZIPF RANDOM NUMBERS FOR I = 1 1 NLAST
    ZIPF PROBABILITY PLOT Y
    ZIPF PROBABILITY PLOT Y2 X2
    ZIPF PROBABILITY PLOT Y3 XLOW XHIGH

You can generate an estimate of alpha, assuming the value of n is known, based on the maximum ppcc value or the minimum chi-square goodness of fit with the commands
    LET ALPHA1 = <value>
    LET ALPHA2 = <value>
    ZIPF KS PLOT Y
    ZIPF KS PLOT Y2 X2
    ZIPF KS PLOT Y3 XLOW XHIGH
    ZIPF PPCC PLOT Y
    ZIPF PPCC PLOT Y2 X2
    ZIPF PPCC PLOT Y3 XLOW XHIGH

If the value of n is unknown, you can use the maximum data value as the estimate of n. The default values of ALPHA1 and ALPHA2 are 1.5 and 5, respectively.

Because the percent point function of a discrete distribution is a step function, the ppcc plot will not be smooth. For that reason, if the sample size is sufficient, the KS PLOT (i.e., the minimum chi-square value) is typically preferred. Also, since the data are integer valued, one of the binned forms is preferred for these commands.

To generate a chi-square goodness of fit test, enter the commands
    LET ALPHA = <value>
    ZIPF CHISQUARE GOODNESS OF FIT Y2 X2
    ZIPF CHISQUARE GOODNESS OF FIT Y3 XLOW XHIGH
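The idea of estimating alpha by minimizing a chi-square statistic over the interval from ALPHA1 to ALPHA2 can be sketched in Python as a plain grid search. This is only an illustration of the idea, not Dataplot's algorithm: the grid spacing, the function names, and the absence of any pooling of sparse cells are all assumptions made here, so the result will not necessarily match the KS PLOT estimate.

```python
import random


def zipf_probs(alpha, n):
    """Assumed Zipf cell probabilities for x = 1, ..., n."""
    weights = [i ** -alpha for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def chi_square_stat(counts, alpha, n):
    """Pearson chi-square of observed counts (index 0 holds x = 1) against Zipf."""
    nobs = sum(counts)
    stat = 0.0
    for obs, p in zip(counts, zipf_probs(alpha, n)):
        expected = nobs * p
        stat += (obs - expected) ** 2 / expected
    return stat


def min_chi_square_alpha(counts, n, alpha1=1.5, alpha2=5.0, steps=351):
    """Grid value of alpha in [alpha1, alpha2] with the smallest chi-square."""
    grid = [alpha1 + (alpha2 - alpha1) * k / (steps - 1) for k in range(steps)]
    return min(grid, key=lambda a: chi_square_stat(counts, a, n))


# Simulated data: 500 draws with n = 100 and alpha = 1.7 (illustrative values).
rng = random.Random(12345)
n, true_alpha = 100, 1.7
sample = rng.choices(range(1, n + 1), weights=zipf_probs(true_alpha, n), k=500)
counts = [0] * n
for x in sample:
    counts[x - 1] += 1
alpha_hat = min_chi_square_alpha(counts, n)
```

Because cells with very small expected counts are not pooled in this sketch, tail observations can dominate the statistic; a practical implementation would typically combine sparse cells before computing it.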
Program:
    let n = 100
    let alpha = 1.7
    let y = zipf random numbers for i = 1 1 500
    .
    let y3 xlow xhigh = integer frequency table y
    class lower 0.5
    class width 1
    let amax = maximum y
    let amax2 = amax + 0.5
    class upper amax2
    let y2 x2 = binned y
    .
    label case asis
    x1label Alpha
    y1label Minimum Chi-Square
    zipf ks plot y3 xlow xhigh
    let alpha = shape
    case asis
    justification center
    move 50 92
    text Alpha = ^alpha, Minimum Chi-Square = ^minks
    zipf chisquare goodness of fit y3 xlow xhigh
    .
    title Histogram with Overlaid Zipf PDF
    label
    relative histogram y2 x2
    limits freeze
    pre-erase off
    line color blue
    plot zippdf(x,alpha,n) for x = 1 1 n
    limits
    pre-erase on
    line color black

The chi-square goodness of fit command in this program generates the following output:

                   CHI-SQUARED GOODNESS-OF-FIT TEST

 NULL HYPOTHESIS H0:      DISTRIBUTION FITS THE DATA
 ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
 DISTRIBUTION:            ZIPF

 SAMPLE:
    NUMBER OF OBSERVATIONS      =      500
    NUMBER OF NON-EMPTY CELLS   =       18
    NUMBER OF PARAMETERS USED   =        1

 TEST:
 CHI-SQUARED TEST STATISTIC     =    13.41658
    DEGREES OF FREEDOM          =       16
    CHI-SQUARED CDF VALUE       =    0.357910

    ALPHA LEVEL         CUTOFF              CONCLUSION
            10%       23.54183               ACCEPT H0
             5%       26.29623               ACCEPT H0
             1%       31.99993               ACCEPT H0

    CELL NUMBER, LOWER BIN POINT, UPPER BIN POINT, OBSERVED FREQUENCY,
    AND EXPECTED FREQUENCY WRITTEN TO FILE DPST1F.DAT
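The numbers in the test output can be checked by hand. The Python sketch below assumes the degrees of freedom are the number of non-empty cells minus one minus the number of fitted parameters (which is consistent with 18, 1, and 16 in the output), and uses the closed form of the chi-square CDF that holds for even degrees of freedom. The function name is hypothetical.

```python
import math


def chi_square_cdf_even_df(x, df):
    """Chi-square CDF for even df: 1 - exp(-x/2) * sum_{k<df/2} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # k = 0 term
    for k in range(1, df // 2):
        term *= half / k    # builds (x/2)^k / k! incrementally
        total += term
    return 1.0 - math.exp(-half) * total


cells, fitted_params = 18, 1
df = cells - 1 - fitted_params                # 16, as in the output
cdf = chi_square_cdf_even_df(13.41658, df)    # compare with the reported CDF value
cdf_at_5pct_cutoff = chi_square_cdf_even_df(26.29623, df)  # should be close to 0.95
```

Since the reported CDF value is well below 0.90, the test statistic falls below all three cutoffs, which is why H0 is accepted at each level.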
Date created: 6/5/2006 