VECTOR PERCENTILE

Name:

VECTOR PERCENTILE (LET)

Type:

Let Subcommand

Purpose:

Generate a vector of percentiles from a response variable.

Description:

The p-th percentile of a data set is the value such that p percent of the data fall below it and (100-p) percent of the data fall above it. For example, the 50th percentile is the median.

The default method for computing percentiles in Dataplot is based on the order statistic. The formula is:

\( \hat{X}_p = (1 - r)X_{NI1} + rX_{NI2} \)

where

X are the observations sorted in ascending order
n is the number of observations
p is the desired percentile expressed as a fraction between 0 and 1
NI1 = INT(p*(n+1))
NI2 = NI1 + 1
r = p*(n+1) - INT(p*(n+1))

If p < 1/(n+1), then X_1 is returned. If p > n/(n+1), then X_n is returned.
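The order statistic formula above can be sketched in Python as follows. This is an illustration of the formula only, not Dataplot's implementation; the function name percentile_r6 is hypothetical, and p is assumed to be a fraction between 0 and 1.

```python
def percentile_r6(data, p):
    """Order statistic (R6) percentile; p is a fraction in [0, 1]."""
    x = sorted(data)              # X: observations in ascending order
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    # Boundary rules: return the minimum or maximum outside the interior range
    if p < 1.0 / (n + 1):
        return x[0]
    if p > n / (n + 1.0):
        return x[-1]
    h = p * (n + 1)
    ni1 = int(h)                  # NI1, a 1-based index into the sorted data
    r = h - ni1                   # fractional part used for interpolation
    if ni1 >= n:                  # p equal to n/(n+1): no upper neighbor exists
        return x[-1]
    # (1 - r)*X_NI1 + r*X_NI2, shifted to Python's 0-based indexing
    return (1 - r) * x[ni1 - 1] + r * x[ni1]


print(percentile_r6([1, 2, 3, 4, 5], 0.25))   # 1.5
print(percentile_r6([1, 2, 3, 4, 5], 0.50))   # 3.0
```

With five observations, p = 0.25 gives p*(n+1) = 1.5, so the result is halfway between the first and second order statistics.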

The above is for a single percentile. For the VECTOR PERCENTILE command, you specify the number of percentiles that you would like to compute. Dataplot will then generate the appropriate values for p in the above formulas.

Syntax 1:

Examples:

Note:

The default method used by Dataplot described above is equivalent to method R6 of Hyndman and Fan. The description of the methods here will be in terms of the quantile q = p/100 where p is the desired percentile.

The method advocated by Hyndman and Fan is R8. For the R8 method,

\( X_{q} = X_{NI1} + r(X_{NI2} - X_{NI1}) \)

where

X are the observations sorted in ascending order
NI1 = INT(q*(n+(1/3)) + (1/3))
NI2 = NI1 + 1
r = q*(n+(1/3)) + (1/3) - INT(q*(n+(1/3)) + (1/3))

If q ≤ (2/3)/(n+(1/3)), the minimum value is returned. If q ≥ (n-(1/3))/(n+(1/3)), the maximum value is returned.
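A minimal Python sketch of the R8 calculation is given below. It is an illustration only, not Dataplot's code; the function name quantile_r8 is hypothetical. It follows the Hyndman and Fan definition, in which the interpolation weight r is the fractional part of the same quantity, q*(n+(1/3)) + (1/3), whose integer part gives NI1.

```python
def quantile_r8(data, q):
    """Hyndman and Fan R8 quantile; q is a fraction in [0, 1]."""
    x = sorted(data)              # X: observations in ascending order
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    third = 1.0 / 3.0
    # Boundary rules: minimum and maximum outside the interior range
    if q <= (2.0 * third) / (n + third):
        return x[0]
    if q >= (n - third) / (n + third):
        return x[-1]
    h = q * (n + third) + third
    ni1 = int(h)                  # NI1, a 1-based index into the sorted data
    r = h - ni1                   # fractional part of the same position h
    if ni1 >= n:                  # guard against floating-point round-up
        return x[-1]
    # X_NI1 + r*(X_NI2 - X_NI1), shifted to Python's 0-based indexing
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])


print(round(quantile_r8([1, 2, 3, 4, 5], 0.25), 4))
```

For five observations and q = 0.25, the position is 0.25*(5+(1/3)) + (1/3) = 5/3, so the result lies two thirds of the way from the first to the second order statistic.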

Method R7, the default method in R and Excel, is calculated by

\( X_{q} = X_{NI1} + r(X_{NI2} - X_{NI1}) \)

where

X are the observations sorted in ascending order
NI1 = INT(q*(n-1) + 1)
NI2 = NI1 + 1
r = q*(n-1) + 1 - INT(q*(n-1) + 1)

If q = 1, then X_n is returned.
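The R7 calculation can be sketched in Python in the same way. This is an illustration only, not Dataplot's code; the function name quantile_r7 is hypothetical. As in the Hyndman and Fan definition, r is taken as the fractional part of q*(n-1) + 1, the quantity whose integer part gives NI1.

```python
def quantile_r7(data, q):
    """Hyndman and Fan R7 quantile; q is a fraction in [0, 1]."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie between 0 and 1")
    x = sorted(data)              # X: observations in ascending order
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    h = q * (n - 1) + 1
    ni1 = int(h)                  # NI1, a 1-based index into the sorted data
    r = h - ni1                   # fractional part of the same position h
    if ni1 >= n:                  # q = 1: X_n is returned
        return x[-1]
    # X_NI1 + r*(X_NI2 - X_NI1), shifted to Python's 0-based indexing
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])


print(quantile_r7([1, 2, 3, 4], 0.5))   # 2.5
```

With four observations and q = 0.5, the position is 0.5*3 + 1 = 2.5, so the result is midway between the second and third order statistics.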

The R6, R7, and R8 methods give similar but not identical results. The differences are most noticeable for small samples. For most purposes, any of these three methods should be acceptable.
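The size of the difference on a small sample can be seen with a short Python sketch. This is an illustration only, not Dataplot's code; the function name quantile and the sample values are made up for the example. It assumes each method can be written as a position h = q*(n + c) + d in the sorted data, with (c, d) equal to (1, 0) for R6, (-1, 1) for R7, and (1/3, 1/3) for R8.

```python
def quantile(data, q, method="R6"):
    """Quantile by method R6, R7, or R8; q is a fraction in [0, 1]."""
    # (c, d) define the position h = q*(n + c) + d for each method
    params = {"R6": (1.0, 0.0), "R7": (-1.0, 1.0), "R8": (1.0 / 3.0, 1.0 / 3.0)}
    c, d = params[method]
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    h = q * (n + c) + d
    # Clamping h to [1, n] has the same effect as the boundary rules:
    # the minimum or maximum is returned outside the interior range
    h = min(max(h, 1.0), float(n))
    ni1 = int(h)                  # 1-based index of the lower order statistic
    r = h - ni1                   # interpolation weight
    if ni1 >= n:
        return x[-1]
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])


sample = [2, 4, 7, 11, 16]        # hypothetical five-point sample
for method in ("R6", "R7", "R8"):
    print(method, round(quantile(sample, 0.25, method), 4))
```

For this five-point sample the three estimates of the 25th percentile are 3.0 (R6), 4.0 (R7), and about 3.3333 (R8). The three positions differ by less than one order statistic, so the gap shrinks as the spacing between adjacent order statistics decreases in larger samples.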

Note:

To specify the method used to compute percentiles, enter the command

SET QUANTILE METHOD <ORDER/R6/R7/R8>

R6 is equivalent to ORDER. ORDER is the default.

Default:

The ORDER STATISTIC (R6) method is the default method for calculating percentiles.

Synonyms:

None

Related Commands:

PERCENTILE	= Compute a specified percentile.
QUANTILE	= Compute a specified quantile.
MEDIAN	= Compute the median.
LOWER QUARTILE	= Compute the lower quartile.
UPPER QUARTILE	= Compute the upper quartile.
FIRST DECILE	= Compute the first decile (the 10th percentile).

Reference:

Hyndman and Fan (1996), "Sample Quantiles in Statistical Packages," The American Statistician, Vol. 50, No. 4, pp. 361-365.

Applications:

Data Analysis

Implementation Date:

2016/06

Program:

. Step 1:   Generate the raw data
.
let y = normal rand numb for i = 1 1 1000000
.
. Step 2:   Compute the desired percentiles
.
let nperc = 1000
let yperc = vector percentiles y nperc
.
. Step 3:   Plot the percentiles
.
character circle
character hw 1 0.75
character fill on
line blank
title Plot of 1,000 Percentiles Based on 1,000,000 Points
y1label Percentile Value
.
plot yperc