7. Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.5. What intervals contain a fixed percentage of the population values?

## Percentiles

### Definitions of order statistics and ranks

For a series of measurements Y1, ..., YN, denote the data arranged in increasing order of magnitude by Y[1], ..., Y[N]. These ordered data are called order statistics. If Y[j] is the order statistic that corresponds to the measurement Yi, then the rank of Yi is j.
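As a minimal sketch of these definitions, the following Python snippet sorts a list of measurements into order statistics and assigns each measurement its rank. The function name `order_statistics_and_ranks` is hypothetical, the values are the twelve resistivity measurements from the example later in this section, and the sketch assumes there are no tied values, a case the definitions above do not address.

```python
def order_statistics_and_ranks(measurements):
    """Return (order statistics, ranks) for a list of measurements.

    Assumption: no tied values; ties would need a convention that
    the text does not specify.
    """
    # Indices of the measurements, arranged by increasing value.
    order = sorted(range(len(measurements)), key=lambda i: measurements[i])
    order_stats = [measurements[i] for i in order]

    # The measurement at index i that lands in sorted position j gets rank j
    # (ranks are 1-based, matching Y[1], ..., Y[N]).
    ranks = [0] * len(measurements)
    for j, i in enumerate(order, start=1):
        ranks[i] = j
    return order_stats, ranks


# The twelve resistivity measurements (ohm.cm) from the example in this section.
resistivities = [
    95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
    95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682,
]

order_stats, ranks = order_statistics_and_ranks(resistivities)
print(order_stats)
print(ranks)  # [9, 6, 10, 11, 5, 1, 7, 4, 3, 2, 12, 8]
```

The printed ranks can be checked against the Ranks column of the example table.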

### Definition of percentiles

Order statistics provide a way of estimating the proportions of the data that should fall above and below a given value, called a percentile. For a proportion p between 0 and 1, the 100p-th percentile is a value, Y(p), such that at most (100p)% of the measurements are less than this value and at most 100(1 - p)% are greater. The 50th percentile is called the median.

Percentiles split a set of ordered data into hundredths. (Deciles split ordered data into tenths.) For example, about 70% of the data should fall below the 70th percentile.

### Estimation of percentiles

Percentiles can be estimated from N measurements as follows: for the 100p-th percentile, set p(N+1) equal to k + d, where k is an integer and d is a fraction greater than or equal to 0 and less than 1.
1. For 0 < k < N, Y(p) = Y[k] + d(Y[k+1] - Y[k])

2. For k = 0, Y(p) = Y[1]

3. For k = N, Y(p) = Y[N]


### Example and interpretation

For the purpose of illustration, twelve measurements from a gage study are shown below. The measurements are resistivities of silicon wafers, measured in ohm.cm.
```
 i   Measurements   Order stats   Ranks

 1     95.1772        95.0610        9
 2     95.1567        95.0925        6
 3     95.1937        95.1065       10
 4     95.1959        95.1195       11
 5     95.1442        95.1442        5
 6     95.0610        95.1567        1
 7     95.1591        95.1591        7
 8     95.1195        95.1682        4
 9     95.1065        95.1772        3
10     95.0925        95.1937        2
11     95.1990        95.1959       12
12     95.1682        95.1990        8
```
To find the 90th percentile, p(N+1) = 0.9(13) = 11.7, so k = 11 and d = 0.7. From condition (1) above, Y(0.90) = Y[11] + 0.7(Y[12] - Y[11]) = 95.1959 + 0.7(95.1990 - 95.1959), which gives an estimate of 95.1981 ohm.cm. Although this percentile is estimated from a small sample of resistivity measurements, it gives an indication of the corresponding percentile for a population of resistivity measurements.
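
The estimation rule and the worked example can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function name `estimate_percentile` is hypothetical, p is taken as a proportion strictly between 0 and 1, and the interpolation used for 0 < k < N is Y[k] + d(Y[k+1] - Y[k]), which reproduces the 95.1981 ohm.cm estimate quoted above.

```python
def estimate_percentile(measurements, p):
    """Estimate the 100p-th percentile by setting p(N+1) = k + d.

    Assumptions: 0 < p < 1 and at least one measurement.
    """
    if not 0 < p < 1:
        raise ValueError("p must be a proportion strictly between 0 and 1")

    y = sorted(measurements)      # order statistics Y[1], ..., Y[N]
    n = len(y)
    position = p * (n + 1)
    k = int(position)             # integer part
    d = position - k              # fractional part, 0 <= d < 1

    if k <= 0:                    # condition (2): k = 0
        return y[0]
    if k >= n:                    # condition (3): k = N
        return y[-1]
    # Condition (1): interpolate between Y[k] and Y[k+1].
    # Python lists are 0-based, so Y[k] is y[k - 1].
    return y[k - 1] + d * (y[k] - y[k - 1])


# The twelve resistivity measurements (ohm.cm) from the table above.
resistivities = [
    95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
    95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682,
]

print(round(estimate_percentile(resistivities, 0.90), 4))  # 95.1981
```

For very small or very large p the rule falls back to the smallest or largest measurement; with these data, p = 0.05 gives k = 0 and returns Y[1] = 95.0610.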
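
### Other ways of calculating percentiles

Other ways of calculating percentiles are in common use. Some software packages (Excel, for example) set 1 + p(N-1) equal to k + d, then proceed as above. The two methods give fairly similar results; for the twelve resistivity measurements, this method gives a 90th percentile of about 95.1957 ohm.cm, compared with 95.1981 ohm.cm from the p(N+1) rule.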

A third way of calculating percentiles (given in some elementary textbooks) starts by calculating pN. If pN is not an integer, round up to the next integer k and use Y[k] as the percentile estimate. If pN is an integer k, use 0.5(Y[k] + Y[k+1]).
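
To make the two alternative rules concrete, the sketch below applies them to the twelve resistivity measurements from the example. The function names `percentile_one_plus_p_n_minus_1` and `percentile_round_up` are hypothetical, p is assumed to lie strictly between 0 and 1, and the small tolerance used to decide whether pN is an integer is an implementation choice to guard against floating-point error, not part of the textbook rule.

```python
import math


def percentile_one_plus_p_n_minus_1(measurements, p):
    """Set 1 + p(N-1) = k + d, then interpolate between Y[k] and Y[k+1]."""
    y = sorted(measurements)
    n = len(y)
    position = 1 + p * (n - 1)
    k = int(position)
    d = position - k
    if k >= n:                    # no Y[k+1] to interpolate toward
        return y[-1]
    return y[k - 1] + d * (y[k] - y[k - 1])


def percentile_round_up(measurements, p, tol=1e-9):
    """Use Y[ceil(pN)], or 0.5(Y[k] + Y[k+1]) when pN is an integer k."""
    y = sorted(measurements)
    n = len(y)
    position = p * n
    nearest = round(position)
    if abs(position - nearest) < tol:     # pN is (numerically) an integer k
        k = nearest
        return 0.5 * (y[k - 1] + y[k])
    k = math.ceil(position)               # round up to the next integer
    return y[k - 1]


# The twelve resistivity measurements (ohm.cm) from the example.
resistivities = [
    95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
    95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682,
]

print(round(percentile_one_plus_p_n_minus_1(resistivities, 0.90), 4))  # 95.1957
print(percentile_round_up(resistivities, 0.90))                        # 95.1959
```

For these data the three rules give 90th-percentile estimates of 95.1981, about 95.1957, and 95.1959 ohm.cm respectively, which illustrates how close the methods are even for a sample of only twelve values.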

### Definition of tolerance interval

An interval covering population percentiles can be interpreted as "covering a proportion p of the population with a level of confidence, say, 90%." This is known as a tolerance interval.