7.
Product and Process Comparisons
7.2. Comparisons based on data from one process 7.2.6. What intervals contain a fixed percentage of the population values?


Tolerance intervals can be constructed for a distribution of any form  The methods on the previous pages for computing tolerance limits are based on the assumption that the measurements come from a normal distribution. If the distribution is not normal, tolerance intervals based on this assumption will not provide coverage for the intended proportion \(p\) of the population. However, there are methods for achieving the intended coverage if the form of the distribution is not known, but these methods may produce substantially wider tolerance intervals.  
Risks associated with making assumptions about the distribution  There are situations where it would be particularly dangerous to make unwarranted assumptions about the exact shape of the distribution, for example, when testing the strength of glass for airplane windshields where it is imperative that a very large proportion of the population fall within acceptable limits.  
Tolerance intervals based on largest and smallest observations 
One obvious choice for a twosided tolerance interval for an unknown
distribution is the interval between the smallest and largest observations
from a sample of \(Y_1, \, \ldots, \, Y_N\)
measurements. Given the sample size \(N\)
and coverage \(p\),
an equation from
Hahn and Meeker (p. 91),
$$ \gamma = 1  Np^{N1} + (N1)p^{N} \, , $$
allows us to calculate the confidence \(\gamma\)
of the tolerance interval. For example, the confidence levels for selected
coverages between 0.5 and 0.9999 are shown below for \(N\)
= 25.
Confidence Coverage 1.000 0.5000 0.993 0.7500 0.729 0.9000 0.358 0.9500 0.129 0.9750 0.026 0.9900 0.007 0.9950 0.0 0.9990 0.0 0.9995 0.0 0.9999Note that if 99 % confidence is required, the interval that covers the entire sample data set is guaranteed to achieve a coverage of only 75 % of the population values. 

What is the optimal sample size?  Another question of interest is, "How large should a sample be so that one can be assured with probability \(\gamma\) that the tolerance interval will contain at least a proportion \(p\) of the population?"  
Approximation for \(N\)  A rather good approximation for the required sample size is given by $$ N \cong \frac{1}{4} \frac{(1+p)}{(1p)} \, \chi^2_{\gamma, \, 4} +\frac{1}{2} \, , $$ where \(\chi_{\gamma, \, 4}^2\) is the critical value of the chisquare distribution with 4 degrees of freedom that is exceeded with probability \(\gamma\).  
Example of the effect of \(p\) on the sample size 
Suppose we want to know how many measurements to make in order to guarantee
that the interval between the smallest and largest observations covers a
proportion \(p\)
of the population with probability \(\gamma\)
= 0.95. From the
table for the upper critical value
of the chisquare distribution, look under the column labeled 0.95 in
the row for 4 degrees of freedom. The value is found to be
\(\chi_{0.95, \, 4}^2\)
= 9.488 and calculations are shown below for \(p\)
equal to 0.90 and 0.99.
For \(p\) = 0.90, \(\gamma\) = 0.95; $$ N \cong \frac{1}{4} \frac{(1+0.90)}{(10.90)} \, \chi^2_{0.95, 4} + \frac{1}{2} = 0.25(19)(9.488)+0.5 = 45.57 = 46 \, . $$ For \(p\) = 0.99, \(\gamma\) = 0.95; $$ N \cong \frac{1}{4} \frac{(1+0.99)}{(10.99)} \, \chi^2_{0.95, 4} +\frac{1}{2} = 0.25(199)(9.488)+0.5 = 472.5 = 473 \, . $$ These calculations demonstrate that requiring the tolerance interval to cover a very large proportion of the population may lead to an unacceptably large sample size. 