

The computation of sample sizes depends on many things, some of which have to be assumed in advance 
Perhaps one of the most frequent questions asked of a statistician
is how many measurements a sample should include.


Application: estimating a minimum sample size, \(N\), for limiting the error in the estimate of the mean
For example, suppose that we wish to estimate the average daily yield, \(\mu\), of a chemical process by the mean of a sample, \(Y_1, \, \ldots, \, Y_N\), such that the error of estimation is less than \(\delta\) with a probability of 95 %. This means that a 95 % confidence interval centered at the sample mean should be
$$ \bar{Y} - \delta \le \mu \le \bar{Y} + \delta \, , $$
and if the standard deviation is known,
$$ \delta = \frac{\sigma}{\sqrt{N}} \, z_{1 - 0.025} \, . $$
The critical value from the normal distribution for \(1 - \alpha/2 = 0.975\) is 1.96. Therefore,
$$ N \ge \left( \frac{1.96}{\delta} \right)^2 \sigma^2 \, . $$
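The bound on \(N\) can be evaluated directly. The following is a minimal sketch, assuming a known standard deviation; the function name `n_for_margin` and the values of \(\sigma\) and \(\delta\) in the usage line are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_for_margin(sigma, delta, confidence=0.95):
    """Smallest N for which the half-width z * sigma / sqrt(N) is at most delta.

    sigma is the (assumed known) standard deviation and delta the largest
    acceptable error of estimation, both in the same units.
    """
    # Two-sided critical value, e.g. z_{0.975} = 1.96 for 95 % confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # N >= (z / delta)^2 * sigma^2, rounded up to a whole observation.
    return math.ceil((z * sigma / delta) ** 2)


# Hypothetical inputs: sigma = 2.0 and delta = 0.5 in the units of the yield.
print(n_for_margin(sigma=2.0, delta=0.5))
```

Because \(N\) grows with \((\sigma/\delta)^2\), halving the acceptable error \(\delta\) roughly quadruples the required sample size.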
Limitation and interpretation
A restriction is that the standard deviation must be known. When an exact value is not available, some accommodation is required, such as using the best estimate available from a previous experiment.
Controlling the risk of accepting a false hypothesis
To control the risk of accepting a false hypothesis, we set not only \(\alpha\), the probability of rejecting the null hypothesis when it is true, but also \(\beta\), the probability of accepting the null hypothesis when in fact the population mean is \(\mu + \delta\), where \(\delta\) is the difference or shift we want to detect.
Standard deviation assumed to be known 
The minimum sample size, \(N\),
is shown below for two- and one-sided tests of hypotheses with \(\sigma\)
assumed to be known.
$$ \begin{eqnarray}
N = (z_{1-\alpha/2} + z_{1-\beta})^2 \left( \frac{\sigma}{\delta} \right)^2 \rightarrow \mbox{two-sided test} \\
N = (z_{1-\alpha} + z_{1-\beta})^2 \left( \frac{\sigma}{\delta} \right)^2 \rightarrow \mbox{one-sided test}
\end{eqnarray} $$
The quantities \(z_{1-\alpha/2}\), \(z_{1-\alpha}\), and \(z_{1-\beta}\)
are critical values from the
normal distribution.
Note that it is usual to state the shift, \(\delta\), in units of the standard deviation, thereby simplifying the calculation. 

Example where the shift is stated in terms of the standard deviation
For a one-sided hypothesis test where we wish to detect an increase in the population mean of one standard deviation, the following information is required: \(\alpha\), the significance level of the test, and \(\beta\), the probability of failing to detect a shift of one standard deviation. For a test with \(\alpha\) = 0.05 and \(\beta\) = 0.10, the minimum sample size required for the test is
$$ N = (1.645 + 1.282)^2 = 8.567 \approx 9 \, . $$
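The known-\(\sigma\) formulas can be checked numerically. This is a minimal sketch rather than a reference implementation; the function name `sample_size_known_sigma` and its parameter `shift_in_sd` (the shift \(\delta\) expressed in units of \(\sigma\)) are hypothetical names introduced here.

```python
import math
from statistics import NormalDist


def sample_size_known_sigma(alpha, beta, shift_in_sd, two_sided=False):
    """Unrounded N = (z_alpha + z_beta)^2 / (delta / sigma)^2 with sigma known."""
    z = NormalDist().inv_cdf
    # A two-sided test splits alpha between the two tails.
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(1 - beta)
    return ((z_alpha + z_beta) / shift_in_sd) ** 2


# One-sided test, alpha = 0.05, beta = 0.10, shift of one standard deviation.
raw = sample_size_known_sigma(alpha=0.05, beta=0.10, shift_in_sd=1.0)
print(round(raw, 2), math.ceil(raw))
```

With unrounded critical values the raw result is slightly below the 8.567 obtained from the three-decimal table values, and it still rounds up to 9.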
More often we must compute the sample size with the population standard deviation being unknown 
The procedures for computing sample sizes when the standard deviation
is not known are similar to, but more complex than, those used when the
standard deviation is known. The formulation depends on the
\(t\) distribution, where the minimum sample size is given by
$$ \begin{eqnarray}
N = (t_{1-\alpha/2} + t_{1-\beta})^2 \left( \frac{s}{\delta} \right)^2 \rightarrow \mbox{two-sided test} \\
N = (t_{1-\alpha} + t_{1-\beta})^2 \left( \frac{s}{\delta} \right)^2 \rightarrow \mbox{one-sided test}
\end{eqnarray} $$
The drawback is that critical values of the \(t\) distribution depend on the degrees of freedom, which in turn depend on the sample size that we are trying to estimate.

Iterate on the initial estimate using critical values from the \(t\) table
Therefore, the best procedure is to start with an initial estimate based on a sample standard deviation and iterate. Take the example discussed above, where the minimum sample size is computed to be \(N\) = 9. This estimate is low. Now use the formula above with degrees of freedom \(N - 1 = 8\), which gives a second estimate of
$$ N = (1.860 + 1.397)^2 = 10.6 \approx 11 \, . $$
It is possible to apply another iteration using degrees of freedom 10, but in practice one iteration is usually sufficient. For the purpose of this example, results have been rounded to the closest integer; however, computer programs for finding critical values from the \(t\) distribution allow non-integer degrees of freedom.
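The iteration can be sketched in code. The Python standard library has no \(t\) quantile function, so this sketch obtains critical values numerically (Simpson's rule for the CDF, then bisection); in practice a library routine such as `scipy.stats.t.ppf` would normally be used instead. All function names below are hypothetical, the shift is assumed to be expressed in units of \(s\), and results are rounded up rather than to the closest integer.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Critical value for p > 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sample_size_t(alpha, beta, shift_in_sd, df, two_sided=False):
    """Unrounded N = (t_alpha + t_beta)^2 / (delta / s)^2 for the given df."""
    a = alpha / 2 if two_sided else alpha
    t_sum = t_quantile(1 - a, df) + t_quantile(1 - beta, df)
    return (t_sum / shift_in_sd) ** 2


def iterate_sample_size(n_start, alpha, beta, shift_in_sd, two_sided=False, max_iter=10):
    """Recompute N with df = N - 1 until it stops changing (or max_iter is hit)."""
    n = n_start
    history = [n]
    for _ in range(max_iter):
        n_next = math.ceil(sample_size_t(alpha, beta, shift_in_sd, n - 1, two_sided))
        history.append(n_next)
        if n_next == n:
            break
        n = n_next
    return history


# Start from the normal-theory estimate N = 9 of the one-sided example.
print(iterate_sample_size(9, alpha=0.05, beta=0.10, shift_in_sd=1.0))
```

For this example the first pass with 8 degrees of freedom gives about 10.6, which rounds up to 11, and a second pass with 10 degrees of freedom leaves the estimate at 11, consistent with the remark that one iteration is usually sufficient.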
Table showing minimum sample sizes for a two-sided test
The table below gives sample sizes for a two-sided test of hypothesis
that the mean is a given value, with the shift to be detected a
multiple of the standard deviation. For a one-sided test at
significance level \(\alpha\),
look under the value of 2\(\alpha\)
in column 1. Note that this table is based on the normal
approximation (i.e., the standard deviation is known).
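Entries of this kind can be generated from the normal-approximation formula. The sketch below is illustrative only: the grid of \(\alpha\), \(\beta\), and shift values is a hypothetical choice and is not taken from the table, and the function names are introduced here. It also checks the rule that a one-sided test at level \(\alpha\) needs the same sample size as a two-sided test at level 2\(\alpha\).

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf


def n_two_sided(alpha, beta, shift_in_sd):
    """Minimum N for a two-sided test, sigma known, shift in units of sigma."""
    return math.ceil(((z(1 - alpha / 2) + z(1 - beta)) / shift_in_sd) ** 2)


def n_one_sided(alpha, beta, shift_in_sd):
    """Minimum N for a one-sided test, sigma known, shift in units of sigma."""
    return math.ceil(((z(1 - alpha) + z(1 - beta)) / shift_in_sd) ** 2)


# Hypothetical grid of significance levels, beta risks, and shifts.
shifts = (0.5, 1.0, 1.5)
for alpha in (0.01, 0.05, 0.10):
    for beta in (0.05, 0.10):
        row = [n_two_sided(alpha, beta, k) for k in shifts]
        print(f"alpha={alpha:.2f} beta={beta:.2f} N={row}")

# A one-sided test at alpha matches the two-sided entry at 2 * alpha.
assert n_one_sided(0.05, 0.10, 1.0) == n_two_sided(0.10, 0.10, 1.0)
```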
