7. Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.2. Are the data consistent with the assumed process mean?


The computation of sample sizes depends on many things, some of which have to be assumed in advance 
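Perhaps one of the most frequent questions asked of a statistician
is how many measurements should be included in the sample. There is
no correct answer without additional information or assumptions, such
as the risks that can be tolerated and the standard deviation of the
population.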


Application: estimating a minimum sample size, N, for limiting the error in the estimate of the mean
For example, suppose that we wish to estimate the average daily
yield, $\mu$, of a chemical
process by the mean of a sample,
$Y_1, \ldots, Y_N$,
such that the error of estimation is less than
$\delta$ with a
probability of 95%. This means that a 95% confidence interval
centered at the sample mean should be
$$\bar{Y} - \delta \le \mu \le \bar{Y} + \delta$$
and, if the standard deviation is known,
$$\delta = z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{N}}.$$
The critical value from the normal distribution for $1-\alpha/2 = 0.975$ is 1.96. Therefore,
$$N \ge \left(\frac{1.96}{\delta}\right)^2 \sigma^2.$$


Limitation and interpretation
A restriction is that the standard deviation must be known. When an exact value is not available, some accommodation is required, such as using the best estimate available from a previous experiment.
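The bound N ≥ (1.96 σ/δ)² for limiting the estimation error can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name `n_for_margin` and the values of `sigma` and `delta` are hypothetical and chosen purely for illustration.

```python
import math
from statistics import NormalDist


def n_for_margin(sigma, delta, confidence=0.95):
    """Smallest N such that z * sigma / sqrt(N) <= delta (sigma assumed known)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / delta) ** 2)


# Hypothetical inputs: sigma = 2.0 and delta = 0.5, in the units of the yield.
n_required = n_for_margin(sigma=2.0, delta=0.5)
print(n_required)
```

Because N grows with the square of σ/δ, halving the tolerated error roughly quadruples the required sample size, so a poor prior estimate of σ carries straight through to the answer.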
Controlling the risk of accepting a false hypothesis
To control the risk of accepting a false hypothesis, we set not only $\alpha$, the probability of rejecting the null hypothesis when it is true, but also $\beta$, the probability of accepting the null hypothesis when in fact the population mean is $\mu + \delta$, where $\delta$ is the difference or shift we want to detect.
Standard deviation assumed to be known 
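The minimum sample size, N, is shown below for two- and
one-sided tests of hypotheses with
$\sigma$ assumed to
be known.
$$N = \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \left(\frac{\sigma}{\delta}\right)^2 \quad \text{for a two-sided test}$$
$$N = \left(z_{1-\alpha} + z_{1-\beta}\right)^2 \left(\frac{\sigma}{\delta}\right)^2 \quad \text{for a one-sided test}$$
The quantities $z_{1-\alpha/2}$, $z_{1-\alpha}$, and $z_{1-\beta}$ are critical values from the normal distribution. Note that it is usual to state the shift, $\delta$, in units of the standard deviation, thereby simplifying the calculation.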
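For readers who want to see where the known-σ sample size comes from, here is a sketch of the standard argument for the one-sided case. The symbols $\mu_0$ (the hypothesized mean) and $\Phi$ (the standard normal distribution function) are notation introduced here for illustration and do not appear in the text above.

```latex
% One-sided test of H_0: \mu = \mu_0 against \mu > \mu_0, with \sigma known.
% Reject H_0 when the sample mean exceeds the critical point:
\bar{Y} > \mu_0 + z_{1-\alpha}\,\frac{\sigma}{\sqrt{N}}

% If the true mean is \mu_0 + \delta, the probability of failing to reject
% must be at most \beta:
P\!\left(\bar{Y} \le \mu_0 + z_{1-\alpha}\frac{\sigma}{\sqrt{N}}\right)
  = \Phi\!\left(z_{1-\alpha} - \frac{\delta\sqrt{N}}{\sigma}\right) \le \beta

% Because \Phi(-z_{1-\beta}) = \beta, this requires
z_{1-\alpha} - \frac{\delta\sqrt{N}}{\sigma} \le -z_{1-\beta}
\quad\Longrightarrow\quad
N \ge \left(z_{1-\alpha} + z_{1-\beta}\right)^2 \left(\frac{\sigma}{\delta}\right)^2
```

The two-sided result follows the same steps with $z_{1-\alpha/2}$ in place of $z_{1-\alpha}$, neglecting the very small probability of rejecting in the wrong tail.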

Example where the shift is stated in terms of the standard deviation 
For a one-sided hypothesis test where we wish to detect an increase
in the population mean of one standard deviation, the following
information is required:
$\alpha$,
the significance level of the test, and
$\beta$, the
probability of failing to detect a shift of one standard deviation.
For a test with
$\alpha$ = 0.05 and
$\beta$ = 0.10, the minimum sample size required for the test is
$$N = (1.645 + 1.282)^2 = 8.567 \approx 9.$$
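The one-sided example with α = 0.05, β = 0.10, and a shift of one standard deviation gives N = 9. It can be checked numerically with the sketch below, which assumes the known-σ formula N = (z_{1-α} + z_{1-β})² (σ/δ)²; the helper name `n_known_sigma` is hypothetical.

```python
import math
from statistics import NormalDist


def n_known_sigma(alpha, beta, shift_in_sd, two_sided=False):
    """Unrounded sample size when sigma is known; shift_in_sd is delta / sigma."""
    z = NormalDist().inv_cdf
    tail = alpha / 2 if two_sided else alpha
    return ((z(1 - tail) + z(1 - beta)) / shift_in_sd) ** 2


raw = n_known_sigma(alpha=0.05, beta=0.10, shift_in_sd=1.0)
n_min = math.ceil(raw)  # round up so that both risks are still met
print(round(raw, 2), n_min)
```

The unrounded value differs slightly from a hand calculation in the third decimal place because the critical values are not rounded to three decimals here; the minimum sample size is unaffected.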


More often we must compute the sample size with the population standard deviation being unknown 
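The procedures for computing sample sizes when the standard deviation
is not known are similar to, but more complex than, those used when the
standard deviation is known. The formulation depends on the
t distribution, where the minimum sample size is given by
$$N = \left(t_{1-\alpha/2} + t_{1-\beta}\right)^2 \left(\frac{s}{\delta}\right)^2 \quad \text{for a two-sided test}$$
$$N = \left(t_{1-\alpha} + t_{1-\beta}\right)^2 \left(\frac{s}{\delta}\right)^2 \quad \text{for a one-sided test}$$
The drawback is that critical values of the t distribution depend on the degrees of freedom, which in turn depend on the sample size that we are trying to estimate.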

Iterate on the initial estimate using critical values from the t table 
Therefore, the best procedure is to start with an initial estimate
based on a sample standard deviation and iterate. Take the example
discussed above, where the minimum sample size is computed to
be N = 9. This estimate is low. Now use the formula above
with degrees of freedom N - 1 = 8, which gives a second
estimate of
$$N = (1.860 + 1.397)^2 = 10.6 \approx 11.$$
It is possible to apply another iteration using degrees of freedom 10, but in practice one iteration is usually sufficient. For the purpose of this example, results have been rounded to the closest integer; however, computer programs for finding critical values from the t distribution allow non-integer degrees of freedom.
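The iteration can be sketched in code. To stay within the standard library, the block below computes t critical values with a simple numerical integration and bisection; this is an illustrative substitute for a statistics package (for instance, `scipy.stats.t.ppf` would normally be used instead). All function names are hypothetical, and the shift is expressed as δ/s.

```python
import math


def t_pdf(x, df):
    """Student's t density; df may be non-integer."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """CDF for x >= 0 by Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Critical value t_p for 0.5 < p < 1, found by bisection on the CDF."""
    if not 0.5 < p < 1:
        raise ValueError("this sketch only handles 0.5 < p < 1")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand the bracket until it contains t_p
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def n_unknown_sigma(alpha, beta, shift_in_sd, df, two_sided=False):
    """One pass of the t-based formula, rounded up to an integer."""
    tail = alpha / 2 if two_sided else alpha
    total = t_quantile(1 - tail, df) + t_quantile(1 - beta, df)
    return math.ceil((total / shift_in_sd) ** 2)


def iterate_sample_size(n_start, alpha, beta, shift_in_sd,
                        two_sided=False, max_iter=10):
    """Repeat the formula with df = N - 1 until N stops changing."""
    n = n_start
    for _ in range(max_iter):
        n_next = n_unknown_sigma(alpha, beta, shift_in_sd, n - 1, two_sided)
        if n_next == n:
            break
        n = n_next
    return n


first_pass = n_unknown_sigma(0.05, 0.10, 1.0, df=8)  # start from N = 9
settled = iterate_sample_size(9, 0.05, 0.10, 1.0)
print(first_pass, settled)
```

Starting from N = 9, the first pass with 8 degrees of freedom gives 11, and a further pass with 10 degrees of freedom leaves the answer unchanged, which is consistent with the remark that one iteration is usually enough.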

Table showing minimum sample sizes for a two-sided test
The table below gives sample sizes for a two-sided test of hypothesis
that the mean is a given value, with the shift to be detected a
multiple of the standard deviation. For a one-sided test at
significance level
$\alpha$, look under
the value of $2\alpha$
in column 1. Note that this table is based on the normal
approximation (i.e., the standard deviation is known).
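Sample sizes of this kind can be regenerated from the normal-approximation formula for a two-sided test. The sketch below prints a few rows; the grid of α, β, and shift values is an assumption chosen for illustration and is not taken from the handbook's table. It also checks the rule of looking up a one-sided test under 2α.

```python
import math
from statistics import NormalDist


def n_normal(alpha, beta, shift_in_sd, two_sided=True):
    """Minimum N from the normal approximation; shift_in_sd is delta / sigma."""
    z = NormalDist().inv_cdf
    tail = alpha / 2 if two_sided else alpha
    return math.ceil(((z(1 - tail) + z(1 - beta)) / shift_in_sd) ** 2)


shifts = (0.5, 1.0, 1.5)  # hypothetical shifts, in units of sigma
rows = []
for alpha in (0.01, 0.05, 0.10):
    for beta in (0.05, 0.10):
        sizes = [n_normal(alpha, beta, s) for s in shifts]
        rows.append((alpha, beta, sizes))
        print(f"alpha={alpha:<5} beta={beta:<5} N={sizes}")

# A one-sided test at alpha = 0.05 matches the two-sided entry for 2 * alpha.
one_sided = n_normal(0.05, 0.10, 1.0, two_sided=False)
two_sided_at_double = n_normal(0.10, 0.10, 1.0, two_sided=True)
print(one_sided == two_sided_at_double)
```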
