Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.2. Are the data consistent with the assumed process mean?
|The computation of sample sizes depends on many things, some of which have to be assumed in advance||
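Perhaps one of the most frequent questions asked of a statistician is,
"How many measurements should be included in the sample?" There is no
single correct answer without additional information or assumptions:
the required sample size depends on the risk levels chosen for the test
and on the population standard deviation.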
|Application - estimating a minimum sample size, N, for limiting the error in the estimate of the mean||
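For example, suppose that we wish to estimate the average daily
yield, $\mu$, of a chemical
process by the mean of a sample,
$Y_1, \ldots, Y_N$,
such that the error of estimation is less than $\delta$ with a
probability of 95%. This means that a 95% confidence interval
centered at the sample mean should be
$$ \bar{Y} - \delta \le \mu \le \bar{Y} + \delta $$
and if the standard deviation, $\sigma$, is known,
$$ \delta = z_{1-\alpha/2} \, \frac{\sigma}{\sqrt{N}} . $$
The critical value from the normal distribution for $1-\alpha/2 = 0.975$ is 1.96. Therefore,
$$ N \ge \left( \frac{1.96}{\delta} \right)^2 \sigma^2 . $$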
|Limitation and interpretation||A restriction is that the standard deviation must be known. Lacking an exact value for the standard deviation requires some accommodation, perhaps the best estimate available from a previous experiment.|
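As a minimal sketch of the calculation described above, the following Python function returns the smallest N for which the half-width of the confidence interval, $z_{1-\alpha/2}\,\sigma/\sqrt{N}$, does not exceed a chosen margin. The function name `n_for_margin` and the numerical inputs are hypothetical, chosen only for illustration, and $\sigma$ is assumed known.

```python
import math
from statistics import NormalDist


def n_for_margin(delta, sigma, confidence=0.95):
    """Smallest N with z_{1-alpha/2} * sigma / sqrt(N) <= delta (sigma known)."""
    alpha = 1.0 - confidence
    # Normal critical value; about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sigma / delta) ** 2)


# Hypothetical inputs: sigma = 2 yield units, desired margin delta = 0.5.
print(n_for_margin(delta=0.5, sigma=2.0))
```

Halving the margin roughly quadruples the required sample size, because N grows with $(\sigma/\delta)^2$.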
|Controlling the risk of accepting a false hypothesis||To control the risk of accepting a false hypothesis, we set not only $\alpha$, the probability of rejecting the null hypothesis when it is true, but also $\beta$, the probability of accepting the null hypothesis when in fact the population mean is $\mu + \delta$, where $\delta$ is the difference or shift we want to detect.|
|Standard deviation assumed to be known||
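The minimum sample size, N, is shown below for two- and
one-sided tests of hypotheses with $\sigma$ assumed to be known.
$$ N = \left( z_{1-\alpha/2} + z_{1-\beta} \right)^2 \left( \frac{\sigma}{\delta} \right)^2 \quad \text{(two-sided test)} $$
$$ N = \left( z_{1-\alpha} + z_{1-\beta} \right)^2 \left( \frac{\sigma}{\delta} \right)^2 \quad \text{(one-sided test)} $$
The quantities $z_{1-\alpha/2}$, $z_{1-\alpha}$, and $z_{1-\beta}$ are critical values from the normal distribution.
Note that it is usual to state the shift, $\delta$, in units of the standard deviation, thereby simplifying the calculation.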
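The one-sided sample-size formula for known standard deviation can be motivated by a short derivation. This is a sketch under stated assumptions: $\mu_0$ (the hypothesized mean) and $\Phi$ (the standard normal distribution function) are symbols introduced here for illustration.

```latex
% Reject H_0: \mu = \mu_0 in favor of \mu > \mu_0 when
\bar{Y} > \mu_0 + z_{1-\alpha}\,\frac{\sigma}{\sqrt{N}} .
% If the true mean is \mu_0 + \delta, the probability of NOT rejecting is
\Phi\!\left( z_{1-\alpha} - \frac{\delta\sqrt{N}}{\sigma} \right) = \beta .
% Because \Phi^{-1}(\beta) = -z_{1-\beta},
z_{1-\alpha} - \frac{\delta\sqrt{N}}{\sigma} = -z_{1-\beta}
\quad\Longrightarrow\quad
N = \left( z_{1-\alpha} + z_{1-\beta} \right)^2 \left( \frac{\sigma}{\delta} \right)^2 .
```

The two-sided version replaces $z_{1-\alpha}$ with $z_{1-\alpha/2}$; this is an approximation that neglects the small probability of rejecting in the tail opposite to the shift.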
|Example where the shift is stated in terms of the standard deviation||
For a one-sided hypothesis test where we wish to detect an increase
in the population mean of one standard deviation, the following
information is required:
$\alpha$, the significance level of the test, and
$\beta$, the probability of failing to detect a shift of one standard deviation.
For a test with $\alpha = 0.05$ and $\beta = 0.10$, the minimum sample
size required for the test is
$$ N = (1.645 + 1.282)^2 = 8.567 \approx 9 . $$
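The example above can be checked numerically. The sketch below assumes a known standard deviation and a shift expressed in standard-deviation units; the function name `n_known_sigma` and its parameters are hypothetical. It rounds up to the next integer, which is an assumption made here so that the stated risks are not exceeded.

```python
import math
from statistics import NormalDist


def n_known_sigma(alpha, beta, shift_in_sd, two_sided=False):
    """Return (unrounded N, N rounded up) for a test on the mean, sigma known.

    shift_in_sd is the shift to detect divided by the standard deviation.
    """
    z = NormalDist().inv_cdf
    # Use alpha/2 in the critical value for a two-sided test.
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(1 - beta)
    raw = ((z_alpha + z_beta) / shift_in_sd) ** 2
    return raw, math.ceil(raw)


# One-sided test, alpha = 0.05, beta = 0.10, shift of one standard deviation.
raw, n = n_known_sigma(0.05, 0.10, 1.0)
print(raw, n)
```

With unrounded critical values the raw result is slightly below the hand calculation that uses 1.645 and 1.282, but it rounds up to the same sample size of 9.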
|More often we must compute the sample size with the population standard deviation being unknown||
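The procedures for computing sample sizes when the standard deviation
is not known are similar to, but more complex than, those used when the
standard deviation is known. The formulation depends on the
t distribution, where the minimum sample size is given by
$$ N = \left( t_{1-\alpha/2} + t_{1-\beta} \right)^2 \left( \frac{s}{\delta} \right)^2 \quad \text{(two-sided test)} $$
$$ N = \left( t_{1-\alpha} + t_{1-\beta} \right)^2 \left( \frac{s}{\delta} \right)^2 \quad \text{(one-sided test)} $$
where $s$ is the sample standard deviation.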
The drawback is that critical values of the t distribution depend on known degrees of freedom, which in turn depend upon the sample size which we are trying to estimate.
|Iterate on the initial estimate using critical values from the t table||
Therefore, the best procedure is to start with an initial estimate
based on a sample standard deviation and iterate. Take the example
discussed above, where the minimum sample size is computed to
be N = 9. This estimate is low. Now use the formula above
with degrees of freedom N - 1 = 8, which gives a second estimate of
$$ N = (1.860 + 1.397)^2 = 10.6 \approx 11 . $$
It is possible to apply another iteration using degrees of freedom 10, but in practice one iteration is usually sufficient. For the purpose of this example, results have been rounded to the closest integer; however, computer programs for finding critical values from the t distribution allow non-integer degrees of freedom.
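The iteration described above can be sketched in Python. To keep the example dependency-free, the t quantile is obtained here by numerically integrating the t density and bisecting; this is an illustrative stand-in, and in practice a library routine such as `scipy.stats.t.ppf` would normally be used. All function names are hypothetical, and rounding up at each step is an assumption of this sketch.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """Student t CDF for x >= 0, using Simpson's rule on the density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile of the t distribution for p > 0.5, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def n_unknown_sigma(alpha, beta, shift_in_sd, two_sided=False, max_iter=20):
    """Iterate the t-based formula, starting from the normal-based estimate.

    shift_in_sd is the shift divided by the sample standard deviation s.
    Returns the list of successive sample-size estimates.
    """
    a = alpha / 2 if two_sided else alpha
    z = NormalDist().inv_cdf
    n = math.ceil(((z(1 - a) + z(1 - beta)) / shift_in_sd) ** 2)
    history = [n]
    for _ in range(max_iter):
        df = max(n - 1, 1)  # guard against zero degrees of freedom
        t_sum = t_quantile(1 - a, df) + t_quantile(1 - beta, df)
        n_new = math.ceil((t_sum / shift_in_sd) ** 2)
        history.append(n_new)
        if n_new == n:  # estimate has stabilized
            break
        n = n_new
    return history


# One-sided test, alpha = 0.05, beta = 0.10, shift of one standard deviation.
print(n_unknown_sigma(0.05, 0.10, 1.0))
```

For this example the sequence starts at 9, moves to 11 after one iteration with 8 degrees of freedom, and stays at 11 when repeated with 10 degrees of freedom, which agrees with the remark that one iteration is usually sufficient.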
|Table showing minimum sample sizes for a two-sided test||
The table below gives sample sizes for a two-sided test of hypothesis
that the mean is a given value, with the shift to be detected a
multiple of the standard deviation. For a one-sided test at
significance level $\alpha$, look under the value of $2\alpha$
in column 1. Note that this table is based on the normal
approximation (i.e., the standard deviation is known).