7. Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.2. Are the data consistent with the assumed process mean?

7.2.2.2. Sample sizes required

The computation of sample sizes depends on many things, some of which have to be assumed in advance

Perhaps one of the most frequent questions asked of a statistician is,
    "How many measurements should be included in the sample?"
Unfortunately, there is no correct answer without additional information (or assumptions). The sample size required for an experiment designed to investigate the behavior of an unknown population mean will be influenced by the following:
  • value selected for alpha, the risk of rejecting a true null hypothesis;
  • value of beta, the risk of accepting a false null hypothesis when a particular value of the alternative hypothesis is true;
  • value of the population standard deviation.

Application - estimating a minimum sample size, N, for limiting the error in the estimate of the mean

For example, suppose that we wish to estimate the average daily yield, mu, of a chemical process by the mean of a sample, Y1, ..., YN, such that the error of estimation is less than delta with a probability of 95%. This means that a 95% confidence interval centered at the sample mean should be

Ybar - delta <= mu <= Ybar + delta

and if the standard deviation is known,

delta = (sigma/sqrt(N))*z(1-.025)

The critical value from the normal distribution for 1-α/2 = 0.975 is 1.96. Therefore,

N >= (1.96/delta)**2*sigma**2
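
The inequality above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name and the numbers in the usage line (sigma = 2.0, delta = 0.5) are hypothetical and chosen purely for illustration.

```python
import math
from statistics import NormalDist


def n_for_half_width(sigma, delta, confidence=0.95):
    """Smallest N such that z(1-alpha/2) * sigma / sqrt(N) <= delta.

    Assumes sigma is known, as in the text above.
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for 95%
    return math.ceil((z / delta) ** 2 * sigma ** 2)


# Hypothetical inputs: sigma = 2.0 and delta = 0.5, both in yield units.
n_example = n_for_half_width(sigma=2.0, delta=0.5)
print(n_example)  # 62
```

Because N must be an integer, the sketch rounds up so that the half-width does not exceed delta.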

Limitation and interpretation

A restriction is that the standard deviation must be known. When an exact value is not available, some accommodation is required; perhaps the best option is to use the best estimate available from a previous experiment.

Controlling the risk of accepting a false hypothesis

To control the risk of accepting a false hypothesis, we set not only alpha, the probability of rejecting the null hypothesis when it is true, but also beta, the probability of accepting the null hypothesis when in fact the population mean is mu+delta, where delta is the difference or shift we want to detect.

Standard deviation assumed to be known

The minimum sample size, N, is shown below for two- and one-sided tests of hypotheses with sigma assumed to be known.

N = (z(1-alpha/2) + z(1-beta))**2*(sigma/delta)**2 for a two-sided test;
N = (z(1-alpha) + z(1-beta))**2*(sigma/delta)**2 for a one-sided test.

The quantities z(1-alpha/2), z(1-alpha), and z(1-beta) are critical values from the normal distribution.

Note that it is usual to state the shift, delta, in units of the standard deviation, thereby simplifying the calculation.
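
To make the simplification explicit, suppose the shift is written as delta = k*sigma, where k is a symbol introduced here only for illustration. For the two-sided case sigma then cancels:

```latex
\delta = k\sigma
\;\Longrightarrow\;
N = \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \left(\frac{\sigma}{k\sigma}\right)^2
  = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{k^2}
```

The one-sided case is identical with z(1-alpha) in place of z(1-alpha/2).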

Example where the shift is stated in terms of the standard deviation

For a one-sided hypothesis test where we wish to detect an increase in the population mean of one standard deviation, the following information is required: alpha, the significance level of the test, and beta, the probability of failing to detect a shift of one standard deviation. For a test with alpha = 0.05 and beta = 0.10, the minimum sample size required for the test is

N = (1.645 + 1.282)**2 = 8.567 ~ 9.

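
The known-sigma formulas can be checked numerically. This is a minimal sketch with the Python standard library; the function name and the parameter `shift_in_sigma` (the ratio delta/sigma) are names introduced here for illustration.

```python
import math
from statistics import NormalDist


def n_known_sigma(alpha, beta, shift_in_sigma, two_sided=True):
    """Minimum N from the normal-theory formula, sigma assumed known.

    shift_in_sigma is delta / sigma, the shift in standard deviation units.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1.0 - alpha / 2.0) if two_sided else z(1.0 - alpha)
    z_beta = z(1.0 - beta)
    # Round up because N must be a whole number of measurements.
    return math.ceil(((z_alpha + z_beta) / shift_in_sigma) ** 2)


# The one-sided example above: alpha = 0.05, beta = 0.10, shift of one sigma.
n_one_sided = n_known_sigma(0.05, 0.10, 1.0, two_sided=False)
print(n_one_sided)  # 9
```

Setting `two_sided=True` with the same alpha and beta uses z(1-alpha/2) instead and therefore gives a larger N.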
More often we must compute the sample size with the population standard deviation being unknown

The procedures for computing sample sizes when the standard deviation is not known are similar to, but more complex than, those used when the standard deviation is known. The formulation depends on the t distribution, where the minimum sample size is given by

N = (t(1-alpha/2) + t(1-beta))**2*(s/delta)**2 for a two-sided test;
N = (t(1-alpha) + t(1-beta))**2*(s/delta)**2 for a one-sided test.

The drawback is that critical values of the t distribution depend on known degrees of freedom, which in turn depend upon the sample size which we are trying to estimate.

Iterate on the initial estimate using critical values from the t table

Therefore, the best procedure is to start with an initial estimate based on a sample standard deviation and iterate. Take the example discussed above, where the minimum sample size is computed to be N = 9. This estimate is low. Now use the formula above with degrees of freedom N - 1 = 8, which gives a second estimate of

N = (1.860 + 1.397)**2 = 10.6 ~ 11.

It is possible to apply another iteration using degrees of freedom 10, but in practice one iteration is usually sufficient. For the purpose of this example, results have been rounded to the closest integer; however, computer programs for finding critical values from the t distribution allow non-integer degrees of freedom.
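
The iteration can be sketched in code. The example below is an illustration under stated assumptions, not a reference implementation: to stay within the Python standard library it computes t critical values by numerically integrating the t density (Simpson's rule) and bisecting, where a statistics package would normally be used. All function names are hypothetical, and the loop stops when the rounded-up N no longer changes.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """CDF at x >= 0: 0.5 plus Simpson's rule on [0, x] (steps is even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Critical value t(p) for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand until the quantile is bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def n_unknown_sigma(alpha, beta, shift_in_sd, two_sided=False, max_iter=20):
    """Iterate the t-based formula, starting from the normal-theory N.

    shift_in_sd is delta / s. Requires beta <= 0.5.
    """
    p_alpha = 1 - alpha / 2 if two_sided else 1 - alpha
    z = NormalDist().inv_cdf
    n = math.ceil(((z(p_alpha) + z(1 - beta)) / shift_in_sd) ** 2)
    for _ in range(max_iter):
        df = max(n - 1, 1)  # guard: the t distribution needs df >= 1
        t_sum = t_quantile(p_alpha, df) + t_quantile(1 - beta, df)
        n_new = math.ceil((t_sum / shift_in_sd) ** 2)
        if n_new == n:
            break
        n = n_new
    return n


# The example above: one-sided, alpha = 0.05, beta = 0.10, shift of one s.
n_iterated = n_unknown_sigma(0.05, 0.10, 1.0)
print(n_iterated)  # 11
```

For this example the sketch starts at N = 9, moves to 11 with 8 degrees of freedom, and stays at 11 with 10 degrees of freedom, so the extra iteration does not change the answer.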

Table showing minimum sample sizes for a two-sided test

The table below gives sample sizes for a two-sided test of hypothesis that the mean is a given value, with the shift to be detected a multiple of the standard deviation. For a one-sided test at significance level alpha, look under the value of 2*alpha in column 1. Note that this table is based on the normal approximation (i.e., the standard deviation is known).

Sample Size Table for Two-Sided Tests

alpha   beta    delta=0.5*sigma   delta=1.0*sigma   delta=1.5*sigma

 .01     .01          98                25                11
 .01     .05          73                18                 8
 .01     .10          61                15                 7
 .01     .20          47                12                 6
 .01     .50          27                 7                 3
 .05     .01          75                19                 9
 .05     .05          53                13                 6
 .05     .10          43                11                 5
 .05     .20          33                 8                 4
 .05     .50          16                 4                 3
 .10     .01          65                16                 8
 .10     .05          45                11                 5
 .10     .10          35                 9                 4
 .10     .20          25                 7                 3
 .10     .50          11                 3                 3
 .20     .01          53                14                 6
 .20     .05          35                 9                 4
 .20     .10          27                 7                 3
 .20     .20          19                 5                 3
 .20     .50           7                 3                 3