
## Sample sizes required

### Sample sizes to minimize risk of false acceptance

The following procedure for computing sample sizes for tests involving standard deviations follows W. Diamond (1989). The idea is to find a sample size that is large enough to guarantee that the risk, $$\beta$$, of accepting a false hypothesis is small.

### Alternatives are specific departures from the null hypothesis

This procedure is stated in terms of changes in the variance, not the standard deviation, which makes it somewhat difficult to interpret. Tests that are generally of interest are stated in terms of $$\delta$$, a discrepancy from the hypothesized variance. For example:
1. Is the true variance larger than its hypothesized value by $$\delta$$?
2. Is the true variance smaller than its hypothesized value by $$\delta$$?
That is, the null hypothesis $$H_0: \,\, \sigma^2 = \sigma_0^2$$ is tested against one of the following alternatives:
1. $$H_a: \,\, \sigma^2 \ge \sigma_0^2 + \delta ; \,\,\, \delta \ge 0$$
2. $$H_a: \,\, \sigma^2 \le \sigma_0^2 - \delta ; \,\,\, \delta \ge 0$$
### Interpretation

The experimenter wants to assure that the probability of erroneously accepting the null hypothesis of unchanged variance is at most $$\beta$$. The sample size, $$N$$, required for this type of detection depends on the discrepancy, $$\delta$$; the significance level, $$\alpha$$; and the risk, $$\beta$$.

### First choose the level of significance and beta risk

The sample size is determined by first choosing appropriate values of $$\alpha$$ and $$\beta$$ and then following the directions below to find the degrees of freedom, $$\nu$$, from the chi-square distribution.
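### The calculations should be done by creating a table or spreadsheet

First compute the ratio, $$R$$, of the variance under the alternative to the hypothesized variance. For case (1), where the variance is larger than hypothesized, $$R = 1 + \frac{\delta}{\sigma^2_0} \, .$$ For case (2), where the variance is smaller than hypothesized, the ratio is $$R = 1 - \frac{\delta}{\sigma^2_0} \, ;$$ with a ratio greater than one, the probability in case (2) could never fall below $$1 - \alpha$$.

Then generate a table of degrees of freedom, $$\nu$$, say between 1 and 200. For case (1) or (2) above, calculate $$\beta_\nu$$ and the corresponding value of $$C_\nu$$ for each value of degrees of freedom in the table, where

$$\begin{eqnarray} \mbox{ 1. } & \beta_\nu & = \chi^2_{1-\alpha, \nu} / R \\ & & \\ & C_\nu & = \mbox{Pr }(\chi^2_\nu < \beta_\nu) \\ & & \\ \mbox{ 2. } & \beta_\nu & = \chi^2_{\alpha, \nu} / R \\ & & \\ & C_\nu & = \mbox{Pr }(\chi^2_\nu > \beta_\nu) \end{eqnarray}$$

The value of $$\nu$$ for which $$C_\nu$$ is closest to $$\beta$$ is the required degrees of freedom, and $$N = \nu + 1 \, .$$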
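
The form of $$\beta_\nu$$ and $$C_\nu$$ in case (1) can be motivated as follows. This is a sketch, not part of the cited procedure: it assumes normally distributed measurements and a null hypothesis $$\sigma^2 = \sigma_0^2$$ tested against an increase, and it introduces $$s^2$$ for the sample variance, a symbol not used above. Because $$\nu s^2 / \sigma^2$$ follows a chi-square distribution with $$\nu = N - 1$$ degrees of freedom, the test statistic $$\nu s^2 / \sigma_0^2$$ is distributed as $$R \, \chi^2_\nu$$ when the true variance is $$\sigma_0^2 + \delta$$. The null hypothesis is accepted when the statistic falls below the critical value, so the risk of false acceptance is

$$\begin{eqnarray} \mbox{Pr}\left( \frac{\nu s^2}{\sigma_0^2} < \chi^2_{1-\alpha, \, \nu} \right) & = & \mbox{Pr}\left( R \, \chi^2_\nu < \chi^2_{1-\alpha, \, \nu} \right) \\ & = & \mbox{Pr}\left( \chi^2_\nu < \chi^2_{1-\alpha, \, \nu} / R \right) \end{eqnarray}$$

which is $$C_\nu$$ evaluated at $$\beta_\nu = \chi^2_{1-\alpha, \, \nu} / R$$.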
### Hints on using software packages to do the calculations

The quantity $$\chi_{1-\alpha, \, \nu}^2$$ is the critical value from the chi-square distribution with $$\nu$$ degrees of freedom that is exceeded with probability $$\alpha$$. The function that returns it is sometimes referred to as the percent point function (PPF) or the inverse chi-square function. The probability that is evaluated to get $$C_\nu$$ comes from the cumulative distribution function (CDF).
### Example

Consider the case where the variance for resistivity measurements on a lot of silicon wafers is claimed to be 100 (ohm.cm)$$^2$$. A buyer is unwilling to accept a shipment if $$\delta$$ is greater than 55 (ohm.cm)$$^2$$ for a particular lot. This problem falls under case (1) above. How large a sample is needed to assure risks of $$\alpha$$ = 0.05 and $$\beta$$ = 0.01?
### Calculations

If software is available to compute the roots (or zero values) of a univariate function, then we can determine the sample size by finding the roots of a function that calculates $$C_\nu$$ for a given value of $$\nu$$. The procedure is:
1. Define constants. $$\begin{eqnarray} \alpha & = & 0.05 \\ \beta & = & 0.01 \\ \delta & = & 55 \\ \sigma_0^2 & = & 100 \\ R & = & 1 + \delta / \sigma_0^2 \end{eqnarray}$$
2. Create a function whose zero is the required $$\nu$$, namely $$C_\nu - \beta$$. $$C_\nu - \beta = F(F^{-1}(1 - \alpha, \, \nu)/R, \, \nu) - \beta$$ Here $$F(x, \nu)$$ returns the probability that a chi-square random variable with $$\nu$$ degrees of freedom is less than or equal to $$x$$, and $$F^{-1}(p, \, \nu)$$ returns $$x$$ such that $$F(x, \, \nu) = p$$.
3. Find the value of $$\nu$$ for which $$C_\nu - \beta$$ is zero.

Using this procedure, $$C_\nu - \beta$$ is zero when $$\nu$$ is 169.3. Taking the nearest whole number of degrees of freedom, $$\nu$$ = 169, the sample size needed to meet the risk level is $$N = \nu + 1$$ = 170.
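
The root-finding procedure above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the handbook's own Dataplot or R code. The helper names `chi2_cdf`, `chi2_ppf`, and `c_nu` are hypothetical. The CDF is evaluated with the power series for the regularized incomplete gamma function, and both the quantile and the root are found by plain bisection. The quantile used is the upper critical value $$\chi^2_{1-\alpha, \, \nu}$$ from case (1).

```python
import math

ALPHA, BETA = 0.05, 0.01        # significance level and beta risk from the example
DELTA, SIGMA0_SQ = 55.0, 100.0  # discrepancy and hypothesized variance
R = 1.0 + DELTA / SIGMA0_SQ     # variance ratio for case (1)

def chi2_cdf(x, nu):
    """Pr(chi-square with nu d.f. <= x), from the incomplete-gamma power series."""
    if x <= 0.0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_ppf(p, nu):
    """Quantile x with chi2_cdf(x, nu) = p, found by bisection."""
    lo, hi = 0.0, nu + 10.0 * math.sqrt(2.0 * nu) + 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c_nu(nu):
    """Risk of false acceptance at nu degrees of freedom, minus the target beta."""
    return chi2_cdf(chi2_ppf(1.0 - ALPHA, nu) / R, nu) - BETA

# The risk shrinks as nu grows, so bisect on a bracket where the sign changes.
lo, hi = 1.0, 200.0
if not (c_nu(lo) > 0.0 > c_nu(hi)):
    raise ValueError("bracket does not contain a sign change")
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if c_nu(mid) > 0.0:
        lo = mid
    else:
        hi = mid
nu_root = 0.5 * (lo + hi)
print(f"zero of C_nu - beta near nu = {nu_root:.1f}")
```

The printed root should lie between 169 and 170, in line with the value reported above. In practice, a statistics library's chi-square CDF and PPF and a general-purpose root finder would replace the hand-written helpers.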

Alternatively, we can determine the sample size by simply printing computed values of $$C_\nu$$ for various values of $$\nu$$.

1. Define constants. $$\begin{eqnarray} \alpha & = & 0.05 \\ \delta & = & 55 \\ \sigma_0^2 & = & 100 \\ R & = & 1 + \delta / \sigma_0^2 \end{eqnarray}$$
2. Generate $$C_\nu$$ for values of $$\nu$$ from 1 to 200. $$\beta_\nu = F^{-1}(1 - \alpha, \,\nu) / R$$ $$C_\nu = F(\beta_\nu, \, \nu)$$

The values of $$C_\nu$$ generated for $$\nu$$ between 165 and 175 degrees of freedom are shown below.

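| $$\nu$$ | $$\beta_\nu$$ | $$C_\nu$$ |
|---------|---------------|-----------|
| 165 | 126.4344 | 0.0114 |
| 166 | 127.1380 | 0.0110 |
| 167 | 127.8414 | 0.0107 |
| 168 | 128.5446 | 0.0104 |
| 169 | 129.2477 | 0.0101 |
| 170 | 129.9506 | 0.0098 |
| 171 | 130.6533 | 0.0095 |
| 172 | 131.3558 | 0.0092 |
| 173 | 132.0582 | 0.0090 |
| 174 | 132.7604 | 0.0087 |
| 175 | 133.4625 | 0.0085 |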

The value of $$C_\nu$$ closest to 0.01 is 0.0101, which is associated with $$\nu$$ = 169 degrees of freedom. Therefore, the minimum sample size needed to guarantee the risk level is $$N$$ = 170.
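
The tabulation approach can be sketched in the same way. The Python below is a minimal illustration using only the standard library, not the handbook's Dataplot or R code. The helper names `chi2_cdf` and `chi2_ppf` are hypothetical, the CDF uses the power series for the regularized incomplete gamma function, and the quantile is found by bisection. It takes $$\beta_\nu$$ from the upper critical value $$\chi^2_{1-\alpha, \, \nu}$$ of case (1).

```python
import math

ALPHA, BETA = 0.05, 0.01   # significance level and target beta risk
R = 1.0 + 55.0 / 100.0     # 1 + delta / sigma_0^2 for case (1)

def chi2_cdf(x, nu):
    """Pr(chi-square with nu d.f. <= x), from the incomplete-gamma power series."""
    if x <= 0.0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_ppf(p, nu):
    """Quantile x with chi2_cdf(x, nu) = p, found by bisection."""
    lo, hi = 0.0, nu + 10.0 * math.sqrt(2.0 * nu) + 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# One row (nu, beta_nu, C_nu) for every nu from 1 to 200.
rows = []
for nu in range(1, 201):
    beta_nu = chi2_ppf(1.0 - ALPHA, nu) / R
    rows.append((nu, beta_nu, chi2_cdf(beta_nu, nu)))

# Print the rows for nu = 165, ..., 175 (list index is nu - 1).
for nu, beta_nu, c in rows[164:175]:
    print(f"{nu:5d} {beta_nu:10.4f} {c:8.4f}")

# Pick the degrees of freedom whose C_nu is closest to the target risk.
best_nu = min(rows, key=lambda row: abs(row[2] - BETA))[0]
print("closest nu:", best_nu, " N =", best_nu + 1)
```

The printed rows should agree with the tabulated values up to rounding. Scanning all 200 rows with `min` mirrors the "closest to $$\beta$$" rule; choosing the first $$\nu$$ with $$C_\nu \le \beta$$ instead would be a stricter variant.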

The calculations used in this section can be performed with either Dataplot code or R code.