Keith R. Eberhardt, William F. Guthrie
In interlaboratory comparisons, two or more laboratories measure the same artifact to compare the relative biases of their measurement processes. One summary of interest is the pairwise difference, say X1-X2, between two laboratories' results, along with a confidence interval for the true difference. Since the labs have unequal variances, the confidence interval is usually computed by the Welch-Satterthwaite procedure, which approximates the distribution of the pivot statistic by a Student-t distribution with effective degrees of freedom defined as a particular function of the data.
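For reference, the Welch-Satterthwaite effective degrees of freedom and the resulting half-width can be written in their standard form as below. The notation is an assumption introduced here, not taken from the authors: u_i denotes the estimated standard uncertainty of X_i, \nu_i its degrees of freedom, and t_{0.975,\nu} the 97.5% point of the Student-t distribution with \nu degrees of freedom.

```latex
\nu_{\mathrm{eff}} = \frac{\left(u_1^2 + u_2^2\right)^2}{\dfrac{u_1^4}{\nu_1} + \dfrac{u_2^4}{\nu_2}},
\qquad
U_{1-2} = t_{0.975,\,\nu_{\mathrm{eff}}}\,\sqrt{u_1^2 + u_2^2},
\qquad
U_1 = t_{0.975,\,\nu_1}\,u_1 .
```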
In the course of analyzing the data for a major international interlaboratory comparison, an awkward and counterintuitive property of the Welch-Satterthwaite procedure was observed. Namely, the 95% confidence interval for a between-lab difference, X1-X2, can be narrower than the corresponding 95% interval for one of the component results, say X1. Using the symbol U to denote the half-width of a confidence interval, this condition is U1-2<U1. This occurs when X1 has low degrees of freedom (say 1 or 2), and therefore a large Student-t multiplier for 95% confidence, while the effective degrees of freedom obtained from the Welch-Satterthwaite approximation is larger.
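A small numerical sketch may make the mechanism concrete. The values below are hypothetical and not taken from the comparison data: both results are given a standard uncertainty of 1.0, with 1 and 4 degrees of freedom. The helper `t975` is an illustrative standard-library routine written for this sketch (Newton's method on a Simpson's-rule CDF), not a library function.

```python
import math

def t975(nu, n=200):
    """Approximate 97.5% point of Student's t for nu >= 1 (nu may be non-integer)."""
    # Normalizing constant of the t density
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    pdf = lambda x: c * (1 + x * x / nu) ** (-(nu + 1) / 2)
    t = 1.96  # start near the normal quantile, below the t quantile for small nu
    for _ in range(100):
        h = t / n
        # Simpson's rule for the integral of the density over [0, t]
        s = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))
        step = (0.975 - (0.5 + s * h / 3)) / pdf(t)  # Newton update on CDF(t) = 0.975
        t += step
        if abs(step) < 1e-9:
            break
    return t

# Hypothetical inputs (assumptions of this sketch)
u1, u2 = 1.0, 1.0      # standard uncertainties of X1 and X2
nu1, nu2 = 1, 4        # degrees of freedom of X1 and X2

U1 = t975(nu1) * u1                                        # half-width for X1 alone
u12 = math.hypot(u1, u2)                                   # standard uncertainty of X1-X2
nu_eff = (u1**2 + u2**2) ** 2 / (u1**4 / nu1 + u2**4 / nu2)  # Welch-Satterthwaite
U12 = t975(nu_eff) * u12                                   # half-width for X1-X2

print(f"nu_eff = {nu_eff:.2f}, U1 = {U1:.2f}, U1-2 = {U12:.2f}")
```

With these inputs the effective degrees of freedom are 3.2, so the multiplier for the difference is far smaller than the multiplier of about 12.7 that applies with 1 degree of freedom, and U1-2 comes out below U1 even though the difference has the larger standard uncertainty.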
The typical reaction to this situation is to suspect the Welch-Satterthwaite procedure of failing to achieve the nominal 95% confidence level. However, this is not the correct explanation. In fact, situations exist where all three of the confidence intervals involved, for X1, for X2, and for X1-X2, achieve the stated 95% level of confidence, yet U1-2<U1.
Figure 16 illustrates a simulation study in which 10,000 sets of confidence intervals were computed for a situation where and the degrees of freedom were 1 and 4, for labs 1 and 2, respectively. These results show that for this situation, the coverages for all three intervals achieve the desired 95% confidence level. In the simulation, the counterintuitive condition U1-2<U1 occurs most of the time (in 86% of the simulations), as shown by the preponderance of points plotted below the diagonal in the figure. Even more surprising is that the (conditional) coverage of the interval for the difference gets worse when that interval is wider than the corresponding interval for X1 alone. The simulation shows that the conditional coverage of the intervals for X1-X2 is only 91% when U1-2>U1, the condition that agrees with intuition, as compared to 96.2% when the counterintuitive condition holds.
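A simulation of this kind can be sketched with the Python standard library alone. This is not the authors' code, and several choices are assumptions of the sketch: normally distributed data, sample sizes of 2 and 5 (giving 1 and 4 degrees of freedom), equal true standard errors for the two means, and the illustrative helper `t975`. The exact percentages depend on these choices, on the seed, and on the number of replications.

```python
import math
import random

def t975(nu, n=200):
    """Approximate 97.5% point of Student's t for nu >= 1 (Newton + Simpson's rule)."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    pdf = lambda x: c * (1 + x * x / nu) ** (-(nu + 1) / 2)
    t = 1.96
    for _ in range(100):
        h = t / n
        s = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))
        step = (0.975 - (0.5 + s * h / 3)) / pdf(t)
        t += step
        if abs(step) < 1e-9:
            break
    return t

def mean_and_var_of_mean(xs):
    """Sample mean and the estimated variance of that mean (s^2 / n)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, s2 / n

def simulate(reps=2000, seed=1):
    rng = random.Random(seed)
    n1, n2 = 2, 5                              # degrees of freedom 1 and 4
    sd1, sd2 = math.sqrt(n1), math.sqrt(n2)    # assumption: both means have true SE 1
    t1 = t975(n1 - 1)
    cover1 = cover12 = below = hit_below = hit_above = 0
    for _ in range(reps):
        m1, v1 = mean_and_var_of_mean([rng.gauss(0.0, sd1) for _ in range(n1)])
        m2, v2 = mean_and_var_of_mean([rng.gauss(0.0, sd2) for _ in range(n2)])
        nu_eff = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
        U1 = t1 * math.sqrt(v1)
        U12 = t975(nu_eff) * math.sqrt(v1 + v2)
        hit12 = abs(m1 - m2) <= U12            # true difference is 0
        cover1 += abs(m1) <= U1                # true mean of lab 1 is 0
        cover12 += hit12
        if U12 < U1:                           # the counterintuitive condition
            below += 1
            hit_below += hit12
        else:
            hit_above += hit12
    return {
        "cover_1": cover1 / reps,
        "cover_12": cover12 / reps,
        "frac_counterintuitive": below / reps,
        "cover_12_given_below": hit_below / max(1, below),
        "cover_12_given_above": hit_above / max(1, reps - below),
    }

print(simulate())
```

The default of 2,000 replications keeps the pure-Python run short; passing `reps=10000` matches the number of replications reported above at the cost of a longer run.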
The Bayesian approach to this problem leads to using the Behrens-Fisher distribution to obtain a 95% uncertainty interval for X1-X2. Since it can be shown that the Behrens-Fisher distribution does not exhibit the counterintuitive property described above, this fact may help convince physical scientists to make more use of Bayesian methods.
Figure 16: Comparison of half-widths of 95% confidence intervals for one mean, U1, and for the difference of two means, U1-2. Results for which the interval for X1-X2 fails to cover the true value are shown in red. The preponderance of points below the diagonal illustrates that, for the situation studied, the most common outcome yields the counterintuitive condition that U1-2<U1. Further, the location of the red points in the figure illustrates that the interval for the difference is relatively more likely to fail (to cover the true value) when the outcome falls above the diagonal, i.e., when the uncertainties are more consistent with intuition.
Date created: 7/20/2001