NIST Statement on Statistical Principles for the Design and
Analysis of Key Comparisons
Approved by the Measurement Services Advisory Group,
December 16, 2003
Introduction:
To facilitate international trade beneficial to U.S. industry, NIST
participates in international interlaboratory comparisons, called Key
Comparisons, to assess the equivalence of measurement standards used
at different National Metrology Institutes. Because Key Comparisons
impact both scientific and economic decisions made by different
countries, there are clearly defined procedures governing their
conduct. The primary document governing Key Comparisons is the Mutual
Recognition Arrangement (MRA) developed by the CIPM. In addition, NIST
has developed a "Position on the
Conduct of Key Comparisons," which offers guidance on
several interpretable points of the MRA for NIST participants in Key
Comparisons. Having participated in nearly 250 comparisons and
piloted nearly 70, NIST technical staff has asked for a clear
articulation of the statistical principles that are central to the
design, implementation, analysis and interpretation of Key Comparisons
and Supplementary Comparisons. Questions have arisen regarding the
following issues:
• What are the requirements in designing a Key Comparison to
assure a clear interpretation from the data once the
comparison is completed?
• What are the conditions for a statistical analysis of a Key
Comparison to be valid?
• When is the statistical analysis of a Key Comparison complete?
• Is there a single statistical approach to the analysis of a Key
Comparison or to the estimation of a reference value (KCRV) or the
estimation of degrees of equivalence?
This NIST Statement identifies statistical principles for different
types of Key Comparisons that should be followed to ensure that the
comparisons in which NIST participates will be clearly interpretable.
Interpretability requires statistically sound estimates of the various
quantities of interest including reference values and degrees of
equivalence between measurement standards maintained by different
NMIs, each with its associated uncertainties. Interpretation also
extends to the statistical basis for addressing unexplained
deviations, whether individual observations or the collective
observations from a particular NMI, and to statistically sound methods
for combining information from Key and Regional Comparisons in order
to address differences between NMIs participating in separate, but
linked, comparisons.
Information on sound statistical procedures and/or methodologies and
established statistical practices can be found in the archival and
applied journals of statistical societies, other technical and
educational publications and reputable statistical software in both
the commercial and public domains.
Statement of Principles:
The statistical premises for Key Comparisons
• Recognizing that there are both stochastic and nonstochastic
elements in all interlaboratory comparisons, the general goal
of the analysis of Key Comparison data is to draw statistical
inference. As a particular example, degrees of equivalence
among measurements and measurement standards for the various
NMIs with their associated uncertainties must be resolved on
the basis of sound statistical procedures.
• As expressed in Sections 6 and 9 of the Guidelines for Key
Comparisons and endorsed by the NIST Position Statement on
the Conduct of Key Comparisons, integrity of the data is
essential to the interpretability of a Key Comparison. Thus a
prerequisite to the inclusion of an NMI's data in the analysis
is the complete submission of data with attendant detailed
uncertainty budget. Similarly, according to the
Guidelines, the integrity of the Key Comparison analysis
is protected by explicit documentation of any changes to the
data (e.g., that may occur when the data is reviewed prior to
preparation of Draft A).
• Open accessibility of all data and uncertainty budgets permits
alternate or expanded analyses, as these may be appropriate
and may serve as additional validation of the conclusions.
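As a concrete reading of the degrees of equivalence mentioned in the
first premise above, one common notation is sketched here. It follows
the MRA's description of a degree of equivalence as a deviation together
with the uncertainty of that deviation. The symbols x_i, x_ref, d_i,
d_ij, u, U and the coverage factor k are labels chosen for this
illustration and do not appear in this Statement.

```latex
\begin{align}
  d_i    &= x_i - x_{\mathrm{ref}}, & U(d_i)    &= k\,u(d_i), \\
  d_{ij} &= x_i - x_j,              & U(d_{ij}) &= k\,u(d_{ij}).
\end{align}
```

How u(d_i) and u(d_ij) are evaluated depends on the statistical model
adopted, including any correlation between a laboratory's result and
the reference value, which is why the choice of procedure matters.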
The statistical design of Key Comparisons
• The statistical design of the Key Comparison should conform to
established principles of sound statistical design of
experiments. From the outset, a specific statistical analysis
should be posited to ensure that unbiased estimates of degrees
of equivalence and reference value and also of their associated
uncertainties will be possible; but the eventual analysis
should not be limited to this particular method. It is an
established statistical practice to construct the statistical
design both for (statistical) efficiency and for robustness against
any loss of data or the effects of unforeseen factors.
• Since each Key Comparison involves a specific metrology, the
statistical design of the Key Comparison needs to be
individualized to reflect the particular metrological
requirements and practical constraints of each comparison.
Statistical design features include, among other things: factors
affecting the measurement process, artifact attributes,
replication and randomization (where feasible).
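The replication and randomization features listed above can be
illustrated with a minimal sketch. It assumes a simple design in which
every laboratory measures every artifact the same number of times, and
it randomizes only the run order within each laboratory. The function
name, the laboratory labels and the artifact labels are hypothetical;
a real comparison would adapt the design to its own metrological
constraints.

```python
import random


def randomized_schedule(labs, artifacts, replicates, seed=None):
    """Build a replicated, randomized run order for each laboratory.

    Hypothetical helper: each lab gets every (artifact, replicate) pair
    exactly once, in an order shuffled independently per lab.
    """
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    schedule = {}
    for lab in labs:
        runs = [(artifact, rep)
                for artifact in artifacts
                for rep in range(1, replicates + 1)]
        rng.shuffle(runs)  # randomize run order to guard against drift
        schedule[lab] = runs
    return schedule


if __name__ == "__main__":
    # Labels below are placeholders, not real participants or artifacts.
    plan = randomized_schedule(["NMI-A", "NMI-B"], ["artifact-1", "artifact-2"],
                               replicates=3, seed=2003)
    for lab, runs in plan.items():
        print(lab, runs)
```

Recording the seed alongside the protocol lets the run order be
reconstructed later, which supports the documentation requirements
described earlier.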
The statistical analysis of Key Comparisons
• Key Comparisons inherently involve statistical (Type A)
sources of uncertainty, nonstatistical (Type B) sources of
uncertainty, and mathematical constants not subject to
uncertainty. The statistical methodology must distinguish
among these, assigning the correct (different) mathematical
role to each. Statistical sources of uncertainty should be
measured from data and be verifiable from data.
Nonstatistical sources may be statements of individual expert
opinion not verifiable directly from data or may incorporate
both expert opinion and data-verifiable uncertainties such as
offsets.
• Because Key Comparisons are necessary in a wide variety of
metrological areas, no single statistical methodology can be
universally applied either to their design or to their
analysis. Therefore an appropriate statistical approach for
a Key Comparison will require individualization because of the
diversity of the measurement processes, the variety of
metrological models and the differences in the designs of the
Key Comparisons.
• In general, multiple statistical approaches are valid for a
Key Comparison. Every statistical approach requires a set of
underlying assumptions; for a particular approach to be valid
these assumptions must be stated and checked, wherever
possible. These assumptions include (but are not limited to)
the statistical models used, independence or interdependencies
in the data, and distributional assumptions about the data.
• Critical conclusions drawn from analysis of a Key Comparison
should hold generally under analysis by alternative
contextually valid statistical approaches. Divergence among
valid statistical approaches in the principal conclusions is an
indicator of insufficient information or a crucial dependence
upon assumptions that are not verifiable.
• For the purposes of a Key Comparison, the analysis of the
Key Comparison data is complete and adequate when it satisfies
two criteria: 1) a more elaborate analysis does not alter
conclusions drawn with respect to the primary objectives, and
2) the uncertainty associated with the summary results meets or
surpasses the specific requirements for all primary uses of the
comparison. Statistical principles endorse more extensive or
more focused analyses for other purposes, for example: to shed
light on the measurement methodology or on the measurement
process for a particular NMI or subset of NMIs, or to resolve
deviations of individual observations or the collective data
from a single NMI or from a single measurement method.
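The principles above (state and check assumptions, and confirm that
conclusions hold under alternative valid approaches) can be sketched in
code. The example below is one possible illustration, not a procedure
prescribed by this Statement. It assumes independent laboratory results
with reliable standard uncertainties, takes an inverse-variance weighted
mean and a median as two candidate reference values, and uses a crude
screen of k standard uncertainties to compare the conclusions. All
function names and the data are invented for illustration.

```python
import math
from statistics import median


def weighted_mean(values, uncerts):
    """Inverse-variance weighted mean and its standard uncertainty.

    Assumes independent results with reliable standard uncertainties.
    """
    weights = [1.0 / u ** 2 for u in uncerts]
    total = sum(weights)
    ref = sum(w * x for w, x in zip(weights, values)) / total
    return ref, math.sqrt(1.0 / total)


def chi_squared_observed(values, uncerts, ref):
    """Observed chi-squared statistic about a candidate reference value."""
    return sum(((x - ref) / u) ** 2 for x, u in zip(values, uncerts))


def degrees_of_equivalence(values, uncerts):
    """Deviation of each result from the weighted mean, with its uncertainty.

    Uses u(d_i)^2 = u_i^2 - u_ref^2, which holds only when every result
    is independent and contributes to the weighted mean.
    """
    ref, u_ref = weighted_mean(values, uncerts)
    out = []
    for x, u in zip(values, uncerts):
        u_d = math.sqrt(max(u ** 2 - u_ref ** 2, 0.0))
        out.append((x - ref, u_d))
    return out


def flagged(values, uncerts, ref, k=2.0):
    """Indices of results lying more than k standard uncertainties from ref.

    A deliberately crude screen, used here only to compare conclusions
    across two candidate reference values.
    """
    return [i for i, (x, u) in enumerate(zip(values, uncerts))
            if abs(x - ref) > k * u]


if __name__ == "__main__":
    # Invented numbers for illustration only; not from any real comparison.
    values = [10.02, 9.98, 10.01, 10.05]
    uncerts = [0.02, 0.03, 0.02, 0.04]

    ref_wm, u_wm = weighted_mean(values, uncerts)
    ref_med = median(values)

    print("weighted mean:", ref_wm, "u:", u_wm)
    print("median:", ref_med)
    print("chi-squared (dof = %d):" % (len(values) - 1),
          chi_squared_observed(values, uncerts, ref_wm))
    print("degrees of equivalence:", degrees_of_equivalence(values, uncerts))
    print("flagged vs weighted mean:", flagged(values, uncerts, ref_wm))
    print("flagged vs median:", flagged(values, uncerts, ref_med))
```

If the two candidate reference values flag different laboratories, that
divergence is the kind of signal described above: either the data carry
insufficient information or the conclusion depends on an assumption that
should be examined more closely.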
