NIST Position Statement on the Conduct of Key Comparisons
Approved by the Measurement Services Advisory Group,
December 16, 2003
Introduction:
To facilitate international trade beneficial to U.S. industry, NIST
participates in international interlaboratory comparisons, called
Key Comparisons (KCs), to assess the degree of equivalence of
measurement standards used at different National Metrology Institutes
(NMIs). Having participated in nearly 250 comparisons and piloted
nearly 70, NIST technical staff has asked for the development of a
NIST-wide position regarding several interpretable points of the
CIPM MRA. Questions have arisen regarding the following issues:
 How definitive with regard to the intended statistical
analysis should the design protocol of a comparison be?
 Is there any leeway in changing results after they have been
submitted for Draft A of the comparison report?
 What is the status of the KCRV and what constitutes an
"exceptional" case wherein the calculation of a KCRV does
not make technical sense?
 Is a single approach to the KCRV estimation required for
all comparisons?
 What is the relationship between the approval process for a
Key Comparison Final Report and the review of relevant CMCs?
This NIST Position Statement seeks to answer these questions.
Recognizing that there are both stochastic and nonstochastic
elements in all interlaboratory comparisons, a companion document,
"NIST Statement on Statistical Principles for the Design and Analysis
of Key Comparisons," also gives guidelines to ensure that the design
and analysis of Key
Comparisons in which NIST participates will yield clearly
interpretable estimates of the difference between measurements and
standards of the NMIs and the associated uncertainties of these
differences.
Statement of Position:
The conduct of key comparisons
 The document, Guidelines for Key Comparisons, is not
part of the CIPM MRA. However, Section T.6 of the MRA calls for
use of the Guidelines when carrying out Key Comparisons.
Key comparisons will be carried out according to the
Guidelines with the following specific references:
 NIST specifically endorses Section 6, which states that
the technical protocol will include a "list of the
principal components of the uncertainty budget to be
evaluated by each participant, and any necessary advice
on how uncertainties are estimated." Moreover, the
technical protocol should include both a statistical
design and the intended approach to the statistical
analysis of results.
 NIST specifically endorses Section 9, which defines how
the integrity of the results is to be maintained through
the development of Drafts A, B, and Final versions of
the Key Comparison Report. Before Draft A of the report
is prepared, each participant must submit its result. As
stated in the Guidelines, a "result from a
participant is not considered complete without an
associated uncertainty, and is not included in the draft
report unless it is accompanied by an uncertainty
supported by a complete uncertainty budget.
Uncertainties are drawn up following the guidance given
in the technical protocol." Before Draft A of the report
is developed, apparent anomalies can be reported to
the relevant participants. The corresponding institutes
are invited to check their results for numerical errors
but without being informed as to the magnitude or sign
of the apparent anomaly. If no numerical error is found
the result stands and the complete set of results is
sent to all participants. Note that once all
participants have been informed of the results,
individual values and uncertainties may be changed or
removed, or the complete comparison abandoned, only
with the agreement of all participants and on the basis
of a clear failure of the traveling standard or some
other phenomenon that renders the comparison or part of
it invalid.
 Although the Guidelines implies that every Key
Comparison must have a Key Comparison Reference Value,
NIST specifically endorses the language in the CIPM MRA,
Section T.3, which states that "in some exceptional
cases a Consultative Committee may conclude that for
technical reasons a reference value for a particular key
comparison is not appropriate." If experts in the
comparison working group agree that such technical
reasons exist, then the exceptional case exists and no
KCRV is calculated. In this case, "the results are then
expressed directly in terms of the degrees of
equivalence between pairs of standards."
 Section T.3 of the CIPM MRA also states that "although
a key comparison reference value is normally a close
approximation to the corresponding SI value, it is
possible that some of the values submitted by individual
participants may be even closer. In a few instances, for
example in some chemical measurements, there may be
difficulty in relating results to the SI. Nevertheless,
the key comparison reference value and deviations from
it are good indicators of the SI value." There are
completed KCs that for one reason or another have KCRVs
that do not fulfill this characterization. For instance,
when a KC transfer standard drifts and participant results
are adjusted according to a model of that drift, the KCRV
may have no relation to a corresponding SI value (cf.
CCEM-K4). NIST recognizes that such KCRVs
have no intrinsic value other than as a convenient
summary of the ensemble of those specific KC results.
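As an illustration of the exceptional case described above, in which no
KCRV is calculated and results are expressed directly as degrees of
equivalence between pairs of standards, the following is a minimal
sketch. It assumes that the participants' results are uncorrelated and
that a coverage factor of k = 2 is used for the expanded uncertainty;
neither assumption is prescribed by this Position Statement. The
function name, laboratory names, and numerical values are hypothetical.

```python
import math


def pairwise_degrees_of_equivalence(results, coverage_factor=2.0):
    """Return pairwise differences and their expanded uncertainties.

    results maps a laboratory name to (value, standard uncertainty).
    Assumption: results from different laboratories are uncorrelated,
    so the standard uncertainties combine in quadrature.
    """
    labs = sorted(results)
    doe = {}
    for index, lab_i in enumerate(labs):
        for lab_j in labs[index + 1:]:
            x_i, u_i = results[lab_i]
            x_j, u_j = results[lab_j]
            d_ij = x_i - x_j              # difference between the two results
            u_ij = math.hypot(u_i, u_j)   # sqrt(u_i**2 + u_j**2)
            doe[(lab_i, lab_j)] = (d_ij, coverage_factor * u_ij)
    return doe


# Hypothetical results in arbitrary units.
example = {"Lab A": (10.002, 0.003), "Lab B": (9.998, 0.004)}
for pair, (d, expanded_u) in pairwise_degrees_of_equivalence(example).items():
    print(pair, round(d, 6), round(expanded_u, 6))
```

Where results are correlated, for example through a shared traceability
chain, a covariance term would need to be included and this sketch would
not apply as written.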
The analysis of key comparison results
 Key comparisons necessarily involve different designs for the
wide variety of scientific areas of metrology in which they
are conducted. Moreover, for different measurement methods, the
sources of uncertainty, let alone their quantitative estimates,
will be different among different KCs; this may be true even
within a single KC, when a variety of measurement methods are
employed. NIST notes that Section 6 of the "Guidelines for
Key Comparisons" asserts "that the purpose of a key
comparison is to compare the standards as realized in the
participating institutes, not to require each participant to
adopt precisely the same conditions of realization. The
protocol should, therefore, specify the procedures necessary
for the comparison, but not the procedures used for the
realization of the standards being compared." Consequently, it
is NIST's position that a single approach to developing summary
statistics for all KCs cannot be adopted.
The interpretation of key comparison results
 Key comparison results are intended to support the statements
of Calibration and Measurement Capabilities (CMCs) as listed
in Appendix C of the CIPM MRA. Degrees of equivalence derived
from the analysis of KC results should be consistent with the
uncertainties listed in participants' CMCs. However, KC
protocols may not exactly match the conditions of a
participant's calibration or measurement service delivery. Key
Comparisons necessarily involve transfer standards, which may
introduce components of uncertainty unique to the KC.
Therefore, degrees of equivalence developed from a KC may in
fact be larger than a participant's uncertainties associated
with relevant CMCs without automatically invalidating those
CMCs.
 It is NIST's position to follow the recommendation in JCRB
Document 9/12 (Revised 4 October 2002) that it is the ongoing
responsibility of the Working Group on CMCs within each
Consultative Committee to monitor the results of key and
supplementary comparisons and provide a written report to the
JCRB in the case that these results appear to contradict
published CMCs. The relevant Regional Metrology Organization
(RMO) representative to the JCRB transmits this report as
appropriate within the RMO. It is the responsibility of the
NMI providing the CMCs to notify the KCDB Coordinator in order
to undertake appropriate action. Such action may involve
increasing the uncertainties of CMCs or withdrawing CMCs. The
relevant RMO will keep the JCRB informed of the status of such
CMCs. Furthermore, it is NIST's position that the process of
review and publication of a KC Final Report should not be
delayed in any way because of questions related to CMCs.
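The first point above notes that transfer standards may introduce
components of uncertainty unique to the KC, so that uncertainties
arising in a KC may exceed those in a participant's CMCs. The following
minimal sketch shows the arithmetic under the assumption that the two
components are independent and combine by root-sum-of-squares. The
function name and the numerical values are hypothetical and are not
taken from any comparison.

```python
import math


def combined_kc_uncertainty(u_cmc, u_transfer):
    """Combine two independent standard uncertainties in quadrature.

    Assumption: the CMC-level component and the transfer-standard
    component are uncorrelated.
    """
    return math.hypot(u_cmc, u_transfer)


u_cmc = 3.0       # hypothetical standard uncertainty claimed in the CMC
u_transfer = 4.0  # hypothetical component unique to the KC transfer standard
u_kc = combined_kc_uncertainty(u_cmc, u_transfer)

# The KC uncertainty exceeds the CMC uncertainty whenever u_transfer > 0.
print(u_kc, u_kc > u_cmc)
```

Under these assumptions, a larger uncertainty in the KC reflects the
transfer standard rather than a deficiency in the participant's
calibration or measurement service.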
NIST Statement on Statistical Principles for the Design and
Analysis of Key Comparisons
Introduction:
To facilitate international trade beneficial to U.S. industry, NIST
participates in international interlaboratory comparisons, called Key
Comparisons, to assess the equivalence of measurement standards used
at different National Metrology Institutes. Because Key Comparisons
impact both scientific and economic decisions made by different
countries, there are clearly defined procedures governing their
conduct. The primary document governing Key Comparisons is the Mutual
Recognition Arrangement (MRA) developed by the CIPM. In addition, NIST
has developed a "Position on the Conduct of Key Comparisons," which
offers guidance on
several interpretable points of the MRA for NIST participants in Key
Comparisons. Having participated in nearly 250 comparisons and
piloted nearly 70, NIST technical staff has asked for a clear
articulation of the statistical principles that are central to the
design, implementation, analysis and interpretation of Key Comparisons
and Supplementary Comparisons. Questions have arisen regarding the
following issues:
 What are the requirements in designing a Key Comparison to
assure a clear interpretation from the data once the
comparison is completed?
 What are the conditions for a statistical analysis of a Key
Comparison to be valid?
 When is the statistical analysis of a Key Comparison complete?
 Is there a single statistical approach to the analysis of a Key
Comparison or to the estimation of a reference value (KCRV) or the
estimation of degrees of equivalence?
This NIST Statement identifies statistical principles for different
types of Key Comparisons that should be followed to ensure that the
comparisons in which NIST participates will be clearly interpretable.
Interpretability requires statistically sound estimates of the various
quantities of interest including reference values and degrees of
equivalence between measurement standards maintained by different
NMIs, each with its associated uncertainties. Interpretation also
extends to the statistical basis for addressing unexplained
deviations, whether individual observations or the collective
observations from a particular NMI, and to statistically sound methods
for combining information from Key and Regional Comparisons in order
to address differences between NMIs participating in separate, but
linked, comparisons.
Information on sound statistical procedures and/or methodologies and
established statistical practices can be found in the archival and
applied journals of statistical societies, other technical and
educational publications and reputable statistical software in both
the commercial and public domains.
Statement of Principles:
The statistical premises for Key Comparisons
 Recognizing that there are both stochastic and nonstochastic
elements in all interlaboratory comparisons, the general goal
of the analysis of Key Comparison data is to draw statistical
inference. As a particular example, degrees of equivalence
among measurements and measurement standards for the various
NMIs with their associated uncertainties must be resolved on
the basis of sound statistical procedures.
 As expressed in Sections 6 and 9 of the Guidelines for Key
Comparisons and endorsed by the NIST Position Statement on
the Conduct of Key Comparisons, integrity of the data is
essential to the interpretability of a Key Comparison. Thus a
prerequisite to the inclusion of an NMI's data in the analysis
is the complete submission of data with attendant detailed
uncertainty budget. Similarly, according to the
Guidelines, the integrity of the Key Comparison analysis
is protected by explicit documentation of any changes to the
data (e.g., that may occur when the data is reviewed prior to
preparation of Draft A).
 Open accessibility of all data and uncertainty budgets permits
alternate or expanded analyses, as these may be appropriate
and may serve as additional validation of the conclusions.
The statistical design of Key Comparisons
 The statistical design of the Key Comparison should conform to
established principles of sound statistical design of
experiments. From the outset, a specific statistical analysis
should be posited to ensure that unbiased estimates of degrees
of equivalence and reference value and also of their associated
uncertainties will be possible; but the eventual analysis
should not be limited to this particular method. It is an
established statistical practice to construct the statistical
design both for (statistical) efficiency and for robustness
against any loss of data or the effects of unforeseen factors.
 Since each Key Comparison involves a specific metrology, the
statistical design of the Key Comparison needs to be
individualized to reflect the particular metrological
requirements and practical constraints of each comparison.
Statistical design features include, among other things, factors
affecting the measurement process, artifact attributes,
replication, and randomization (where feasible).
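As an illustration of two of the design features named above,
replication and randomization, the following minimal sketch generates a
randomized run order for replicated measurements of several artifacts.
The function name, artifact labels, and seed are hypothetical. Recording
the seed is one possible way to keep the run order reproducible and
documentable in a technical protocol; it is not a requirement stated in
this document.

```python
import random


def randomized_run_order(artifacts, replicates, seed):
    """Return every (artifact, replicate) pair in a random order.

    A fixed seed makes the order reproducible, so the same schedule
    can be regenerated and documented.
    """
    runs = [(artifact, rep)
            for artifact in artifacts
            for rep in range(1, replicates + 1)]
    rng = random.Random(seed)  # private generator; leaves global state alone
    rng.shuffle(runs)
    return runs


# Hypothetical design: two artifacts, three replicate measurements each.
for artifact, rep in randomized_run_order(["A1", "A2"], replicates=3, seed=2003):
    print(artifact, rep)
```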
The statistical analysis of Key Comparisons
 Key Comparisons inherently involve statistical (Type A)
sources of uncertainty, nonstatistical (Type B) sources of
uncertainty, and mathematical constants not subject to
uncertainty. The statistical methodology must distinguish
among these, assigning the correct (different) mathematical
role to each. Statistical sources of uncertainty should be
measured from data and be verifiable from data.
Nonstatistical sources may be statements of individual expert
opinion not verifiable directly from data or may incorporate
both expert opinion and data-verifiable uncertainties such as
offsets.
 Because Key Comparisons are necessary in a wide variety of
metrological areas, no single statistical methodology can be
universally applied either to their design or to their
analysis. Therefore an appropriate statistical approach for
a Key Comparison will require individualization because of the
diversity of the measurement processes, the variety of
metrological models and the differences in the designs of the
Key Comparisons.
 In general, multiple statistical approaches are valid for a
Key Comparison. Every statistical approach requires a set of
underlying assumptions; for a particular approach to be valid
these assumptions must be stated and checked, wherever
possible. These assumptions include (but are not limited to)
the statistical models used, independence or interdependencies
in the data, and distributional assumptions about the data.
 Critical conclusions drawn from analysis of a Key Comparison
should hold generally under analysis by alternative
contextually valid statistical approaches. Divergence among
valid statistical approaches in the principal conclusions is an
indicator of insufficient information or a crucial dependence
upon assumptions that are not verifiable.
 For the purposes of a Key Comparison, the analysis of the
Key Comparison data is complete and adequate when it satisfies
two criteria: 1) a more elaborate analysis does not alter
conclusions drawn with respect to the primary objectives, and
2) the uncertainties associated with the summary results meet or
surpass the specific requirements for all primary uses of the
comparison. Statistical principles endorse more extensive or
more focused analyses for other purposes, for example: to shed
light on the measurement methodology or on the measurement
process for a particular NMI or subset of NMIs, or to resolve
deviations of individual observations or the collective data
from a single NMI or from a single measurement method.
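The principles above, that assumptions should be checked and that
critical conclusions should hold under alternative valid approaches,
can be illustrated with a minimal sketch. It computes two candidate
reference values (an inverse-variance weighted mean and a median) and a
chi-squared consistency statistic for the weighted mean. These
estimators are chosen here only as familiar examples; this Statement
does not prescribe either of them. The sketch assumes independent
results with reliable stated uncertainties, and all function names and
data are hypothetical.

```python
import math
import statistics


def weighted_mean_kcrv(values, uncertainties):
    """Inverse-variance weighted mean and its standard uncertainty.

    Assumption: results are independent and the stated standard
    uncertainties are reliable.
    """
    weights = [1.0 / u ** 2 for u in uncertainties]
    total = sum(weights)
    kcrv = sum(w * x for w, x in zip(weights, values)) / total
    return kcrv, math.sqrt(1.0 / total)


def chi_squared_observed(values, uncertainties, reference):
    """Sum of squared normalized deviations from the reference value."""
    return sum(((x - reference) / u) ** 2
               for x, u in zip(values, uncertainties))


# Hypothetical results in arbitrary units.
values = [10.0, 10.2, 9.9, 10.1]
uncertainties = [0.1, 0.2, 0.1, 0.2]

wm, u_wm = weighted_mean_kcrv(values, uncertainties)
med = statistics.median(values)
chi2 = chi_squared_observed(values, uncertainties, wm)
dof = len(values) - 1  # one degree of freedom is used by the weighted mean

print("weighted mean:", round(wm, 4), "u:", round(u_wm, 4))
print("median:", round(med, 4))
print("chi-squared:", round(chi2, 4), "degrees of freedom:", dof)
```

A chi-squared value far above its degrees of freedom would suggest that
the stated uncertainties do not account for the observed scatter. A
substantial difference between the two estimates, relative to their
uncertainties, would be an instance of the divergence among approaches
described above.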
