Dominic F. Vecchia, Jolene D. Splett
R. Michael McCabe
Bruce F. Field, Edward F. Kelley
To facilitate interoperability between existing ballistic imaging systems, the Office of National Drug Control Policy, the Federal Bureau of Investigation, and the Bureau of Alcohol, Tobacco and Firearms executed a memorandum of understanding requiring that the two ballistic imaging systems currently in use be interoperable. Under this memorandum, NIST, as a neutral third party, was charged with developing a standard for interoperability and with developing and overseeing interoperability conformance tests.
The purpose of ballistic imaging systems is to permit forensic evidence (cartridge cases and bullets) recovered at a crime scene to be imaged and compared against an existing database of thousands of images, identifying possible links between crimes not previously suspected of being related. However, because of differences in software, image acquisition, and networking capabilities, images captured on either of the two systems cannot be used on the other, denying crime laboratories full access to all image databases.
NIST has developed a specification for interoperability between the two image systems that requires cartridge-case images created on either system to be capable of correlation against the database of the dissimilar system. With respect to image acquisition and matching, interoperability is not a yes/no property but a matter of degree: acquiring images on a non-native system may produce subtle differences in image ``quality" that could change the match probability relative to images acquired on the native system.
To address the problem of experimentally evaluating interoperability for image matching, we have developed measures of disarray for comparing ranked ballistic images from a native database to the ranked images obtained by matching corresponding non-native test images. Ordinary rank correlation coefficients were initially suggested for measuring and testing interoperability, but were judged inadequate because only the top few rank positions are believed to have any practical significance. We therefore developed the statistical theory for two new rank coefficients that assign greater importance to the higher ranks. Furthermore, in contrast to common rank correlation procedures, these coefficients are designed to measure the agreement of a novice ``judge" with a known standard ranking, rather than the mutual agreement between equally weighted rank vectors.
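The exact form of the two new coefficients is not given in this summary, but the general idea of weighting disagreement by rank position can be sketched as follows. The sketch below is purely illustrative (a 1/rank-weighted variant of a footrule-type displacement measure, not the coefficients developed for the LIT):

```python
def top_weighted_agreement(master, novice):
    """Illustrative top-weighted rank agreement coefficient.

    master[i] and novice[i] are the rank positions (1 = best) that the
    two rankings assign to item i.  Rank displacements are divided by
    the master rank, so disagreement near the top of the list costs
    more than disagreement near the bottom.
    """
    n = len(master)
    penalty = sum(abs(m - v) / m for m, v in zip(master, novice))
    # crude worst case: every item displaced by the maximum n - 1 places
    worst = (n - 1) * sum(1.0 / k for k in range(1, n + 1))
    return 1.0 - penalty / worst  # 1.0 = identical rankings
```

With this weighting, swapping the two top-ranked items lowers the score far more than swapping the two bottom-ranked items, which is the behavior the text asks of a coefficient that emphasizes the higher ranks.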
A limited interoperability test (LIT) was designed and conducted in 1998. The first-firings from each of 61 9mm handguns were used to construct a reference database of 61 images for the LIT. Test cartridge cases (CCs) were the second-firings from the same weapons. On each of the two imaging systems, the measurement plan called for (a) two distinct images in native format of each of the 61 test CCs, and (b) a single image in foreign format of each of the 61 test CCs.
Agreement between duplicate images of a test CC is determined by a score designed to measure the degree of similarity in the two orderings of the 61 reference images near the top of the list. One of the duplicate-image correlations is regarded as the master ranking, while the other order, the novice ranking, is scored for closeness to the master order in the top 1, 2, etc., positions. The resulting sequence of novice scores for matching 1 to 61 positions is reduced to a single score that emphasizes similarity to the master ranking in just the top few images. A similar score is calculated by reversing the master-novice roles of the two correlations, and the resulting two novice scores are averaged to produce a single score for agreement ``near the top" of the comparable image orders.
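The master/novice scoring and role-reversal procedure can be sketched in a few lines. The scoring formula below (mean reciprocal novice rank of the master's top images, with a `top` cutoff) is a hypothetical stand-in, since the actual LIT score is not specified here; only the swap-and-average step follows the text directly:

```python
def novice_score(master_order, novice_order, top=5):
    """Score a novice ordering of reference images against a master
    ordering, emphasizing the top of the list: the sum of reciprocal
    novice ranks of the master's `top` best-matched images, normalized
    so 1.0 means the novice also places those images in its top
    positions.  (Hypothetical formula, not the actual LIT score.)"""
    novice_rank = {img: r for r, img in enumerate(novice_order, start=1)}
    raw = sum(1.0 / novice_rank[img] for img in master_order[:top])
    best = sum(1.0 / k for k in range(1, top + 1))
    return raw / best

def agreement_score(order_a, order_b, top=5):
    """Average the two novice scores obtained by reversing the
    master/novice roles, as described in the text."""
    return 0.5 * (novice_score(order_a, order_b, top)
                  + novice_score(order_b, order_a, top))
```

For duplicate images that rank the reference database identically, `agreement_score` returns 1.0; any disagreement near the top of the two orders pulls the score below 1.0.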
A practical definition of interoperability is that rankings from native and foreign images of the same CC are essentially indistinguishable, or interchangeable. In terms of the scores discussed above, interoperability would mean that native-to-native order scores should not be consistently better than comparable native-to-foreign order scores when making many such comparisons.
In the LIT, each imaging system has three comparable images for each test CC: two native images and one foreign image. Pairing the images as (native-1, native-2), (native-1, foreign-1), and (native-2, foreign-1) yields three scores for repeated images of each test CC. Under interoperability, we would expect about one-third (33.3%) of native-to-native image comparisons to produce scores better than the scores for both of the corresponding native-to-foreign comparisons. The actual percentage (with margin of error) of LIT tests in which the native-to-native score is best provides an easily understood criterion for evaluating interoperability of a foreign imaging system with respect to the native system.
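The one-third figure follows from exchangeability: if native and foreign images are interchangeable, the three scores per test CC are exchangeable, so each is equally likely to be the best. A short sketch computes the approximate 95% margin of error for the observed fraction over the 61 LIT tests (a standard binomial calculation, assumed here rather than taken from the study) and checks the 1/3 probability by simulation:

```python
import math
import random

# Under interoperability, the 3 scores for a test cartridge case
# (native-native, native1-foreign, native2-foreign) are exchangeable,
# so the native-to-native score is the best of the three with
# probability 1/3.
n_tests = 61                                   # test CCs in the LIT
p = 1.0 / 3.0
moe = 1.96 * math.sqrt(p * (1 - p) / n_tests)  # approx. 95% margin of error
print(f"expected fraction: {p:.3f} +/- {moe:.3f}")  # 0.333 +/- 0.118

# Monte Carlo check with i.i.d. (hence exchangeable) scores
random.seed(0)
trials = 100_000
hits = 0
for _ in range(trials):
    scores = [random.random() for _ in range(3)]  # [NN, NF1, NF2]
    if scores[0] == max(scores):
        hits += 1
print(f"simulated fraction: {hits / trials:.3f}")  # close to 1/3
```

The wide margin of error (roughly twelve percentage points for 61 tests) shows why the criterion is framed as "not consistently better" rather than as an exact match to 33.3%.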
Date created: 7/20/2001