
IT Performance: Human ID - Ranking Algorithms for Face Recognition

Introduction
The DARPA Human ID Project, as a major funder of ongoing U.S. research in this area, serves as a magnet, mixing bowl, and ultimately test bed provider for the rapidly evolving knowledge base and set of systems. SED is tasked, by DARPA and the NIST-resident Human ID Group (Jonathan Phillips and Patrick Grother of the Information Access Division of ITL), to help with the algorithm comparison test collection.

Background/Impetus
Customers
Goals
Impact
SED Milestones
R&D Team
Achievements
Presentations

Background/Impetus
Identification and verification of a person's identity are two generic application areas of face recognition systems. In identification applications, an algorithm identifies an unknown face in an image by searching through an electronic mugbook. In verification applications, an algorithm confirms the claimed identity of a particular face. Proposed applications have the potential to impact all aspects of everyday life by controlling access to physical and information facilities, confirming identities for legal and commercial transactions, and controlling the flow of citizens at borders.

For face recognition systems to be fielded successfully, one must be able to evaluate their performance. To evaluate an algorithm, its behavior is scored on a test set of matchable images held in a mugbook known as the Gallery. The algorithm computes a similarity matrix that quantifies the proximity of each image in a subset of the Gallery (called the Probe set) to each image in the Gallery.
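To make the Probe/Gallery setup concrete, here is a minimal Python sketch of how a similarity matrix supports the two application areas described above. It assumes that rows index Probe images, columns index Gallery images, and a higher score means a closer match; the function names and the decision threshold are hypothetical, since real systems and their scoring conventions are proprietary.

```python
# Minimal sketch: using a Probe x Gallery similarity matrix.
# ASSUMPTIONS (not from the source): rows are Probe images, columns are
# Gallery images, higher score = more similar, and the verification
# threshold is an arbitrary illustrative value.
from typing import List, Sequence


def identify(similarity: Sequence[Sequence[float]]) -> List[int]:
    """Identification: for each Probe row, return the index of the most
    similar Gallery image (a search through the 'electronic mugbook')."""
    return [max(range(len(row)), key=row.__getitem__) for row in similarity]


def verify(similarity: Sequence[Sequence[float]], probe: int,
           claimed: int, threshold: float) -> bool:
    """Verification: accept the claim 'probe shows the person in
    gallery[claimed]' if the score reaches a hypothetical threshold."""
    return similarity[probe][claimed] >= threshold


if __name__ == "__main__":
    # Toy 3 x 4 matrix: Probe i is a second image of Gallery subject i.
    S = [
        [0.91, 0.20, 0.35, 0.10],
        [0.15, 0.88, 0.30, 0.25],
        [0.40, 0.22, 0.79, 0.05],
    ]
    print(identify(S))           # best Gallery match for each Probe
    print(verify(S, 2, 2, 0.5))  # genuine claim
    print(verify(S, 2, 0, 0.5))  # impostor claim
```

Identification only needs the ordering within each row, while verification needs an absolute score scale; that difference is one reason scores from different algorithms are hard to compare directly.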

The task of SED is to develop methods for comparing algorithmic performance based solely on comparison of the similarity matrices generated by the algorithms under test when run against predefined Probe/Gallery sets. Since the algorithms are proprietary, comparison methods may not presuppose detailed knowledge of any particular algorithm.

Large collections of test images already exist (FERET; Army Research Lab / George Mason Univ.; 1993-96) or are currently under development (Human ID; DARPA; 1999-2004). These databases, which include infrared (IR), still, video, and hyperspectral images of the face, gait, and iris of thousands of human subjects, provide the Human ID research community with de facto standard databases for algorithm development and comparison.

A first, simple approach is to limit the comparisons to replicated same-face match scores, transform the scores output by the multiple algorithms to a common scale, and examine the rankings and clusterings produced by standard Multiple Comparisons procedures, e.g., Student-Newman-Keuls. A useful common scale is obtained by applying the Probability Integral Transform (PIT) to each algorithm's scores, using that algorithm's characteristic empirical distribution function (EDF) of scores estimated from larger, heterogeneous (FERET) experiments, and then applying the inverse Gaussian cumulative distribution function. Applying this procedure to a sizable extract of the FERET database yields a credible ranking of 15 algorithms dated 1996-1997.
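The rescaling step above can be sketched as z = Phi^-1(F_a(s)), where F_a is algorithm a's reference EDF and Phi^-1 is the inverse standard Gaussian CDF. The Python below is a minimal sketch under stated assumptions, not the project's code: the reference EDF is built from a plain list of scores, and the EDF value is adjusted to (count + 0.5)/(n + 1) so that it stays strictly inside (0, 1) and Phi^-1 remains finite. The Multiple Comparisons step (e.g., Student-Newman-Keuls) is not shown.

```python
# Minimal sketch of PIT via an empirical distribution function (EDF),
# followed by the inverse standard Gaussian CDF.
# ASSUMPTIONS (not from the source): the reference EDF comes from a list of
# the algorithm's scores, and the EDF value is adjusted to
# (count + 0.5) / (n + 1) so it lies strictly in (0, 1).
from bisect import bisect_right
from statistics import NormalDist
from typing import List, Sequence


def edf_pit(reference: Sequence[float], scores: Sequence[float]) -> List[float]:
    """Map each score to an adjusted EDF value u in (0, 1)."""
    ref = sorted(reference)
    n = len(ref)
    if n == 0:
        raise ValueError("reference sample must be non-empty")
    # bisect_right counts the reference scores that are <= s.
    return [(bisect_right(ref, s) + 0.5) / (n + 1) for s in scores]


def phi_inv_pit(reference: Sequence[float], scores: Sequence[float]) -> List[float]:
    """Put one algorithm's scores on a common (standard normal) scale."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(u) for u in edf_pit(reference, scores)]


if __name__ == "__main__":
    # Two hypothetical algorithms whose raw scores use very different scales.
    ref_a = [0.1 * i for i in range(1, 101)]     # scores in (0, 10]
    ref_b = [50 + 5 * i for i in range(1, 101)]  # scores in (50, 550]
    # Scores at the same relative position in each EDF get the same z value.
    print(phi_inv_pit(ref_a, [5.05]), phi_inv_pit(ref_b, [302.5]))
```

Because only ranks within the reference sample matter, any strictly increasing rescaling of an algorithm's raw scores leaves its transformed scores unchanged, which is what makes the common scale usable across algorithms.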

An extension to a mixture of same-subject and different-subject match scores can be achieved with ordinary two-dimensional multidimensional scaling (MDS). MDS translates a similarity matrix into a pictorial map: each row/column header of the matrix becomes a mapped location, placed so that inter-location distances reflect the similarities. A good algorithm should cluster same-subject images tightly and cleanly discriminate among different-subject images. The ability to discriminate, and the tightness of clusters (quantified, e.g., by the aspect ratios of circumscribed Voronoi ellipses), can be used to rank algorithm performance. Demonstration tests against small-scale FERET extracts show this clearly.
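As an illustration of the mapping step, the NumPy sketch below implements classical (Torgerson) metric MDS. Two assumptions are ours, not the source's: "ordinary" MDS is read as classical metric MDS, and similarities are converted to dissimilarities with d = max(S) - S after symmetrizing. The Voronoi-ellipse tightness measure is not reproduced here.

```python
# Minimal sketch of classical (Torgerson) metric MDS in two dimensions.
# ASSUMPTIONS (not from the source): "ordinary" MDS is read as classical
# metric MDS, and similarities become dissimilarities via d = max(S) - S.
import numpy as np


def similarity_to_dissimilarity(S) -> np.ndarray:
    """Symmetrize a similarity matrix and convert it to dissimilarities."""
    S = np.asarray(S, dtype=float)
    S = (S + S.T) / 2.0        # MDS needs a symmetric input
    D = S.max() - S            # large similarity -> small distance
    np.fill_diagonal(D, 0.0)   # an image is at distance 0 from itself
    return D


def classical_mds(D, k: int = 2) -> np.ndarray:
    """Return an n x k configuration whose distances approximate D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J             # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)    # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    lam = np.clip(eigvals[top], 0.0, None)  # drop negative (non-Euclidean) parts
    return eigvecs[:, top] * np.sqrt(lam)


if __name__ == "__main__":
    # Toy similarities for 4 images: images 0, 1 show one subject, 2, 3 another.
    S = np.array([[1.0, 0.9, 0.2, 0.1],
                  [0.9, 1.0, 0.15, 0.2],
                  [0.2, 0.15, 1.0, 0.85],
                  [0.1, 0.2, 0.85, 1.0]])
    print(np.round(classical_mds(similarity_to_dissimilarity(S)), 3))
```

When the dissimilarities are exactly Euclidean in the plane, the two-dimensional map reproduces them up to rotation and reflection; for real similarity matrices the map is an approximation, and the discarded eigenvalues indicate how much structure the picture leaves out.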

While the Phi^-1-PIT transformation, Multiple Comparisons, and MDS have the advantage of retaining the ratio scale of the original similarity scores, much of the work already published and currently under way in this area uses rank statistics. We are exploring multiple properties and statistics derived from partial rank correlations (PRCs). This involves extending the known distributional theory for PRCs based on Kendall and Spearman statistics and applying it to the study of interesting dependency patterns among different algorithms. Loosely, the ID community recognizes that most current algorithms perform best when scoring true (close) matches and dramatically disparate (far) non-matches: i.e., algorithms perform best at the far ends of the performance scale. It is commonly presumed that a better understanding of algorithmic performance (and of the dual issue of image difficulty) will come from pushing in from either end of the match/non-match performance scale. Applying nonparametric measures of dependence, via copula theory, to partial rank co-occurrences seems to hold promise here.
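One standard reading of a Kendall-based partial rank correlation is Kendall's partial tau, which measures the association between two rankings after removing the effect of a third. The Python below is a minimal sketch under that assumption (our interpretation, not necessarily the project's statistic): x and y would be two algorithms' scores on the same image pairs and z a third ranking, such as another algorithm. The copula-based analysis is not shown.

```python
# Minimal sketch of Kendall's tau and Kendall's partial tau.
# ASSUMPTION (not from the source): "partial rank correlation" is read as
# the classical Kendall partial tau between two algorithms' scores on the
# same image pairs, controlling for a third ranking.
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Tied pairs contribute 0, so with many ties this differs from tau-b."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return s / (n * (n - 1) / 2)


def partial_kendall_tau(x: Sequence[float], y: Sequence[float],
                        z: Sequence[float]) -> float:
    """tau_xy.z = (tau_xy - tau_xz * tau_yz) / sqrt((1 - tau_xz^2)(1 - tau_yz^2))."""
    t_xy = kendall_tau(x, y)
    t_xz = kendall_tau(x, z)
    t_yz = kendall_tau(y, z)
    denom = sqrt((1.0 - t_xz ** 2) * (1.0 - t_yz ** 2))
    if denom == 0.0:
        raise ValueError("z is perfectly rank-correlated with x or y")
    return (t_xy - t_xz * t_yz) / denom


if __name__ == "__main__":
    # Hypothetical scores from algorithms A, B, C on the same six image pairs.
    a = [0.91, 0.40, 0.75, 0.12, 0.66, 0.30]
    b = [0.88, 0.35, 0.70, 0.20, 0.50, 0.41]
    c = [0.60, 0.55, 0.90, 0.10, 0.45, 0.20]
    print(kendall_tau(a, b), partial_kendall_tau(a, b, c))
```

Restricting x, y, and z to the top or bottom portion of the scores before computing these statistics is one simple way to probe the "far ends" of the performance scale mentioned above.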

In addition, the team has done preliminary (rough draft) work on All Possible Subsets Regression (APSR) and Alternating Conditional Expectation (ACE) modeling of the explanatory power of covariates for FERET similarity scores, as well as on applying simple stochastic matrix ordering techniques to rank similarity matrices. This work may lead to better designs for image recognition studies.
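For readers unfamiliar with the all-possible-subsets idea, the NumPy sketch below fits an ordinary least squares model for every non-empty subset of covariates and ranks the subsets by adjusted R^2. It is a minimal sketch under our own assumptions (OLS with an intercept, R^2 and adjusted R^2 as the criteria); the covariate names in the example are hypothetical placeholders, and ACE and stochastic matrix ordering are not shown.

```python
# Minimal sketch of All Possible Subsets Regression (APSR).
# ASSUMPTIONS (not from the source): OLS with an intercept, subsets compared
# by R^2 and adjusted R^2; covariate names below are hypothetical.
from itertools import combinations
from typing import List, Sequence, Tuple

import numpy as np


def all_possible_subsets(X, y, names: Sequence[str]
                         ) -> List[Tuple[Tuple[str, ...], float, float]]:
    """Fit y ~ subset for every non-empty subset of the columns of X.
    Returns (subset names, R^2, adjusted R^2), best adjusted R^2 first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    tss = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    if tss == 0.0:
        raise ValueError("response is constant; R^2 is undefined")
    results = []
    for k in range(1, p + 1):
        if n - k - 1 <= 0:
            break  # too few observations for adjusted R^2
        for subset in combinations(range(p), k):
            A = np.column_stack([np.ones(n), X[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))  # residual sum of squares
            r2 = 1.0 - rss / tss
            adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
            results.append((tuple(names[i] for i in subset), r2, adj))
    results.sort(key=lambda r: r[2], reverse=True)
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))
    # Synthetic "similarity score" driven by two of three hypothetical covariates.
    y = 1.5 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=40)
    for row in all_possible_subsets(X, y, ["lighting", "expression", "elapsed_days"])[:3]:
        print(row)
```

With p covariates there are 2^p - 1 subsets, so exhaustive search is only practical for a modest number of covariates; that limitation is one motivation for complementary tools such as ACE.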

Customers
The customers for the Human ID project include:
  • DARPA, Information Access Division/NIST: Jonathan Phillips, Charles Wilson, Patrick Grother
  • Intelligence
  • Security
  • Defense
  • Law enforcement
Goals
The principal research task of the Human ID project is:
    Develop methods for comparing the performance of human face identification algorithms based solely on comparison of the similarity matrices generated by the algorithms in controlled test runs. No detailed knowledge of competing algorithms may be presupposed. Tests are based on the existing FERET and DARPA developmental databases.
Impact
Of all the security technologies thrust into the spotlight by the events of September 11th and their aftermath, systems that try to identify people by analyzing computerized images of their faces are among the most prominent, controversial, and potentially promising. Demand for workable technologies has exploded, as has demand for objective means to evaluate existing technologies and to provide guidance for improving them constructively.
SED Milestones
Milestones for the Human ID project are:
FY03 Milestones
  • To be determined
FY2002 Milestones
  • To be determined
R&D Team
Andrew Rukhin, Statistical Engineering Division, ITL

Alan Heckert, Statistical Engineering Division, ITL

Jonathan Phillips, Information Access Division, ITL

Patrick Grother, Information Access Division, ITL

SED students Kimball Kniskern, Susan Heath, and Mariana Moody also contributed to this project.

Achievements
Ranking and clustering methods developed to date for the Human ID project include:
  1. Restrict attention to same-image match scores (homogeneous subsample). Coherently transform, or renormalize, similarity scores across algorithms by the Probability Integral Transform followed by the inverse Gaussian cumulative distribution function. Apply Multiple Comparisons procedures to cluster and rank.

  2. Simple application of 2-D multidimensional scaling to heterogeneous subsamples that mix same-image and cross-image match scores. First criterion of goodness: the ability of the algorithm(s) to discriminate among distinct human individuals. Second criterion of goodness: the tightness of same-face clusters, quantifiable, e.g., via the aspect ratios of circumscribed Voronoi ellipses.

  3. Partial Rank Correlations. This work involves extending the known distributional theory for PRCs based on Kendall and Spearman statistics and applying it to the study of dependency patterns among algorithms.

  4. Draft work on APSR and ACE modeling of covariates for similarity scores; stochastic matrix ordering techniques for algorithm ranking; algorithm fusion.
Presentations
Presentations resulting from the Human ID project include:
  • Andrew Rukhin, Patrick Grother, Jonathan Phillips, Stefan Leigh, Elaine Newton, Alan Heckert, "Dependence Characteristics of Face Recognition Algorithms", International Conference on Pattern Recognition, Quebec, Canada, August 2002.

  • Stefan Leigh, Jonathan Phillips, Patrick Grother, Alan Heckert, Andrew Rukhin, Elaine Newton, Mariama Moody, Kimball Kniskern, Susan Heath, "Transformation, Ranking, and Clustering for Face Recognition Algorithm Comparison", Third Workshop on Automatic Identification Advanced Technologies, Tarrytown, March 2002.

    A copy of this talk is available in PDF format.


Date created: 6/21/2002
Last updated: 2/12/2002
Please email comments on this WWW page to sedwww@nist.gov.
