IT Performance: Human ID - Ranking Algorithms for Face Recognition
|
Introduction
|
The DARPA Human ID Project, as a major funder of ongoing U.S. research
in this area, serves as a magnet, a mixing bowl, and ultimately a test-bed
provider for the rapidly evolving knowledge base and set of systems.
SED (the Statistical Engineering Division of ITL) has been tasked by DARPA
and the NIST-resident Human ID Group (Jonathan Phillips and Patrick Grother
of the Information Access Division of ITL) to help with the algorithm
comparison test collection.
Background/Impetus
Customers
Goals
Impact
SED Milestones
R&D Team
Achievements
Presentations
|
Background/Impetus
|
Identification and verification of a person's identity are two generic
application areas of face recognition systems. In identification
applications, an algorithm identifies an unknown face in an image by
searching through an electronic mugbook. In verification applications,
an algorithm confirms the claimed identity of a particular face.
Proposed applications have the potential to impact all aspects of
everyday life by controlling access to physical and information
facilities, confirming identities for legal and commercial transactions,
and controlling the flow of citizens at borders.
For face recognition systems to be fielded successfully, one has to be
able to evaluate their performance. To evaluate an algorithm, one scores
its behavior on a test set of matchable images held in a mugbook known
as the Gallery. One computes a similarity matrix that quantifies the
proximity of each image in a subset of the Gallery (called the Probe set)
to each image in the Gallery.
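To make the Probe/Gallery similarity matrix concrete, here is a minimal Python sketch of one common way to score identification from it. It assumes that rows are probes, columns are Gallery images, higher scores mean "more similar", and the Gallery column of each probe's true mate is known. The function names are hypothetical, and counting ties in the probe's favour (only strictly higher scores push the mate down) is an illustrative choice.

```python
def match_ranks(similarity, true_gallery_index):
    """Rank of each probe's true mate within its row of the similarity matrix.

    similarity[i][j] scores probe i against Gallery image j (higher = more
    similar, an assumption). true_gallery_index[i] is the column of probe i's
    mate. Rank 1 means no other Gallery image scored strictly higher.
    """
    ranks = []
    for row, mate in zip(similarity, true_gallery_index):
        mate_score = row[mate]
        # Count Gallery images scored strictly higher than the true mate.
        better = sum(1 for s in row if s > mate_score)
        ranks.append(1 + better)
    return ranks


def identification_rate(ranks, k=1):
    """Fraction of probes whose true mate appears within the top k candidates."""
    return sum(r <= k for r in ranks) / len(ranks)


# Toy 3-probe x 3-Gallery matrix (illustrative numbers, not FERET data).
S = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
ranks = match_ranks(S, [0, 1, 2])
print(ranks)                           # [1, 2, 1]
print(identification_rate(ranks, 1))  # 0.6666666666666666
```

Because only the matrix is used, the sketch needs no knowledge of how any algorithm produced its scores.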
The task of SED is to develop methods for comparing algorithmic
performance based solely on comparison of the similarity matrices
that the algorithms under test generate when run against predefined
Probe/Gallery sets. Because the algorithms are proprietary, comparison
methods may not presuppose detailed knowledge of any particular
algorithm.
Large collections of test images already exist
(FERET/Army Research Lab/George Mason Univ./93-96) or are currently
under development (Human ID/DARPA/99-04). These databases
(which include IR, still, video, and hyperspectral images
of the face, gait, and iris of thousands of human subjects)
provide the Human ID research community with de facto standard
databases for algorithm development and comparison.
A first, simple approach is to limit the comparisons
to replicated same-face match scores, transform the scores
from the multiple algorithm outputs to a common scale, and
examine the rankings and clusterings produced by applying
standard Multiple Comparisons procedures, e.g.,
Student-Newman-Keuls. A useful common scaling is achieved by
applying the Probability Integral Transform (PIT) to each algorithm's
scores, using its characteristic score empirical distribution function
(EDF) estimated from larger heterogeneous (FERET) experiments, and then
applying the inverse of the standard Gaussian cumulative distribution
function. Applying this procedure to a sizable extract of the FERET
database yields a credible ranking of 15 algorithms dated 1996-1997.
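The common-scaling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: higher raw scores are assumed to mean "more similar", each algorithm's reference sample stands in for its larger heterogeneous (FERET) score distribution, and the (k + 1) / (n + 2) smoothing of the EDF is our choice so the inverse Gaussian CDF never receives 0 or 1. The function names are hypothetical. Ranking by mean normal score only orders the algorithms; a Multiple Comparisons procedure such as Student-Newman-Keuls would then decide which of them form statistically indistinguishable groups, and that step is not shown.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean


def smoothed_edf(reference_scores):
    """EDF of one algorithm's reference scores, smoothed to stay inside (0, 1)."""
    ref = sorted(reference_scores)
    n = len(ref)

    def edf(x):
        k = bisect_right(ref, x)  # number of reference scores <= x
        return (k + 1) / (n + 2)  # illustrative smoothing: never exactly 0 or 1

    return edf


def pit_normal_scores(scores, reference_scores):
    """PIT each score through the reference EDF, then apply the inverse standard Gaussian CDF."""
    edf = smoothed_edf(reference_scores)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(edf(s)) for s in scores]


def rank_by_mean_normal_score(same_face_scores, reference_scores):
    """Order algorithms by the mean of their transformed same-face scores.

    Both arguments map an algorithm name to a list of raw scores.
    """
    means = {
        name: fmean(pit_normal_scores(scores, reference_scores[name]))
        for name, scores in same_face_scores.items()
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```

Because every algorithm's scores are pushed through its own EDF, algorithms whose raw scores live on very different numeric ranges end up on the same standard-normal scale.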
An extension to a mixture of same-subject and different-subject match
scores can be achieved with ordinary two-dimensional multidimensional
scaling (MDS). MDS translates a similarity matrix into a pictorial map:
the matrix row/column headers become mapped locations whose
inter-location distances reflect the similarities. A good algorithm
should cluster same-subject images tightly and discriminate cleanly
among different-subject images. The ability to discriminate, together
with the tightness of clusters as quantified, e.g., by the aspect ratios
of circumscribed Voronoi ellipses, can be used to rank algorithm
performance. Demonstration tests against small-scale FERET extracts
illustrate this clearly.
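The mapping step can be illustrated with classical (Torgerson) MDS, which we use here as one concrete form of "ordinary" MDS. This is a sketch, not the project's procedure. Converting similarities to dissimilarities by subtracting from the maximum is an assumption, as is symmetrising the matrix, since an algorithm's probe-vs-gallery scores need not be symmetric. The Voronoi-ellipse tightness measure is not implemented.

```python
import numpy as np


def similarity_to_dissimilarity(similarity):
    """Turn a square similarity matrix into a symmetric dissimilarity matrix.

    Assumes higher similarity means "closer". Subtracting from the maximum is
    one simple illustrative choice; other monotone maps are possible.
    """
    S = np.asarray(similarity, dtype=float)
    S = (S + S.T) / 2.0       # symmetrise: algorithms need not score symmetrically
    D = S.max() - S
    np.fill_diagonal(D, 0.0)  # each image is at distance 0 from itself
    return D


def classical_mds(dissimilarity, dims=2):
    """Classical (Torgerson) MDS: embed n items as points in `dims` dimensions."""
    D = np.asarray(dissimilarity, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centring matrix
    B = -0.5 * J @ (D ** 2) @ J             # double-centred squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)    # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]  # indices of the largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)  # negative ones (non-Euclidean D) -> 0
    return eigvecs[:, top] * np.sqrt(lam)   # n x dims map coordinates
```

When the dissimilarities are truly Euclidean in two dimensions, the map reproduces them exactly. With real similarity scores, the discarded eigenvalues measure how much structure a flat 2-D picture loses.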
While Φ⁻¹-PIT (the Probability Integral Transform followed by the
inverse standard Gaussian CDF, Φ⁻¹) and the use of Multiple Comparisons
and MDS have the advantage of retaining the ratio scale of the original
similarity scores, much of the work already published and currently
under way in this area relies on rank statistics. We are exploring
properties of, and statistics derived from, partial rank correlations
(PRCs). This involves extending the known distributional theory for
PRCs based on the Kendall and Spearman statistics and applying it to
the study of interesting dependency patterns among different
algorithms. Loosely, the Human ID community recognizes that most
current algorithms perform reasonably well at scoring close true
matches and dramatically dissimilar (far) non-matches: i.e., algorithms
perform best at the two extremes of the performance scale. It is
commonly presumed that a better understanding of algorithmic
performance (and of the dual issue of image difficulty) will come from
pushing in from either end of the match/non-match performance scale.
Applying copula-based models of nonparametric dependence to partial
rank co-occurrences appears to hold promise here.
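As a starting point for the PRC work, the classical Kendall partial rank correlation of x and y given z is τ_xy·z = (τ_xy − τ_xz τ_yz) / sqrt((1 − τ_xz²)(1 − τ_yz²)). The Python sketch below computes it from tau-a, which counts ties as zero. Here x and y might be two algorithms' scores on the same image pairs and z a third algorithm's scores, but those roles and the function names are hypothetical. The same partial formula applied to Spearman's ρ gives a partial Spearman correlation. The extended distributional theory mentioned above is not shown.

```python
from itertools import combinations
from math import sqrt


def _sign(v):
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n choose 2); ties count as 0."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    total = sum(_sign(x[j] - x[i]) * _sign(y[j] - y[i])
                for i, j in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)


def partial_rank_correlation(r_xy, r_xz, r_yz):
    """Partial correlation of x and y given z, from three pairwise rank correlations."""
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly rank-correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


def partial_kendall_tau(x, y, z):
    """Kendall partial rank correlation of x and y, controlling for z."""
    return partial_rank_correlation(
        kendall_tau_a(x, y), kendall_tau_a(x, z), kendall_tau_a(y, z)
    )
```

For example, with x = [1, 2, 3, 4], y = [1, 3, 2, 4], and z = [2, 1, 4, 3], the pairwise taus are 2/3, 1/3, and 0, so the partial tau is (2/3) / sqrt(8/9) = 1/√2.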
In addition, the team has done preliminary work on All Possible
Subsets Regression (APSR) and Alternating Conditional Expectation (ACE)
modeling of the explanatory power of covariates for FERET similarity
scores, as well as on applying simple stochastic matrix ordering
techniques to rank similarity matrices. This work may lead to better
designs for image recognition studies.
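All Possible Subsets Regression fits an ordinary least-squares model for every non-empty subset of the candidate covariates and compares the fits. The Python sketch below compares them by R². The covariate names (for example, illumination or elapsed time between images) are hypothetical, as are the function names. ACE modeling is not shown.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())  # assumes y is not constant
    return 1.0 - ss_res / ss_tot


def all_subsets_r2(covariates, y):
    """Fit every non-empty subset of covariates; return (subset, R^2) pairs, best first.

    `covariates` maps a (hypothetical) covariate name to its values, one per
    similarity score in `y`. Ties in R^2 favour smaller subsets.
    """
    names = list(covariates)
    y = np.asarray(y, dtype=float)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            X = np.column_stack([np.asarray(covariates[c], dtype=float) for c in subset])
            results.append((subset, r_squared(X, y)))
    return sorted(results, key=lambda item: (-item[1], len(item[0])))
```

With k candidate covariates this fits 2^k − 1 models, which is practical only for a modest number of covariates.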
|
Customers
|
The customers for the Human ID project include:
- DARPA, Information Access Division/NIST: Jonathan Phillips,
Charles Wilson, Patrick Grother
- Intelligence
- Security
- Defense
- Law enforcement
|
Goals
|
The principal research task of the Human ID project is:
Develop methods for comparing human face identification
algorithm performance based solely on comparison of similarity
matrices generated by the algorithms under controlled test runs.
No detailed knowledge of competing algorithms may be presupposed.
Tests are based on the existing FERET and DARPA developmental databases.
|
Impact
|
Of all the security technologies thrust into the spotlight by the
events of September 11, 2001, and their aftermath, systems that try to
identify people by analyzing computerized images of their faces are
among the most prominent, controversial, and potentially promising.
Demand for workable technologies has exploded, as has demand for
objective means to evaluate existing technologies and to guide their
constructive improvement.
|
SED Milestones
|
Milestones for the Human ID project are:
|
FY2003 Milestones
|
FY2002 Milestones
|
R&D Team
|
Andrew Rukhin, Statistical Engineering Division, ITL
Alan Heckert, Statistical Engineering Division, ITL
Jonathan Phillips, Information Access Division, ITL
Patrick Grother, Information Access Division, ITL
In addition, SED students Kimball Kniskern, Susan Heath, and
Mariana Moody contributed to this project.
|
Achievements
|
Ranking and clustering methods developed to date for the Human ID
project include:
- Restrict attention to same-image match scores (a homogeneous
  subsample). Coherently transform, or renormalize, similarity
  scores across algorithms by the Probability Integral Transform
  followed by the inverse standard Gaussian CDF. Apply Multiple
  Comparisons procedures to cluster and rank.
- Simple application of 2-D multidimensional scaling to
  heterogeneous subsamples mixing same-image with cross-image
  match scores. First criterion of goodness: the ability of the
  algorithm(s) to discriminate among distinct human individuals.
  Second criterion of goodness: the tightness of same-face clusters,
  quantifiable, e.g., via the aspect ratios of circumscribed
  Voronoi ellipses.
- Partial rank correlations (PRCs). This work involves extending the
  known distributional theory for PRCs based on the Kendall and
  Spearman statistics and applying it to the study of dependency
  patterns among algorithms.
- Draft work on All Possible Subsets Regression (APSR) and ACE
  modeling of covariates for similarity scores; stochastic matrix
  ordering techniques for algorithm ranking; algorithm fusion.
|
Presentations
|
Presentations resulting from the Human ID project include:
|
Date created: 6/21/2002
Last updated: 2/12/2002
Please email comments on this WWW page to
sedwww@nist.gov.
|