************************************************************************

Work by ITL Researcher Suggests Performance Breakthrough in Speaker Recognition

 

ITL’s Information Access Division (IAD) hosted the 2001 Speaker Recognition Workshop on May 14-15 in Linthicum, Maryland. The workshop reviewed the recently concluded 2001 Speaker Recognition Evaluation, which IAD coordinated. Twelve academic and industrial research organizations participated: six from the U.S., three from France, and one each from Spain, India, and Australia. The evaluation comprised eight tests covering several basic tasks in text-independent speaker recognition. Sites achieving the best scores on the tests were noted, although differences between competing systems were sometimes small. Mark Przybocki and Alvin Martin of IAD gave three presentations at the workshop analyzing performance results for different parts of the evaluation. Most of the data used in the evaluation were excerpts from the Switchboard Corpora of conversational telephone speech, generated at NIST by Mark Przybocki. The evaluation also used a new Switchboard Cellular Corpus, marking the first time that cellular telephone data have been used in a speaker recognition evaluation. Such cellular data will serve as primary data in the next evaluation.

A possible performance breakthrough was seen in systems that build on preliminary work by George Doddington of IAD. Doddington had shown that much useful information for characterizing speakers can be found in longer-term speech characteristics, particularly a speaker's frequent use of certain words or phrases. He showed that such "idiolectal" features of speech, obtainable from word transcripts (even errorful transcripts produced by automatic speech recognizers), could greatly enhance performance. For this work he used test segments consisting of entire conversation sides (taken from conversations of five to ten minutes each) and training data for each speaker consisting of several such conversation sides, preferably at least eight. To further explore the use of idiolectal characteristics in speaker recognition, a new extended-data one-speaker detection task was included in this year’s evaluation. For this task, systems were provided with much larger amounts of training and test data and with word transcripts generated by an automatic speech recognizer. Speaker detection performance was evaluated by measuring the correctness of the systems' detection decisions. The scores underlying these decisions were used to produce error trade-off curves showing how misses may be traded off against false alarms. Two sites, MIT-Lincoln Laboratory and R523 (DoD), produced systems for this evaluation based on Doddington's ideas. Their performance results were impressive, reducing previously seen error rates on such data by up to an order of magnitude. This is an exciting development that could have significant applications and that brings together emerging speech recognition and speaker recognition technologies.
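
As a rough illustration of the two ideas described above, the Python sketch below scores a test transcript by how much more often its word pairs occur in a speaker's training transcripts than in general background speech, and then computes miss and false alarm rates at a chosen decision threshold. It is a minimal sketch under stated assumptions, not the method used by Doddington or by any evaluation site: the choice of word bigrams, the additive smoothing constant `floor`, and all function names are hypothetical.

```python
from collections import Counter
import math


def bigrams(words):
    """Return the adjacent word pairs in a transcript (a list of words)."""
    return list(zip(words, words[1:]))


def idiolect_score(test_words, speaker_counts, background_counts, floor=0.5):
    """Average log ratio of smoothed bigram frequencies (speaker vs. background).

    Positive values mean the test bigrams are relatively more frequent in the
    speaker's training data. `floor` is an assumed, crude smoothing constant
    that keeps unseen bigrams from producing a zero frequency.
    """
    test = bigrams(test_words)
    if not test:
        return 0.0
    spk_total = sum(speaker_counts.values())
    bkg_total = sum(background_counts.values())
    total = 0.0
    for pair in test:
        p_spk = (speaker_counts[pair] + floor) / (spk_total + floor)
        p_bkg = (background_counts[pair] + floor) / (bkg_total + floor)
        total += math.log(p_spk / p_bkg)
    return total / len(test)


def error_rates(target_scores, impostor_scores, threshold):
    """Return (miss_rate, false_alarm_rate) when accepting score >= threshold."""
    misses = sum(1 for s in target_scores if s < threshold)
    false_alarms = sum(1 for s in impostor_scores if s >= threshold)
    return misses / len(target_scores), false_alarms / len(impostor_scores)


def tradeoff_curve(target_scores, impostor_scores):
    """List (threshold, miss_rate, false_alarm_rate) at every observed score."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    return [(t, *error_rates(target_scores, impostor_scores, t))
            for t in thresholds]


# Toy data, invented purely for illustration.
speaker = Counter(bigrams("you know I mean you know".split()))
background = Counter(bigrams("the cat sat on the mat".split()))
score = idiolect_score("you know".split(), speaker, background)
curve = tradeoff_curve([2.0, 1.0, 0.5], [0.0, 0.7, -1.0])
```

Raising the threshold lowers the false alarm rate at the cost of more misses, and the list returned by `tradeoff_curve` traces that trade-off one threshold at a time.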

More information about the speaker recognition program is available on the Web at: http://www.nist.gov/speech/tests/spk/index.htm.

CONTACT: Alvin Martin, ext. 3169