ITL Sponsors Rich Transcription Meeting Recognition Workshop

 

The Information Access Division (IAD) sponsored the Spring 2004 Rich Transcription Meeting Recognition Workshop at the annual IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) on May 17, 2004, in Montreal, Canada.  The workshop was devoted to technologies for automatically recognizing and extracting information from meetings.  While the focus was largely on automatic speech recognition, the workshop also included discussion of video extraction and integration technologies in the meeting domain.  The workshop incorporated 23 technical papers/presentations in five areas: 1) the 2004 Meeting Recognition Evaluation, 2) data collection and transcription, 3) speech processing research, 4) related research, and 5) related programs.

 

In the first session, NIST reported the results of its recent community-wide evaluation of technologies for speech-to-text transcription (STT) and speaker diarization (identification of which speaker said which words) in the meeting domain.  This was the second evaluation NIST sponsored in this domain.  The evaluation used a 90-minute multi-microphone test set composed of eight 11-minute excerpts collected at Carnegie Mellon University (CMU), the International Computer Science Institute (ICSI) at Berkeley, the Linguistic Data Consortium, and NIST.  Several individual and team efforts participated in the evaluation: Laboratoire Informatique d'Avignon (LIA)/Communication Langagière et Interaction Personne-Système (CLIPS) in Grenoble (France), CMU/University of Karlsruhe (USA/Germany), Macquarie University (Australia), ICSI/SRI/University of Washington (USA), Panasonic (USA), and Virage (USA).  The results of the evaluation were quite promising; performance on both the speaker diarization and STT tasks was comparable to that on difficult conversational telephone speech.  Because the test contained both single and multiple distant microphone conditions, IAD evaluators were able to determine that multiple microphone approaches yield significantly better performance than single microphone approaches for both speaker diarization and STT technologies.  Further, IAD developed a new experimental algorithm to score overlapping speech (where two or more meeting participants are talking at the same time).  The hypothesis was that this speech would be much more difficult to recognize than single speaker speech, and the scoring results confirmed it.  Each of the evaluation participants also reported on its research and test results in this session.

 

The second session included papers and presentations from each of the sites contributing test data for the evaluation.  The third and fourth sessions contained a number of invited papers and presentations on related research.  The final session contained papers and presentations on the new European Computers in the Human Interaction Loop (CHIL) and Augmented Multi-Party Interaction (AMI) programs, the American Video Analysis and Content Extraction (VACE) program, and knowledge discovery work at the Department of Defense.

 

Formal proceedings for the workshop will be made available later this year as a NIST Special Publication.  The workshop was very successful in bringing together a burgeoning new community of researchers and Government sponsors working in the meeting domain, and plans are already underway for a follow-up evaluation and workshop series.

 

Contact:  John Garofolo,  ext. 3193