Rich Transcription Spring 2004 Evaluation
This page contains information pertaining to the Rich Transcription Spring 2004 Evaluation (RT-04S) of Speech-to-Text Transcription and Speaker Segmentation in the Meeting Domain.
The Rich Transcription 2004 Spring Meeting Recognition evaluation plan is now available. Please let us know if you have any questions/concerns.
While any publicly available data can be used for training, we have worked with the LDC, CMU, and ICSI to put together meeting-domain training and development resources for the evaluation. These data consist of:
These data are currently available to RT-04S evaluation participants only. The LDC will make them available to the general public as they are able.
The NIST data has been "quick transcribed" and released early so that it can be used several weeks prior to the evaluation. If possible, it will be re-released at the beginning of the evaluation with additional quality control. See this and the NIST Meeting Pilot Corpus websites for updates.
The 80-minute test set used in the RT-02 Meeting Recognition Evaluation is the designated development test data for the RT-04 Meeting Recognition Evaluation. NIST has re-released this data with additional distant mics (where the data collection sites provided them). Although this data consists of 10-minute excerpts from the same data collection sites that will be represented in the RT-04 evaluation test set, it is not completely reflective of the evaluation test data: it contains lapel mics in lieu of head mics for the LDC and CMU data, and some different distant mics for the LDC data. Unfortunately, because of resource constraints, we were unable to create an entirely new development test set for this evaluation.
This data is currently available to RT-04S evaluation participants only. The LDC will make it available to the general public as they are able. We will make more information and the scoring files for the Development Test Set available in RT-04 format as soon as we are able.
Note that some of the meetings in the development test set were included in the above training data releases. See the development test data documentation for the mapping of devtest meeting IDs to the original collection site meeting IDs so that these may be eliminated from your training sets.
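The exclusion described above could be sketched as follows. This is only a minimal illustration: the function name, the dictionary layout, and the meeting IDs are hypothetical placeholders, and the actual mapping must be taken from the development test data documentation.

```python
def exclude_devtest(training_ids, devtest_to_original):
    """Return the training meeting IDs that do not overlap the devtest set.

    training_ids: iterable of original collection-site meeting IDs
                  (hypothetical layout; the real releases may differ).
    devtest_to_original: dict mapping each devtest meeting ID to the
                  original collection-site meeting ID, as given in the
                  development test data documentation.
    """
    held_out = set(devtest_to_original.values())
    # Preserve the original order of the training list.
    return [meeting for meeting in training_ids if meeting not in held_out]


if __name__ == "__main__":
    # Placeholder IDs for illustration only; these are not real meeting IDs.
    mapping = {"devtest_01": "site_meeting_B"}
    training = ["site_meeting_A", "site_meeting_B", "site_meeting_C"]
    print(exclude_devtest(training, mapping))
```

Filtering on the original collection-site IDs, rather than on the devtest IDs, matters here because the training releases use the sites' own naming.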
The evaluation data will consist of an approximately 90-minute multi-site test set containing 8 meeting excerpts of approximately 11 minutes each. The test data was collected at CMU, ICSI, LDC, and NIST. Each meeting excerpt will contain a head-mic recording for each subject and one or more distant microphone recordings (whatever the data collection sites provided to NIST).
Reference transcripts for the evaluation excerpts will be prepared by the LDC according to its Careful Transcription Procedure for Meetings. These are similar to the procedures used to prepare test-quality reference transcripts of conversational telephone data. The reference transcripts will be processed by NIST into the STM format for SCLITE scoring.
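For readers unfamiliar with STM, each segment line carries a waveform name, channel, speaker, begin time, end time, an optional label in angle brackets, and the transcript; lines beginning with ";;" are comments. The parser below is a rough sketch under that assumption. The class and function names are illustrative, the sample line is invented, and this is not part of the NIST scoring tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StmSegment:
    waveform: str
    channel: str
    speaker: str
    begin: float
    end: float
    label: Optional[str]
    transcript: str


def parse_stm_line(line: str) -> Optional[StmSegment]:
    """Parse one STM line; return None for blank lines and ';;' comments."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    # The first five whitespace-separated fields are fixed; the rest is
    # an optional <label> followed by the transcript text.
    parts = line.split(None, 5)
    if len(parts) < 5:
        raise ValueError(f"too few fields in STM line: {line!r}")
    waveform, channel, speaker, begin, end = parts[:5]
    rest = parts[5] if len(parts) > 5 else ""
    label = None
    if rest.startswith("<"):
        close = rest.find(">")
        if close != -1:
            label = rest[: close + 1]
            rest = rest[close + 1:].strip()
    return StmSegment(waveform, channel, speaker,
                      float(begin), float(end), label, rest)


if __name__ == "__main__":
    # Invented sample line for illustration only.
    sample = "meeting01 1 spk_a 12.50 15.75 <o,f0,male> okay let's get started"
    print(parse_stm_line(sample))
```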
This data will be made available to RT-04S evaluation participants only. It will be released sometime in the future (not prior to March 2005) by the LDC as development material for the next such evaluation.
ICASSP 2004 Meeting Recognition Workshop
Evaluation participants will have an automatic slot at the ICASSP 2004 Meeting Recognition Workshop in Montreal on May 17, 2004 and will be expected to contribute a paper and presentation for the workshop. See the ICASSP 2004 Meeting Recognition Workshop page for more details.
If you are interested in participating in the evaluation or workshop, or in obtaining additional information, please contact us.
Comments and corrections regarding the NIST RT-04 website should be emailed to our webmaster.
Page Created: December 23, 2003
Multimodal Information Group
NIST is an agency of the U.S. Department of Commerce