2000 TREC-9 Spoken Document Retrieval Track

This page contains information and links to files for the 2000 TREC-9 Spoken Document Retrieval (SDR) Track. Note that it will be updated periodically as new materials and information become available. Members of the SDR email list will be notified of updates.

BACKGROUND

This website is dedicated to the 2000 TREC-9 Spoken Document Retrieval (SDR) Track, which evaluates the retrieval of broadcast news excerpts using a combination of automatic speech recognition and information retrieval technologies.

INSTRUCTIONS AND DOCUMENTATION

The 2000 TREC-9 SDR Evaluation Specification Version 1.0 is the core document for the SDR Track and contains detailed information regarding participation, implementation, and schedule. If you intend to participate in the SDR Track, read this document first!

To encourage exploration of automatically extracted non-lexical information from the audio signal, we have developed a specification for the common exchange of such information to support experiments in the SDR Track. We are providing a Segmentation Detection Table (.sdt) format (HTML file), which specifies a standard for exchanging this type of data. Sites contributing segmentation data to the SDR Track must format their data accordingly. Likewise, sites wishing to use contributed segmentation data should tool their systems to read this format.

For more information concerning past SDR tracks, see the TREC proceedings and the summary, The TREC Spoken Document Retrieval Track: A Success Story (Garofolo et al., April 2000).

Only one baseline recognizer is provided this year, designated "B1". Note that this recognizer is the same as the 1999 "B2" recognizer. Details regarding this recognizer are given in the paper Automatic Language Model Adaptation for Spoken Document Retrieval (Auzanne et al., April 2000).

SCHEDULE

The following is the schedule for the SDR Track:

TRAINING RESOURCES

No particular training collection is specified or provided for this track. However, below are some resources available from the LDC for recognition and retrieval training. Please see the Evaluation Specification for rules governing the use of these materials.

Text Resources

An LDC compilation of text resources is available for recognition and retrieval training.

Speech Resources

1998 Hub-4 training data may be used for SDR training.

IR resources from Previous Tests

Topics and assessments used in previous SDR tests can be used for training.

TEST RESOURCES

Speech recognition task

The Evaluation Specification provides the rules and instructions for implementing the SDR track. The following resources are provided for sites implementing the speech recognition portion of the full SDR task:

Information retrieval task

The Evaluation Specification provides the rules and instructions for implementing the SDR track. The following resources are provided for sites implementing the retrieval portion of the SDR task:

TEST RESULTS

The results of the TREC-9 2000 SDR evaluation, presented at TREC on November 14, 2000, showed that sites' retrieval performance on their own recognizer transcripts was virtually the same as their performance on the human reference transcripts. Therefore, retrieval of excerpts from broadcast news using automatic speech recognition for transcription was deemed to be a solved problem, even with word error rates of 30%.

DATA LICENSING

Note that the Broadcast News recordings and transcriptions used as the spoken document collection in the 2000 TREC-9 SDR Track are licensed through the Linguistic Data Consortium (LDC) and are subject to usage restrictions. Contact the LDC for license agreement information. See the Data Licensing and Costs section in the Evaluation Specification for more details.

CONTACT INFORMATION

If you would like to sign up for the SDR track or any other TREC tracks, please register per the instructions on the TREC website.

If you have questions regarding the SDR data and protocols, contact speech_webmaster[at]nist.gov.

Page Created: August 17, 2007

Multimodal Information Group is part of IAD and ITL. NIST is an agency of the U.S. Department of Commerce.