1998 TREC-7 Spoken Document Retrieval Track

This page contains information and links to files for the 1998 TREC-7 Spoken Document Retrieval (SDR) Track. It will be updated periodically as new materials and information become available. Members of the SDR email list will be notified of updates.

INSTRUCTIONS AND DOCUMENTATION

The 1998 TREC-7 SDR Evaluation Specification Version 3 is the core document for the SDR Track and contains detailed information regarding participation, implementation, and schedule. If you intend to participate in the SDR Track, read this document first!

These Background Notes are a supplement to the Evaluation Specification and provide background information, terminology, and rationale for the SDR Track.

TRAINING MATERIAL

Contact the LDC to obtain the training material. The textual material is available via LDC Order Number LDC98E8. The speech material is available on 19 CD-ROMs via LDC Order Number LDC97S44.

A set of 5 Training Topics with Relevance Assessments was created for a 50-hour subset of the training data (identified as "set1" in the above distribution).

See the Evaluation Specification for specifics regarding training conditions before implementing the SDR Track.

EVALUATION MATERIAL

The recorded speech material for the SDR Track must be obtained from the LDC on 18 CD-ROMs via LDC Order Number LDC98S71. Use this GNUTAR ZIPPED archive of index files (one for each recorded speech file) to recognize ONLY the proper sections of the speech.

The textual material, including topics and baseline speech recognizer transcripts for the evaluation, may be obtained via LDC Order Number LDC98E9.

Read the Evaluation Specification in its entirety before implementing the SDR Track evaluation.

SOFTWARE

The following PERL5 scripts may be used in manipulating the SDR textual data:

ctm2srt.pl    Filter to create a Speech Recognizer Transcript (.srt) format file from a SCLITE Speech Recognition Scoring (.ctm) format file.

srt2ctm.pl    Filter to create a SCLITE Speech Recognition Scoring (.ctm) format file from a Speech Recognizer Transcript (.srt) format file.

srt2ltt.pl    Filter to create a Lexical TREC Transcript (.ltt) format file from a Speech Recognizer Transcript (.srt) format file.

DATA LICENSING

The Broadcast News recordings and transcriptions used as the spoken document collection in the 1998 TREC-7 SDR Track are licensed through the Linguistic Data Consortium (LDC) and are subject to usage restrictions. Contact the LDC for license agreement information. See the Evaluation Specification for more details.

CONTACT INFORMATION

Please direct all questions to speech_webmaster[at]nist.gov.
Page Created: August 17, 2007

Multimodal Information Group is part of IAD and ITL. NIST is an agency of the U.S. Department of Commerce.