NIST Speech Group Website
Information Technology Lab, Information Access Division NIST: National Institute of Standards and Technology


  • 1998 TREC-7 Spoken Document Retrieval Track

    This page contains information and links to files for the 1998 TREC-7 Spoken Document Retrieval (SDR) Track. Note that it will be updated periodically as new materials and information become available. Members of the SDR email list will be notified of updates.

    INSTRUCTIONS AND DOCUMENTATION

    The 1998 TREC-7 SDR Evaluation Specification Version 3 is the core document for the SDR Track and contains detailed information regarding participation, implementation, and schedule. If you intend to participate in the SDR Track, read this document first!

    These Background Notes are a supplement to the Evaluation Specification and provide background information, terminology, and rationale for the SDR Track.

    TRAINING MATERIAL

    Contact the LDC to obtain the training material. The textual material is available via LDC Order Number LDC98E8. The speech material is available on 19 CD-ROMs via LDC Order Number LDC97S44.

    A set of 5 Training Topics with Relevance Assessments was created for a 50-hour subset of the training data (identified as "set1" in the above distribution).

    See the Evaluation Specification for specifics regarding training conditions before implementing the SDR Track.

    EVALUATION MATERIAL

    The recorded speech material for the SDR Track must be obtained from the LDC on 18 CD-ROMs via LDC Order Number LDC98S71. Use this gzipped GNU tar archive of index files (one for each recorded speech file) to recognize ONLY the proper sections of the speech. The textual material, including topics and baseline speech recognizer transcripts for the evaluation, may be obtained via LDC Order Number LDC98E9.
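
    Because the archive is meant to hold one index file per recorded speech file, a quick completeness check before running recognition may be useful. The following is a minimal sketch only: the archive file name, the internal layout of the archive, and the assumption that each index file shares its base name with the corresponding speech file are all hypothetical and are not taken from this page. The function name missing_index_files is likewise illustrative.

```python
import tarfile
from pathlib import PurePosixPath


def missing_index_files(archive_path, speech_basenames):
    """Return the speech-file base names that have no index file in the archive.

    Assumption (not confirmed by the page): an index file has the same
    base name as its speech file, with some extension appended.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        # Compare on the file stem so the index-file extension does not matter.
        indexed = {
            PurePosixPath(member.name).stem
            for member in archive.getmembers()
            if member.isfile()
        }
    return sorted(set(speech_basenames) - indexed)
```

    Calling missing_index_files("sdr_index.tar.gz", names), where the archive name and the list of base names are placeholders, would return an empty list if every listed speech file has a matching index file.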

    Read the Evaluation Specification in its entirety before implementing the SDR Track evaluation.

    SOFTWARE

    The following Perl 5 scripts may be used to manipulate the SDR textual data:

    ctm2srt.pl Filter to create a Speech Recognizer Transcript (.srt) format file from a SCLITE Speech Recognition Scoring (.ctm) format file.

    srt2ctm.pl Filter to create a SCLITE Speech Recognition Scoring (.ctm) format file from a Speech Recognizer Transcript (.srt) format file.

    srt2ltt.pl Filter to create a Lexical TREC Transcript (.ltt) format file from a Speech Recognizer Transcript (.srt) format file.
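
    For readers unfamiliar with the .ctm side of these filters, the sketch below reads a single time-marked word record. It is not one of the NIST scripts and does not reproduce their behavior. It assumes the record layout commonly documented for SCLITE (recording ID, channel, start time, duration, word, optional confidence, with ";;" marking comment lines). The names CtmWord and parse_ctm_line are hypothetical. The .srt and .ltt layouts are not described on this page, so no conversion is attempted here; see the Evaluation Specification for those formats.

```python
from typing import NamedTuple, Optional


class CtmWord(NamedTuple):
    recording: str
    channel: str
    start: float                  # seconds from the start of the recording
    duration: float               # seconds
    word: str
    confidence: Optional[float]   # None when the record has no sixth field


def parse_ctm_line(line):
    """Parse one CTM record; return None for blank lines and ';;' comments."""
    text = line.strip()
    if not text or text.startswith(";;"):
        return None
    fields = text.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}: {line!r}")
    confidence = float(fields[5]) if len(fields) > 5 else None
    return CtmWord(
        fields[0], fields[1], float(fields[2]), float(fields[3]), fields[4], confidence
    )
```

    A word's end time, where needed, can be derived as start + duration.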

    DATA LICENSING

    Note that the Broadcast News recordings and transcriptions used as the spoken document collection in the 1998 TREC-7 SDR Track are licensed through the Linguistic Data Consortium (LDC) and are subject to usage restrictions. Contact the LDC for license agreement information. See the Evaluation Specification for more details.

    CONTACT INFORMATION

    Please direct all questions to speech_webmaster[at]nist.gov.


    Page Created: August 17, 2007
    Last Updated: November 4, 2008

    Multimodal Information Group is part of IAD and ITL
    NIST is an agency of the U.S. Department of Commerce