NIST Speech Group Website
Information Technology Lab, Information Access Division NIST: National Institute of Standards and Technology


    2000 TREC-9 Spoken Document Retrieval Track

    This page contains information and links to files for the 2000 TREC-9 Spoken Document Retrieval (SDR) Track. Note that it will be updated periodically as new materials and information become available. Members of the SDR email list will be notified of updates.

    BACKGROUND

    This website is dedicated to the 2000 TREC-9 Spoken Document Retrieval (SDR) Track, which evaluates the retrieval of broadcast news excerpts using a combination of automatic speech recognition and information retrieval technologies.

    INSTRUCTIONS AND DOCUMENTATION

    The 2000 TREC-9 SDR Evaluation Specification Version 1.0 is the core document for the SDR Track and contains detailed information regarding participation, implementation, and schedule. If you intend to participate in the SDR Track, read this document first!

    To encourage exploration of automatically extracted non-lexical information from the audio signal, we have developed a specification for exchanging such information in support of experiments in the SDR Track. The Segmentation Detection Table (.sdt) format (HTML file) specifies a standard for the exchange of this type of data. Sites contributing segmentation data to the SDR Track must format their data accordingly. Likewise, sites wishing to use contributed segmentation data should adapt their systems to read this format.

    For more information on past SDR tracks, see the TREC proceedings and the summary, The TREC Spoken Document Retrieval Track: A Success Story (Garofolo et al., April 2000).

    Only one baseline recognizer is provided this year; it is designated "B1". Note that this recognizer is the same as the 1999 "B2" recognizer. Details regarding this recognizer are given in the paper Automatic Language Model Adaptation for Spoken Document Retrieval (Auzanne et al., April 2000).

    SCHEDULE

    The following is the schedule for the SDR Track:

    Site registration                                         ASAP
    SPH and NDX available (recognition task begins)           02 Apr 2000
    Recognizer transcripts (SRTs) due for scoring/sharing     21 Jun 2000
    Non-lexical information files (SDT) due for sharing       21 Jun 2000
    NDXs, LTTs, SRTs, topics available
      (all retrieval tasks begin)                             30 Jun 2000
    All search results due at NIST                            14 Aug 2000 9am EDT
    Relevance judgements released by NIST                     02 Oct 2000
    Scored retrieval results released by NIST                 02 Oct 2000
    Conference workbook papers to NIST                        25 Oct 2000 (estimated)
    TREC-9 Conference                                         13-16 Nov 2000

    TRAINING RESOURCES

    No particular training collection is specified or provided for this track. However, below are some resources available from the LDC for recognition and retrieval training. Please see the Evaluation Specification for rules governing the use of these materials.

    Text Resources

    An LDC compilation of text resources is available for recognition and retrieval training.

    Speech Resources

    1998 Hub-4 training data may be used for SDR training.

    IR resources from Previous Tests

    Topics and assessments used in previous SDR tests can be used for training.

    TEST RESOURCES

    Speech recognition task

    The Evaluation Specification provides the rules and instructions for implementing the SDR track. The following resources are provided for sites implementing the speech recognition portion of the full SDR task:

    Information retrieval task

    The Evaluation Specification provides the rules and instructions for implementing the SDR track. The following resources are provided for sites implementing the retrieval portion of the SDR task:
    • Test Material: contains all material needed to implement retrieval testing, except the audio files

    TEST RESULTS

    The results of the 2000 TREC-9 SDR evaluation, presented at TREC on November 14, 2000, showed that sites' retrieval performance on their own recognizer transcripts was virtually the same as their performance on the human reference transcripts. Retrieval of excerpts from broadcast news using automatic speech recognition for transcription was therefore deemed to be a solved problem, even with word error rates of 30%.
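
    For readers unfamiliar with the metric, word error rate (WER) is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below illustrates that general definition only; it is not the scoring software used in the evaluation, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization, unlike a real scoring pipeline.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: 10 reference words, 2 substitutions, 1 deletion.
reference = "the president spoke to congress about the budget plan today"
hypothesis = "the president spoke to congress about a budget plant"
print(f"{word_error_rate(reference, hypothesis):.0%}")  # prints "30%"
```

    In this made-up example most content words survive recognition even at a 30% error rate, which gives some intuition for why retrieval can remain robust to transcription errors.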

    DATA LICENSING

    Note that the Broadcast News recordings and transcriptions used as the spoken document collection in the 2000 TREC-9 SDR Track are licensed through the Linguistic Data Consortium (LDC) and are subject to usage restrictions. Contact the LDC for license agreement information. See the Data Licensing and Costs section in the Evaluation Specification for more details.

    CONTACT INFORMATION

    If you would like to sign up for the SDR track or any other TREC tracks, please register per the instructions on the TREC website.

    If you have questions regarding the SDR data and protocols, contact speech_webmaster[at]nist.gov.


    Page Created: August 17, 2007
    Last Updated: November 4, 2008
