NIST Speech Group Website
Information Technology Lab, Information Access Division NIST: National Institute of Standards and Technology


    2004 Topic Detection and Tracking (TDT-2004) Evaluation

    TDT 2004 is the seventh in a series of open evaluations that have investigated several aspects of developing algorithms for the automatic organization of news stories by the real-world events that they describe. Researchers who are interested in the following topics are encouraged to find out more about TDT 2004: high-accuracy retrieval of documents, text filtering, cross-language issues, machine translation, speech recognition, text segmentation, compensating for degraded-quality text, novelty detection, or other similar topics.

    The exciting news for the 2004 evaluation is that the LDC will be preparing a fresh evaluation corpus (see below for corpus information). Also, we are experimenting with a new task: Hierarchical Topic Detection.

    TDT evaluation tasks for 2004 are currently expected to be:

    Hierarchical Topic Detection (HTD) is a new research task. The task is an extension of the Topic Detection task. HTD structures the stories into a directed acyclic graph (DAG) rather than non-overlapping clusters of stories. The advantage is that stories can belong to multiple clusters and clusters can be composed of sub-clusters. The exact details are being worked out. Join the TDT mailing list to help mold the task!
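
    To make the DAG idea concrete, here is a minimal sketch of a cluster hierarchy in which a story may sit in several clusters and a cluster may have several parents. The class name, method names, cluster labels, and story IDs are all hypothetical; this is not the HTD submission format or the scoring tool's data model, only an illustration of the structure described above.

```python
from collections import defaultdict


class ClusterDAG:
    """Toy cluster hierarchy stored as a directed acyclic graph.

    All names here are illustrative assumptions, not part of any TDT tool.
    """

    def __init__(self):
        self.children = defaultdict(set)  # cluster -> its sub-clusters
        self.stories = defaultdict(set)   # cluster -> stories attached directly

    def descendants(self, cluster):
        """Return every cluster reachable from `cluster`, excluding itself."""
        seen, stack = set(), list(self.children.get(cluster, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.children.get(node, ()))
        return seen

    def add_subcluster(self, parent, child):
        # Refuse an edge that would close a cycle: the graph must stay acyclic.
        if parent == child or parent in self.descendants(child):
            raise ValueError(f"edge {parent} -> {child} would create a cycle")
        self.children[parent].add(child)

    def add_story(self, cluster, story_id):
        self.stories[cluster].add(story_id)

    def all_stories(self, cluster):
        """Stories attached to `cluster` or to any of its sub-clusters."""
        result = set(self.stories.get(cluster, ()))
        for sub in self.descendants(cluster):
            result |= self.stories.get(sub, set())
        return result


# Usage: one sub-cluster with two parents, one story in two clusters.
dag = ClusterDAG()
dag.add_subcluster("root", "elections")
dag.add_subcluster("root", "economy")
dag.add_subcluster("elections", "campaign_finance")
dag.add_subcluster("economy", "campaign_finance")  # second parent is allowed
dag.add_story("campaign_finance", "story_001")
dag.add_story("elections", "story_001")            # same story, another cluster
dag.add_story("economy", "story_002")
```

    A flat, non-overlapping clustering is the special case in which every cluster hangs directly off the root and each story appears in exactly one cluster.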

    The LDC is currently preparing the TDT5 corpus for the fall evaluation. The corpus will be structured identically to the other released TDT corpora and will include Arabic, English and Mandarin texts. However, TDT5 will not include broadcast news transcripts. The TDT4 corpus and the TDT4 topic annotations will be made available for system development. New TDT participants must complete a "dry run" evaluation using the TDT4 corpus prior to receiving the TDT4 topic annotations. A "dry run" evaluation is completed by building an initial system and running a blind evaluation using last year's evaluation index files. 2003 participants will receive the TDT4 topic annotations without completing a dry run because they have already tested their systems on the TDT4 corpus.

    Last year, the folks at CIIR from UMass prepared a TDT 2003 Evaluation Information Primer. The site is a quick-start guide to the evaluation, is applicable to the 2004 evaluation, and is a good place to start before diving into the Evaluation Plan.

    EVALUATION PROJECT STATUS

    The 2004 TDT Evaluation workshop was held December 2-3, 2004 at NIST. The presentations and papers have been posted to document the systems and their performance on the TDT5 corpus.

    Data for the evaluation are provided by the Linguistic Data Consortium (LDC). The data can either be purchased by LDC members or obtained free of charge for the evaluation, with some restrictions. See the TDT resources web page for additional information.

    Joining the TDT mailing list will keep you abreast of recent information. See the contact information below for instructions.

    EVALUATION SOFTWARE AND RESOURCES

    The following resources are available for participants to build systems:
    • The TDT evaluation suite has been updated in light of the evaluation plan. The latest version, TDT3eval V2.6, has the following new features:
      • Hierarchical Topic Detection index files are generated.
      • Command line switches were added to restrict the sources to either newswire or broadcast news sources.
      • Link detection index files include topics which do not have on-topic stories in all three languages.
      • TDT3Trk and DetectionScore calculate TREC's supervised adaptive tracking utility measure.
    • Example index files built for the TDT4 corpus are available from NIST. Since they are LDC-licensed, they cannot be posted on the website. Please contact NIST for the download URL.
    • A new scoring tool has been written for Hierarchical Topic Detection called HTDEval Version 1.4. It is based on code from UMass to evaluate their HTD systems. This version has a modified travel cost normalization scheme and bug fixes.
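
    For readers unfamiliar with the utility measure mentioned in the feature list above, the sketch below shows the scaled linear utility as it is commonly stated for the TREC 2002 filtering track (often written T11SU). That TDT3Trk and DetectionScore compute exactly this form, with these constants, is an assumption here and not something this page confirms; the function and parameter names are hypothetical. The Evaluation Plan is the authoritative definition.

```python
def scaled_utility(relevant_retrieved, nonrelevant_retrieved, total_relevant,
                   min_nu=-0.5):
    """Scaled linear utility in the style of the TREC 2002 filtering track.

    Assumed form (not taken from the TDT tools themselves):
        U  = 2 * R+ - N+              raw utility
        NU = U / (2 * total_relevant) normalized by the best possible score
        SU = (max(NU, min_nu) - min_nu) / (1 - min_nu)
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    raw = 2 * relevant_retrieved - nonrelevant_retrieved
    normalized = raw / (2 * total_relevant)
    # Clamp from below so one very bad topic cannot dominate an average.
    return (max(normalized, min_nu) - min_nu) / (1 - min_nu)


# Usage: a system that delivers nothing scores 1/3 under these constants,
# a perfect system scores 1, and a heavily over-delivering system scores 0.
silent = scaled_utility(0, 0, 10)
perfect = scaled_utility(10, 0, 10)
noisy = scaled_utility(0, 100, 10)
```

    Under these assumed constants, delivering no stories at all yields a nonzero baseline, so a tracker is only rewarded when its on-topic deliveries outweigh its false alarms.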

    DRY RUN EVALUATIONS FOR NEW PARTICIPANTS

    All new participants must complete a dry run evaluation in order to participate in the Fall evaluation and to be eligible to receive the TDT4 topic annotations. The dry run web page gives specific instructions for completing a dry run evaluation.

    CONTACT INFORMATION

    If you are interested in participating in TDT, would like to be added to the TDT email list, or have questions about the evaluation protocols and software, contact speech_webmaster[at]nist.gov.

    Questions regarding the TDT corpora and obtaining access to them should be directed to ldc@ldc.upenn.edu.


    The TDT2004 Evaluation is Sponsored by


    Defense Advanced Research Projects Agency (DARPA)
    Information Processing Technology Office


    Page Created: August 21, 2007
    Last Updated: November 4, 2008

    Multimodal Information Group is part of IAD and ITL
    NIST is an agency of the U.S. Department of Commerce