  • Topic Detection and Tracking Evaluation

    Topic Detection and Tracking research was pursued under the DARPA Translingual Information Detection, Extraction, and Summarization (TIDES) program:

    Topic Detection and Tracking is an integral part of the DARPA Translingual Information Detection, Extraction, and Summarization (TIDES) program. The goal of the TIDES program is to enable English-speaking users to access, correlate, and interpret multilingual sources of real-time information and to share the essence of this information with collaborators.

    As a TIDES evaluation community, TDT provides both a forum for discussing applications and techniques for detecting and tracking events that occur in real time and the infrastructure to support common evaluations of component technologies. The TIDES project currently has one other evaluation community, the Text REtrieval Conference (TREC), and planning has begun for three new evaluations in the areas of Text Summarization, Question Answering and Quick Machine Translation.

    The NIST TDT project has ended and will not be restarted in the near future. The data remains available from the Linguistic Data Consortium and the evaluation resources on this site will remain available.

    TDT research develops algorithms for discovering and threading together topically related material in streams of data such as newswire and broadcast news in both English and Mandarin Chinese. The overview paper "Multilingual Topic Detection and Tracking: Successful Research Enabled by Corpora and Evaluation" (Wayne, LREC 2000) describes the TDT program, the TDT corpora (collections of broadcast news recordings and transcripts), and the TDT technology evaluation paradigm in more detail.

    TDT research started with a pilot study in 1997 and has continued with open evaluations in TDT 1998, TDT 1999, TDT 2000, TDT 2001, TDT 2002, TDT 2003, and TDT 2004. The TDT 1999 Workshop, TDT 2000 Workshop, TDT 2003 Workshop, and TDT 2004 Workshop web pages contain detailed information about the most recent evaluations plus copies of virtually all the presentations and papers from the workshops.

    The TDT research applications keep track of topics (events of interest) in a constantly expanding collection of multimedia stories.

    TDT applications either organize vast amounts of data or facilitate access to large-scale collections of non-text media such as broadcast news recordings. There are five research applications defined in the TDT Program.

    1. Story Segmentation - Detect changes between topically cohesive sections
    2. Topic Tracking - Keep track of stories similar to a set of example stories
    3. Topic Detection - Build clusters of stories that discuss the same topic
    4. First Story Detection - Detect whether a story is the first story of a new, unknown topic
    5. Link Detection - Detect whether or not two stories are topically linked
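    Four of the five tasks above can be illustrated with one shared building block: a similarity score between stories. The sketch below is a minimal illustration, assuming bag-of-words term-frequency vectors, cosine similarity, and a single hypothetical threshold of 0.2; the function names and the threshold are assumptions for illustration, not the methods used by TDT participants or NIST. Story Segmentation is omitted because it operates on unsegmented text rather than on whole stories.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term-frequency vector (lowercased, split on non-alphanumerics)."""
    return Counter(tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical decision threshold shared by every task below (an assumption).
THRESHOLD = 0.2


def is_linked(story_a: str, story_b: str) -> bool:
    """Link Detection: are two stories similar enough to call them topically linked?"""
    return cosine(vectorize(story_a), vectorize(story_b)) >= THRESHOLD


def track(examples: list[str], stream: list[str]) -> list[bool]:
    """Topic Tracking: flag stream stories similar to the summed example stories."""
    profile = Counter()
    for example in examples:
        profile.update(vectorize(example))
    return [cosine(profile, vectorize(story)) >= THRESHOLD for story in stream]


def detect_topics(stream: list[str]) -> list[int]:
    """Topic Detection: single-pass clustering; returns a cluster id per story."""
    clusters: list[Counter] = []
    labels: list[int] = []
    for story in stream:
        vec = vectorize(story)
        scores = [cosine(vec, centroid) for centroid in clusters]
        if scores and max(scores) >= THRESHOLD:
            best = scores.index(max(scores))
            clusters[best].update(vec)  # fold the story into the closest cluster
        else:
            best = len(clusters)
            clusters.append(Counter(vec))  # start a new cluster
        labels.append(best)
    return labels


def first_stories(stream: list[str]) -> list[bool]:
    """First Story Detection: a story is new if no earlier story is similar to it."""
    seen: list[Counter] = []
    flags: list[bool] = []
    for story in stream:
        vec = vectorize(story)
        flags.append(all(cosine(vec, prev) < THRESHOLD for prev in seen))
        seen.append(vec)
    return flags
```

    On a toy stream of "Earthquake strikes northern Japan", "Rescue teams search rubble after Japan earthquake" and "Central bank raises interest rates", first_stories returns [True, False, True] and detect_topics returns [0, 0, 1]. Note that First Story Detection here is simply the "start a new cluster" decision of single-pass Topic Detection, made without remembering the clusters. A single threshold trades misses against false alarms: raising it makes every task more conservative.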

    Shared resources, such as TDT corpora, language resources and evaluation software, provide the necessary tools to build a TDT application. Arguably, the TDT corpora are the most valuable resource made available to the community. The TDT corpora consist of broadcast news and newswire texts sampled daily during most of 1998. The LDC exhaustively annotated the corpora by identifying which stories discuss each topic in a predefined set of topics.
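    Because the annotation marks every on-topic story, a system's yes/no decisions for a topic can be scored by counting misses and false alarms. A minimal sketch, assuming the detection-cost form C_det = C_miss * P_miss * P_target + C_FA * P_FA * (1 - P_target) with C_miss = 1, C_FA = 0.1 and P_target = 0.02, normalized by the cost of the better trivial system. Treat these parameter values and the function names as assumptions; this is not a reproduction of NIST's evaluation software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    c_miss: float = 1.0     # assumed cost of missing an on-topic story
    c_fa: float = 0.1       # assumed cost of a false alarm
    p_target: float = 0.02  # assumed prior probability that a story is on topic


def error_rates(stories: list[str], on_topic: set[str], system_yes: set[str]) -> tuple[float, float]:
    """Return (P_miss, P_fa) for one topic, given the annotated on-topic story ids."""
    targets = [s for s in stories if s in on_topic]
    non_targets = [s for s in stories if s not in on_topic]
    misses = sum(1 for s in targets if s not in system_yes)
    false_alarms = sum(1 for s in non_targets if s in system_yes)
    p_miss = misses / len(targets) if targets else 0.0
    p_fa = false_alarms / len(non_targets) if non_targets else 0.0
    return p_miss, p_fa


def detection_cost(p_miss: float, p_fa: float, params: CostParams = CostParams()) -> float:
    """C_det = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target)."""
    return (params.c_miss * p_miss * params.p_target
            + params.c_fa * p_fa * (1 - params.p_target))


def normalized_cost(p_miss: float, p_fa: float, params: CostParams = CostParams()) -> float:
    """Divide by the cost of the better trivial system (always 'yes' or always 'no')."""
    trivial = min(params.c_miss * params.p_target, params.c_fa * (1 - params.p_target))
    return detection_cost(p_miss, p_fa, params) / trivial
```

    With these assumed values, a system that answers "no" to every story has P_miss = 1 and P_fa = 0, so its normalized cost is exactly 1.0; a useful system should score below that.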

