Topic Detection and Tracking Evaluation
Topic Detection and Tracking research was pursued under the DARPA Translingual Information Detection, Extraction, and Summarization (TIDES) program:
Topic Detection and Tracking is an integral part of the DARPA Translingual Information Detection, Extraction, and Summarization (TIDES) program. The goal of the TIDES program is to enable English-speaking users to access, correlate, and interpret multilingual sources of real-time information and to share the essence of this information with collaborators.
As a TIDES evaluation community, TDT provides a forum to discuss applications and techniques for detecting and tracking events that occur in real time, and the infrastructure to support common evaluations of component technologies. The TIDES project currently has one other evaluation community, the Text REtrieval Conference (TREC), and planning has begun for three new evaluations in the areas of Text Summarization, Question Answering, and Quick Machine Translation.
The NIST TDT
project has ended and will not be restarted in the near future.
The data remains available from the Linguistic Data Consortium and the
evaluation resources on this site will remain available.
TDT research develops algorithms for discovering and threading
together topically related material in streams of data such as newswire
and broadcast news in both English and Mandarin Chinese. The overview
paper "Multilingual Topic
Detection and Tracking: Successful Research Enabled by Corpora and
Evaluation," (Wayne LREC2000) describes in more detail the TDT
program, the TDT corpora (collections of broadcast news recordings and
transcripts), and the TDT technology evaluation paradigm.
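To make the idea of "threading together topically related material" concrete, here is a minimal sketch of single-pass clustering over a story stream. It is an illustration only, not the method of any TDT participant: the function names, the bag-of-words representation, and the 0.3 similarity threshold are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split story."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def thread_stories(stories, threshold=0.3):
    """Assign each story, in arrival order, to a topic cluster id.

    A story joins the most similar existing cluster if the similarity
    reaches the (hypothetical) threshold; otherwise it starts a new one.
    """
    centroids = []  # one accumulated term-count vector per topic
    labels = []
    for text in stories:
        vec = vectorize(text)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)  # fold the story into the topic
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels
```

Each story is processed once, in order, which mirrors the streaming setting described above. A real system would need tokenization, term weighting, and cross-language handling that this sketch omits.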
TDT research started with a pilot study in 1997 and has continued with open evaluations in TDT 1998, TDT 1999, TDT 2000, TDT 2001, TDT 2002, TDT 2003, and TDT 2004. The TDT 1999 Workshop, TDT 2000 Workshop, TDT 2003 Workshop, and TDT 2004 Workshop web pages contain detailed information about the most recent evaluations, plus copies of virtually all the presentations and papers from the workshops.
The TDT research applications keep track of topics (events of interest) in a constantly expanding collection of multimedia stories.
TDT applications either organize vast amounts of data or facilitate large-scale collections of non-text media. There are five research applications defined in the TDT Program.
Shared resources, such as the TDT corpora, language resources, and evaluation software, provide the necessary tools to build a TDT application. Arguably, the most valuable of these resources are the TDT corpora, which consist of broadcast news and newswire texts sampled daily during most of 1998. The LDC exhaustively annotated the corpora by identifying which stories discuss each of a predefined set of topics.
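Exhaustive topic annotations make it possible to score a system's per-story decisions against a reference. The sketch below shows one common way to summarize such a comparison, as miss and false-alarm rates for a single topic. It is not the official evaluation software; the function name and the input format (a dictionary of yes/no decisions and a set of annotated on-topic story ids) are assumptions made for illustration.

```python
def tracking_error_rates(decisions, on_topic):
    """Compare system decisions with reference annotations for one topic.

    decisions: dict mapping story id -> True if the system judged the
               story on-topic (hypothetical input format).
    on_topic:  set of story ids annotated as discussing the topic.
    Returns (miss rate, false-alarm rate).
    """
    misses = false_alarms = targets = non_targets = 0
    for story_id, said_yes in decisions.items():
        if story_id in on_topic:
            targets += 1
            if not said_yes:
                misses += 1  # on-topic story the system rejected
        else:
            non_targets += 1
            if said_yes:
                false_alarms += 1  # off-topic story the system accepted
    p_miss = misses / targets if targets else 0.0
    p_fa = false_alarms / non_targets if non_targets else 0.0
    return p_miss, p_fa


decisions = {"s1": True, "s2": False, "s3": True, "s4": False}
rates = tracking_error_rates(decisions, on_topic={"s1", "s2"})
```

Because every story is annotated for every predefined topic, both rates can be computed without sampling or pooling judgments.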
Contact speech_webmaster[at]nist.gov at NIST for information about joining the TDT evaluation community.
Page Created: August 21, 2007