1999 Topic Detection and Tracking (TDT-3) Evaluation

This page contains information and links to files for the 1999 TDT Phase 3 (TDT3) technology evaluation project.

1999 TDT Workshop

NIST hosted the 1999 TDT Workshop at the Tysons Corner Sheraton Premiere, February 28-March 1, 2000. The 2.5-day workshop consisted of technical presentations given by the participants and analysis of the benchmark test results. In addition to the presentations, each participant was required to write a technical paper describing their past year's work. Both the presentations and papers are available on this web site.

BACKGROUND

The 1999 Topic Detection and Tracking (TDT3) project was a continuation and extension of the 1998 Topic Detection and Tracking (TDT-2) project, whose results were presented and discussed at the 1999 DARPA Broadcast News Workshop held in Herndon, Virginia. NIST prepared this presentation and this evaluation overview paper for the workshop.

The purpose of this year's project was to advance the state of the art in technologies required to segment, detect, and track topical information in a stream consisting of news stories. The research followed an evaluation-driven R&D paradigm, in which key technical challenges are defined and supported by formal evaluations. In TDT, the information flowing from each source, either text or speech, is modeled as a sequence of stories (and non-stories) that may provide information on one or more topics. The project addressed detection and tracking of topical information in both English and Mandarin Chinese. TDT3 explored five key technical challenges.
INSTRUCTIONS AND DOCUMENTATION

The TDT3 Evaluation Specification Version 2.7 (in PostScript) (in Microsoft Word) was the core document for the TDT3 Evaluation and contains detailed information regarding participation and implementation. The TDT3 Evaluation FAQ answered common questions for the evaluation.

TDT3 Dry Run(s) and Evaluations

NIST conducted two "dry run" evaluations during the summer of 1999 and the final evaluations in the fall of 1999.

In the past, dry runs have been an effective tool for debugging new evaluations: the specification of evaluation tasks, the development of new evaluation procedures, and site implementation of those procedures. Prospective new participants were encouraged to participate in the dry runs and to attend the TDT workshops. (Site participation was a prerequisite for workshop attendance.)

The first dry run evaluation focused on the new aspects of the TDT3 evaluation: cross-language (English and Mandarin) TDT and the new tasks of First Story Detection and Link Detection. The corpus used for the dry runs was the full 6-month TDT2 Corpus with the 20 Cross-Language Topics.

SCHEDULE

The following was the schedule of events that led to the Evaluation Workshop.
CORPORA AND LANGUAGE RESOURCES

Corpora and Mandarin language resources for system development and evaluation were provided by the Linguistic Data Consortium (LDC). The LDC had two TDT corpora available for system development: the TDT Pilot study corpus (TDT-Pilot) and the TDT Phase 2 (TDT-2) corpus. A third TDT corpus, TDT Phase 3, was released as evaluation data in the fall of 1999. The LDC has also organized and prepared Mandarin language resources.

Contact the LDC at ldc@ldc.upenn.edu to obtain these materials. If you already have the TDT2 corpus, you can verify that you have the latest TDT2 updates through the LDC's TDT2 Current Release webpage.

TDT Corpora Descriptions

Complete documentation on the TDT corpora supplied by the LDC may be accessed at http://www.ldc.upenn.edu/TDT.

Mandarin Language Resources

The LDC has prepared a web page containing pointers to LDC and WWW Mandarin language resources. The page will be updated as new resources become available.

SOFTWARE

The NIST 1999 TDT3 Scoring Software was used to score the Detection, Tracking, Segmentation, First Story Detection and Link Detection runs. (Note that it requires the prior installation of Perl 5.) After decompressing and tar-extracting the archive, see the file "readme.txt" for installation and usage details.

DATA LICENSING

The TDT-1 and TDT-2 corpora are licensed through the Linguistic Data Consortium (LDC) and are subject to usage restrictions. Contact the LDC for license agreement information.

CONTACT INFORMATION

If you are interested in participating in TDT, would like to be added to the TDT email list, or have questions about the evaluation protocols and software, contact speech_webmaster[at]nist.gov.

Questions regarding the TDT corpora and obtaining access to them should be directed to ldc@ldc.upenn.edu.
Page Created: August 21, 2007
Multimodal Information Group is part of IAD and ITL. NIST is an agency of the U.S. Department of Commerce.