2004 Topic Detection and Tracking (TDT-2004) Evaluation

TDT 2004 is the seventh in a series of open evaluations that have investigated several aspects of developing algorithms for the automatic organization of news stories by the real-world events that they describe. Researchers who are interested in the following topics are encouraged to find out more about TDT 2004: high-accuracy retrieval of documents, text filtering, cross-language issues, machine translation, speech recognition, text segmentation, compensating for degraded-quality text, novelty detection, or other similar topics.

The exciting news for the 2004 evaluation is that the LDC will be preparing a fresh evaluation corpus (see below for corpus information). Also, we are experimenting with a new task: Hierarchical Topic Detection.

TDT evaluation tasks for 2004 are currently expected to be:

Hierarchical Topic Detection (HTD) is a new research task. The task is an extension of the Topic Detection task. HTD structures the stories into a directed acyclic graph (DAG) rather than non-overlapping clusters of stories. The advantage is that stories can belong to multiple clusters and clusters can be composed of sub-clusters. The exact details are being worked out. Join the TDT mailing list to help mold the task!

The LDC is currently preparing the TDT5 corpus for the fall evaluation. The corpus will be structured identically to the other released TDT corpora and will include Arabic, English and Mandarin texts. However, TDT5 will not include broadcast news transcripts.

The TDT4 corpus and the TDT4 topic annotations will be made available for system development. New TDT participants must complete a "dry run" evaluation using the TDT4 corpus prior to receiving the TDT4 topic annotations. A "dry run" evaluation is completed by building an initial system and running a blind evaluation using last year's evaluation index files. 2003 participants will receive the TDT4 topic annotations without completing a dry run because they have already tested their systems on the TDT4 corpus.

Last year, the folks at CIIR from UMass prepared a TDT 2003 Evaluation Information Primer. The site is a quick start guide to the evaluation, is applicable to the 2004 evaluation, and is a good place to start before diving into the Evaluation Plan.

EVALUATION PROJECT STATUS

The 2004 TDT Evaluation workshop was held December 2-3, 2004 at NIST. The presentations and papers have been posted to document the systems and the systems' performance on the TDT5 corpus.

Data for the evaluation are provided by the Linguistic Data Consortium (LDC). They can be either purchased as a member of the LDC, or provided free for the evaluation with some restrictions. See the TDT resources web page for additional information.

Joining the TDT mailing list will keep you abreast of recent information. See the contact information below for instructions.

EVALUATION SOFTWARE AND RESOURCES

The following resources are available for participants to build systems:

DRY RUN EVALUATIONS FOR NEW PARTICIPANTS

All new participants must complete a dry run evaluation in order to participate in the Fall evaluation and to be eligible to receive the TDT4 topic annotations. The dry run web page gives specific instructions for completing a dry run evaluation.

CONTACT INFORMATION

If you are interested in participating in TDT, would like to be added to the TDT email list, or have questions about the evaluation protocols and software, contact speech_webmaster[at]nist.gov.

Questions regarding the TDT corpora and obtaining access to them should be directed to ldc@ldc.upenn.edu.

Information Processing Technology Office
Page Created: August 21, 2007

Multimodal Information Group is part of IAD and ITL. NIST is an agency of the U.S. Department of Commerce.