TDT3 Evaluation FAQ
Questions
- Where do I get the language information for a source file?
- What does the content language mean?
- Can I submit system outputs for both content languages (English and native)?
- What is a primary system?
- What are the required evaluation conditions?
- For the tracking evaluation: what is allowable side information concerning the topic training data?
Answers
- ANSWER TO: Where do I get the language information for a
source file?
There is an "Auxiliary Information" file, defined by the evaluation
specification and produced by the TDT3BuildIndex.pl script, that
contains three pieces of information for each source file: the source
language, the broadcast date/time, and the broadcast source. The
doc/example_indexes directory contains the file
doc/example_indexes/aux_info.ndx as an example.
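As a rough illustration of how the three per-file fields might be held in memory once read, here is a minimal Python sketch. It does not parse aux_info.ndx (the file layout is defined by the evaluation specification and is not reproduced here); the `AuxInfo` type, the `files_by_language` helper, and every file name and field value below are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuxInfo:
    """The three pieces of auxiliary information kept for one source file."""
    language: str          # source language
    broadcast_time: str    # broadcast date/time, kept as an opaque string here
    broadcast_source: str  # broadcast source


def files_by_language(aux_table):
    """Group source file names by their source language."""
    grouped = defaultdict(list)
    for source_file, info in aux_table.items():
        grouped[info.language].append(source_file)
    return {language: sorted(files) for language, files in grouped.items()}


# Hypothetical entries; real values come from the Auxiliary Information file.
aux_table = {
    "example_file_a": AuxInfo("ENGLISH", "example-time-1", "EXAMPLE_SOURCE_1"),
    "example_file_b": AuxInfo("MANDARIN", "example-time-2", "EXAMPLE_SOURCE_2"),
    "example_file_c": AuxInfo("ENGLISH", "example-time-3", "EXAMPLE_SOURCE_1"),
}

print(files_by_language(aux_table))
```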
- ANSWER TO: What does the content language mean?
The content language is the language in which the text is
rendered. For instance, Mandarin can be rendered in GB-encoded
characters (native content language), or it can be translated into
English (English content language).
This example illustrates the two possible content language
conditions: 'nat' for native text and 'eng' for English translations.
- ANSWER TO: Can I submit system outputs for both content languages
(English and native)?
YES, of course. The two content language conditions contrast
multilingual TDT using SYSTRAN's Mandarin-to-English translations
with site-developed techniques for multilingual detection.
There are restrictions, however, on what is considered your
primary submission (see the answer on primary systems below).
- ANSWER TO: What is a primary system?
If a site submits more than one run for a single task and
a single set of conditions (as defined in the evaluation plan),
then that site must identify one run from that set as its
"primary" run. This will presumably represent the site's "best"
system and will be used for cross-site comparisons. The selection
must be made before the run, of course. Note that content language
is not a defined evaluation condition; sites must therefore choose
either the native or the English content language.
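The primary-run rule can be sketched as a simple consistency check. This is only an illustration under stated assumptions: the `Run` record, the `check_primary_runs` helper, and the task and condition labels are hypothetical and are not part of the evaluation software. Content language is deliberately left out of the grouping key, since it is not a defined evaluation condition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    task: str              # e.g. "tracking" (hypothetical label)
    conditions: str        # label for one set of evaluation conditions
    content_language: str  # 'nat' or 'eng'; not part of the grouping key
    primary: bool


def check_primary_runs(runs):
    """Return (group, primary_run_names) for each group that breaks the rule.

    A group is every run sharing one task and one set of conditions. A group
    with more than one run must contain exactly one primary run.
    """
    groups = defaultdict(list)
    for run in runs:
        groups[(run.task, run.conditions)].append(run)

    problems = []
    for key, members in sorted(groups.items()):
        primaries = [r.name for r in members if r.primary]
        if len(members) > 1 and len(primaries) != 1:
            problems.append((key, primaries))
    return problems


runs = [
    Run("run_a", "tracking", "cond_1", "nat", True),
    Run("run_b", "tracking", "cond_1", "eng", False),
]
print(check_primary_runs(runs))
```

Because the two example runs differ only in content language, they fall into the same group, and exactly one of them carries the primary flag.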
- ANSWER TO: What are the required evaluation conditions?
The required conditions are defined for each task in the evaluation
specification. You can find the most recent version on the NIST TDT3 Webpage.
- ANSWER TO: For the tracking evaluation: what is allowable
side information concerning the topic training data?
The only "known" information about the topic training data
is which stories are positive examples of the topic. No information
is known about the "off-topic" stories, nor can any information
be inferred from the position of the on-topic stories. In fact,
there may be unused on-topic stories in the training data.
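One way to respect this restriction in code is to keep only two labels for training stories, "on_topic" and "unknown", and never to derive an off-topic label. The sketch below is a hypothetical illustration; the `training_labels` helper and the story identifiers are assumptions, not part of any TDT tool.

```python
def training_labels(story_ids, on_topic_ids):
    """Label training stories using only the allowable side information.

    Stories given as positive examples are "on_topic". Every other story is
    "unknown", not off-topic, because unused on-topic stories may be present.
    """
    on_topic = set(on_topic_ids)
    return {
        story_id: ("on_topic" if story_id in on_topic else "unknown")
        for story_id in story_ids
    }


# Hypothetical story identifiers.
labels = training_labels(["s1", "s2", "s3"], ["s2"])
print(labels)
```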
Page Created: August 22, 2007
Last Updated: November 4, 2008