NIST Speech Group Website
Information Technology Lab, Information Access Division NIST: National Institute of Standards and Technology


  • TDT3 Evaluation FAQ

    Questions
    1. Where do I get the language information for a source file?
    2. What does the content language mean?
    3. Can I submit system outputs for both content languages (English and native)?
    4. What is a primary system?
    5. What are the required evaluation conditions?
    6. For the tracking evaluation: what is allowable side information concerning the topic training data?
    Answers
    1. ANSWER TO: Where do I get the language information for a source file? An "Auxiliary Information" file, which is defined by the evaluation specification and produced by the TDT3BuildIndex.pl script, contains three pieces of information for each source file: the source language, the broadcast date/time, and the broadcast source. The doc/example_indexes directory contains the file doc/example_indexes/aux_info.ndx as an example.

    2. ANSWER TO: What does the content language mean? The content language is the language in which the text is rendered. For instance, Mandarin can be rendered in GB-encoded characters (native content language), or it can be translated into English (English content language).

      The two possible content language conditions are 'nat' for native text and 'eng' for English translations.

    3. ANSWER TO: Can I submit system outputs for both content languages (English and native)? Yes, of course. The two content language conditions contrast multilingual TDT using SYSTRAN's Mandarin-to-English translations with site-developed techniques for multilingual detection.

      There are restrictions, however, on what is considered your primary submission (see answer 4).

    4. ANSWER TO: What is a primary system? If a site submits more than one run for a single task and a single set of conditions (as defined in the evaluation plan), then that site must identify one run from that set as its "primary" run. This run will presumably represent the site's "best" system and will be used for cross-site comparisons. The selection must be made prior to the run, of course. Note that content language is not a defined evaluation condition; sites must therefore choose either native or English content language for the primary run.

    5. ANSWER TO: What are the required evaluation conditions? The required conditions are defined for each task in the evaluation specification. You can find the most recent version on the NIST TDT3 webpage.

    6. ANSWER TO: For the tracking evaluation: what is allowable side information concerning the topic training data? The only "known" information about the topic training data is which stories are positive examples of the topic. No information is known about the "off-topic" stories, nor can any information be inferred from the position of the on-topic stories. In fact, there may be unused on-topic stories in the training data.
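
To make answer 6 concrete, here is a minimal sketch of how topic training data might be handed to a tracking system so that only the permitted side information is visible. It is an illustration, not part of the TDT3 tools: the names `Story`, `TrainingView`, and `build_training_view`, and the `n_positives` parameter, are all hypothetical. The sketch assumes that an evaluator holds the full on-topic labels and releases only the first few on-topic stories as positive examples.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Story:
    story_id: str  # hypothetical identifier, not a TDT3 field name
    text: str


@dataclass(frozen=True)
class TrainingView:
    """What a tracking system is allowed to see for one topic."""
    positives: Tuple[Story, ...]  # stories known to be on-topic
    unlabeled: Tuple[Story, ...]  # status unknown; must NOT be treated as off-topic


def build_training_view(
    stories: List[Story], on_topic_ids: Set[str], n_positives: int
) -> TrainingView:
    """Expose at most n_positives on-topic stories; hide all other labels."""
    positives: List[Story] = []
    unlabeled: List[Story] = []
    for story in stories:
        if story.story_id in on_topic_ids and len(positives) < n_positives:
            positives.append(story)
        else:
            # Includes genuinely off-topic stories AND unused on-topic ones.
            unlabeled.append(story)
    return TrainingView(tuple(positives), tuple(unlabeled))
```

In this sketch the `unlabeled` pool mixes off-topic stories with unused on-topic ones, which is why a system cannot safely use it as a source of negative examples.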
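
The primary-run rule from answer 4 lends itself to a simple pre-submission check. The sketch below is illustrative only: the `Run` record, its field names, and `check_primary_runs` are hypothetical and are not part of the NIST evaluation software. It assumes each run is tagged with its task, a label for its set of conditions as defined in the evaluation plan, and a flag marking it as primary.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Run:
    name: str
    task: str         # e.g. "tracking" or "detection"
    conditions: str   # hypothetical label for one set of evaluation conditions
    is_primary: bool


def check_primary_runs(runs: List[Run]) -> List[str]:
    """Return one message per (task, conditions) group that breaks the rule."""
    groups: Dict[Tuple[str, str], List[Run]] = defaultdict(list)
    for run in runs:
        groups[(run.task, run.conditions)].append(run)

    problems: List[str] = []
    for (task, conditions), group in sorted(groups.items()):
        primaries = [r for r in group if r.is_primary]
        # A lone run needs no flag; several runs need exactly one primary.
        if len(group) > 1 and len(primaries) != 1:
            problems.append(
                f"{task}/{conditions}: {len(group)} runs, "
                f"{len(primaries)} marked primary"
            )
    return problems
```

Content language is deliberately not part of the grouping key here, since answer 4 states that it is not a defined evaluation condition.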

     

     

    Page Created: August 22, 2007
    Last Updated: November 4, 2008
