RT Spring 2005 Evaluation
Information Technology Lab, Information Access Division NIST: National Institute of Standards and Technology


Rich Transcription Spring 2005 Evaluation

    Evaluation Overview

    The Rich Transcription 2005 Spring Meeting Recognition evaluation will be the third NIST-sponsored evaluation of speech technologies within the meeting domain. This year's evaluation will build on the success of the RT-04 Spring Evaluation by continuing the Speech-To-Text (STT) and Speaker Diarization (SPKR) evaluation tasks and adding two new evaluation tasks: Speech Activity Detection (SAD) and Source Localization (SLOC). Briefly, systems designed for each task must perform the following functions:

    • Speech-to-Text Transcription – convert spoken words into text,
    • Speaker Diarization – find the segments of time within a meeting in which each meeting participant is talking,
    • Speech Activity Detection – detect when someone in a meeting space is talking,
    • Source Localization – determine the three-dimensional position of a person who is talking in a meeting space.
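
    The relationship between two of the tasks above can be sketched in a few lines of Python. Speaker Diarization yields per-participant time segments, while Speech Activity Detection only asks when anyone is talking, so merging overlapping speaker segments gives the speaker-independent speech regions. The `SpeakerSegment` type, the `speech_activity` function, and the sample times are hypothetical and purely illustrative; they are not the system output formats defined in the evaluation plan.

    ```python
    from typing import List, NamedTuple, Tuple


    class SpeakerSegment(NamedTuple):
        """Hypothetical diarization record: who spoke, and when (in seconds)."""
        speaker: str
        start: float
        end: float


    def speech_activity(segments: List[SpeakerSegment]) -> List[Tuple[float, float]]:
        """Merge per-speaker segments into speaker-independent speech regions."""
        merged: List[Tuple[float, float]] = []
        for seg in sorted(segments, key=lambda s: s.start):
            if merged and seg.start <= merged[-1][1]:
                # Overlaps or touches the previous region: extend that region.
                prev_start, prev_end = merged[-1]
                merged[-1] = (prev_start, max(prev_end, seg.end))
            else:
                merged.append((seg.start, seg.end))
        return merged


    # Made-up example: two participants, with overlapping speech from 3.5 s to 4.2 s.
    diarization = [
        SpeakerSegment("spk1", 0.0, 4.2),
        SpeakerSegment("spk2", 3.5, 7.0),
        SpeakerSegment("spk1", 9.0, 12.5),
    ]
    print(speech_activity(diarization))  # [(0.0, 7.0), (9.0, 12.5)]
    ```

    The sketch goes only one way: speech regions can be derived from diarization segments, but not the reverse, since a speech region carries no speaker identity.
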
    In 2004 the evaluation was conducted solely on a small meeting conference room sub domain. This year, a lecture room sub domain with seminar speech will be added to the evaluation. The two sub domains differ enough in participant interactivity and sensor configuration to warrant separate treatment by system developers and different sets of evaluation tasks. The following table summarizes the evaluation tasks supported in each sub domain.

    Evaluation Task              Meeting Sub Domains
                                 Conference Room      Lecture Room
                                 (2 hr. test set)     (2 hr. test set)
    Speech-To-Text               Evaluated            Evaluated
    Speaker Diarization          Evaluated            Evaluated
    Speech Activity Detection    Evaluated            Evaluated
    Source Localization          N/A                  Evaluated

    Multimodal sensor signals will be available to systems this year.  Along with the multiple distant microphones and individual head microphones used previously, digital video and digital microphone array data (lecture room data only) will be supplied as well.

    The evaluation is being organized by NIST in collaboration with the Augmented Multi-party Interaction (AMI) and Computers In the Human Interaction Loop (CHIL) programs. The results of this evaluation will be presented in a special one-day session at the Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI) on July 13, 2005 in Edinburgh, UK.

    Evaluation Plans and Documentation

    The evaluation plan is now available from the links below, along with the CHIL Speaker Localization and Tracking Evaluation Criteria, which defines the speaker localization task. Together, the documents define the four tasks, system inputs and outputs, and the scoring tools and metrics. A synopsis written earlier to document the essential components of the evaluation is superseded by the evaluation plan.

    1. Rich Transcription 2005 Spring Meeting Recognition evaluation plan V1
    2. CHIL Speaker Localization and Tracking Evaluation Criteria

    Participation Agreement and Licensing

    Participants in the RT-05S evaluation must complete and return to NIST the RT-05S Evaluation Participation Agreement as well as the AMI, CHIL and LDC data licensing agreements. There are two documents that must be downloaded, printed, signed and faxed back to NIST. The documents are:

    The evaluation participation agreement contains the fax number and address for mailing the forms if a fax is unavailable. No training data or evaluation data will be distributed until NIST receives the above forms.

    Speaker Localization Evaluation Tools

    ITC-IRST has developed the scoring tool for the SLOC task.  The source code is a separate C  program that scores system output according to the CHIL Speaker Localization and Tracking Evaluation Criteria document. 

    Example data has also been provided for understanding the system output format and metrics.

    Tentative Evaluation Data

    The evaluation corpus will come from meetings recorded at ICSI, CMU/ISL, and NIST and by the AMI and CHIL programs. The evaluation data table gives an overview of the meeting sensor setups and other relevant information.

    MLMI 2005 Meeting Recognition Workshop

    Evaluation participants will have an automatic slot at the MLMI Meeting Recognition Workshop in Edinburgh, UK on July 13, 2005 and will be expected to contribute a paper and presentation for the workshop.

    The meeting will run from 9:00 am, when registration begins, until 5:30 pm. The agenda is full of talks, with ample time for discussion. We hope the meeting will be productive and informative.

    Schedule (as of Jan. 21, 2005)

    Milestone                                                           Date
    Signed Commitment to participate faxed to NIST                      28-Apr-2005
    Sites receive evaluation data. Evaluation begins                    12-May-2005
    Sites submit system outputs to NIST                                 26-May-2005 5:00 pm EDT
    NIST reports results for non-overlapping STT, SPKR, SAD and SLOC    2-Jun-2005
    NIST reports results for overlapping STT                            9-Jun-2005
    Evaluation system description papers and presentations due          27-Jun-2005

     

    Last Updated: November 4, 2008

    Multimodal Information Group is part of IAD and ITL
    NIST is an agency of the U.S. Department of Commerce