The RT-02 evaluation is the first in a series of new NIST-administered evaluations that combine both Speech-To-Text (STT) and metadata (MD) annotation. RT-02 will make use of existing scoring tools to perform the evaluation. As such, the two defined tasks, Speech-To-Text and Speaker Segmentation and Identification, will still be evaluated independently.
After RT-02, NIST will release new evaluation software that will have the flexibility to evaluate a wide variety of STT and metadata types that are combined into a single representation.
2.0 Speech-To-Text Scoring Instructions
Traditionally, NIST has referred to this as Automatic Speech Recognition (ASR), but we are adopting the term STT for the RT evaluation series. STT tasks will be evaluated similarly to past ASR tasks. The NIST scoring utilities Tranfilt and SCTK will be used to normalize the transcripts prior to scoring, and the transcripts will then be scored with the SCTK utility SCLITE.
Past scoring conventions regarding overlapping speech, hesitations and other speech phenomena will be followed this year. Consult the RT-02 evaluation plan for specific details.
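SCLITE scores a system by aligning each hypothesis transcript against the reference and computing the word error rate from the resulting substitutions, deletions, and insertions. As a rough illustration (not NIST's implementation, which also handles normalization, overlap, and optional-deletion conventions), the underlying dynamic-programming alignment can be sketched as:

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed by edit-distance alignment of the two word sequences."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = minimum edits aligning the first i reference words with the first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

For example, `wer("the cat sat", "the hat sat")` yields 1/3: one substitution against a three-word reference.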
2.1 Required Scoring Utilities
Two software packages, two supporting scripts, the UTF DTD, and a global mapping file are needed to evaluate the output of an STT system. The resources are available via ftp from the following locations:
2.2 System Output Formatting
The RT-02 STT system output format uses the same CTM format as used in previous Hub-4 and Hub-5 evaluations.
The CTM file format is a concatenation of time-mark records for each word in each channel of a waveform. The records are separated by newlines. Each word token must have a waveform id, a channel identifier (matching the channels of the reference file), a start time, a duration, and the word text. Optionally, a confidence score can be appended for each word.
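A minimal sketch of reading one such record, assuming the field order just described (waveform id, channel, start time, duration, word, optional confidence); the example record values are hypothetical:

```python
from typing import NamedTuple, Optional

class CtmWord(NamedTuple):
    waveform_id: str
    channel: str
    start: float       # start time in seconds
    duration: float    # word duration in seconds
    word: str
    confidence: Optional[float]  # present only if the system emits scores

def parse_ctm_record(line: str) -> CtmWord:
    """Split one whitespace-delimited CTM record into typed fields."""
    fields = line.split()
    conf = float(fields[5]) if len(fields) > 5 else None
    return CtmWord(fields[0], fields[1], float(fields[2]),
                   float(fields[3]), fields[4], conf)

# Hypothetical record: waveform 7654, channel A, "YES" at 11.34s for 0.28s
rec = parse_ctm_record("7654 A 11.34 0.28 YES 0.95")
```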
Consult the CTM documentation on the NIST website for complete details.
2.3 Example Invocation
A scoring example is available as a compressed tar archive on the NIST ftp server. The package contains a readme file showing how the utilities are used to score an STT system.
3.0 Metadata Annotation Scoring Instructions for Speaker Segmentation
The speaker segmentation and identification metadata annotation task will be evaluated using the same procedures as defined in the N-Speaker Segmentation condition for the NIST 2001 Speaker Recognition Evaluation. From Section 2.1.4 of the 2001 Speaker Recognition Evaluation Plan, the task is to identify the time intervals during which each of a set of unknown speakers is speaking in a recording. Because the speakers are unknown, there is no assumed pool of speakers to identify, and the task therefore does not require cross-recording speaker mapping.
The evaluation will use the same system evaluation Segmentation Error Rate as defined in Section 3.3 of the 2001 Speaker Recognition Evaluation Plan.
3.1 Required Scoring Utilities
The speaker segmentation scoring utilities are available from the URL ftp://jaguar.ncsl.nist.gov/pub/seg_scr.v21.tar.Z. The package contains example input files and instructions for how to run the program.
3.2 System Output Formatting
The segmentation scoring software takes as input an index file of segmentation files to score. The index file is an ASCII file containing a list of records. Each newline-separated record identifies corresponding hypothesis and reference segmentation files to score. The index file is formatted as follows:
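The exact column layout is given in the scoring package's documentation; assuming each record simply names the hypothesis file followed by its reference file, reading the index might be sketched as:

```python
def read_index(text: str):
    """Pair hypothesis and reference segmentation files from an index file.

    Assumes each newline-separated record holds a hypothesis path and a
    reference path, whitespace-delimited in that order (an assumption;
    consult the seg_scr documentation for the authoritative layout).
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        hyp, ref = line.split()
        pairs.append((hyp, ref))
    return pairs
```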
Both the system-generated segmentation files and reference segmentation files use the same format. The files are formatted as lists of segmentation records. Each record indicates the start and end times for a speaker, and the speaker id to which the interval is attributed. The file is formatted as follows:
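Reading such a file can be sketched as follows, assuming each record carries the start time, end time, and speaker id in that order (the exact field order is specified in the scoring package's documentation; the speaker labels below are hypothetical):

```python
def parse_segments(text: str):
    """Parse segmentation records: (start_seconds, end_seconds, speaker_id).

    Field order (start, end, speaker) is an assumption based on the
    description above; see the seg_scr readme for the definitive format.
    """
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, end, speaker = line.split()
        segments.append((float(start), float(end), speaker))
    return segments

# Hypothetical two-speaker segmentation
segs = parse_segments("0.00 5.25 spkr1\n5.25 9.10 spkr2")
```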
3.3 Example Invocation
The segmentation software distribution readme includes example system and reference segmentation files, along with examples of how to run the scoring program.
Page Created: September 17, 2007
Multimodal Information Group