Metadata Annotation Experiment
An initial experiment was run in January 2002 to explore metadata annotation relevant to the Rich Transcription (RT) Evaluation and the DARPA EARS program. NIST used the results of this experiment to define the metadata annotation portion of the RT-02 evaluation, and the results will also inform planning for future RT evaluations. Since this is still an open area of research, we continue to welcome additional submissions.
An experimenter may propose one or more metadata annotation types by describing each type and hand-annotating it on the specified experimental data (see below). Proposed annotation types should satisfy the following criteria:
2.0 Experiment Results
Researchers were encouraged to work with the sample source data (audio files and associated transcriptions) to create a set of metadata annotation definitions and sample annotations that they believed were of interest for the RT metadata task. Several research sites submitted a variety of suggested types; the raw results of the experiment are given on the metadata annotation experiment results summary page. From these suggestions, NIST created a putative set of initial metadata annotation types (speaker change/ID, acronym, verbal edit interval, named entity/type, numeric expression/type, and temporal expression/type) that we believed could be implemented this year. (An earlier internal experiment had demonstrated that annotating sentences or punctuation in spontaneous interactive speech was very difficult and required further study, so we chose to defer exploration of those types.)

However, given the very tight schedule, we decided to focus only on the detection of speaker changes and clustering within each excerpt for this first evaluation. This would permit us to develop and implement an infrastructure for metadata annotation evaluation while continuing to study and discuss other metadata types of interest for implementation in future evaluations.
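To make the putative type inventory concrete, the sketch below shows what inline annotation of a transcript line might look like, and how the bare orthography can be recovered by stripping the tags. The tag names (`speaker`, `temporal`, `numeric`, `entity`) and attributes here are hypothetical illustrations, not the official RT-02 inventory or format.

```python
import re

# Hypothetical inline annotation of one transcript line; tag names and
# attributes are illustrative only, not the official RT-02 definitions.
annotated = (
    '<speaker id="spk1"> the meeting on '
    '<temporal type="date"> january ninth </temporal> drew '
    '<numeric type="count"> forty </numeric> attendees from '
    '<entity type="org"> n. i. s. t. </entity> </speaker>'
)

# Strip all tags to recover the bare orthography, normalizing whitespace.
plain = " ".join(re.sub(r"</?\w+[^>]*>", " ", annotated).split())
print(plain)
```

Because the annotation is purely inline, removing every tag must reproduce the provided orthography exactly; that property is what "annotate around the given orthography" requires.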
3.0 Experimental Data
The data for this experiment consists of three short excerpts of digitally recorded speech from each of three source types (news broadcasts, telephone conversations, and meetings), for a total of nine excerpts of about 20 minutes total duration. Since orthographic transcription is not the focus of this exercise, the orthography has been provided; please annotate around the given orthography, even if you disagree with it. Since speaker segmentation is an obvious metadata type of interest and is relatively uncontroversial to annotate, we have pre-annotated the experimental data with this information in the prescribed format to provide a concrete example of the type submissions we expect. The audio data and transcripts are available on the metadata annotation experiment data samples page.
4.0 Submission Formats
You must submit both a data type definition for each proposed metadata type and a full annotation of all the provided experimental data with that type.
4.1 Data Type Definition
Complete the following template for each proposed metadata type:
4.2 Experimental Data Format
Note that although time will likely be the primary unit of metadata annotation for RT-02, for this experiment please provide inline tagging of your proposed metadata type in the orthography we have provided. Ignoring time should save a considerable amount of effort for both NIST and you.
When doing your annotation, start with the speaker-tagged data provided in the examples directory and add your tags inline with the text. If you propose multiple annotation types, you may either submit one version of each experimental data file per annotation type, or put multiple annotation types in each experimental data file.
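Whichever organization you choose, each annotated version of a file must wrap the same underlying orthography. A quick sanity check a submitter might run is to strip the tags from two single-type versions of the same excerpt and confirm the remaining text matches; the tag names below are hypothetical.

```python
import re

def strip_tags(text):
    """Remove inline annotation tags and normalize whitespace."""
    return " ".join(re.sub(r"</?\w+[^>]*>", " ", text).split())

# Two single-type annotated versions of the same excerpt
# (tag names are hypothetical, not prescribed by this experiment).
ne_version  = "we saw <ne type='person'> john smith </ne> at noon"
tmp_version = "we saw john smith at <tmp type='time'> noon </tmp>"

# Both versions must reduce to the identical provided orthography.
assert strip_tags(ne_version) == strip_tags(tmp_version)
```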
The marked-up files need not adhere to the SGML requirement for tag nesting and may contain overlapping tags. Please name your files clearly, and in your submission email include a list of the files you are submitting and a description of what each contains.
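To illustrate what relaxing the SGML nesting requirement permits, the sketch below runs a simple stack-based nesting check against a line where one hypothetical span (`ne`) starts inside another (`vei`) but closes after it. Such overlapping spans fail strict SGML nesting yet are acceptable in this experiment.

```python
import re

def is_properly_nested(text):
    """Return True if inline tags open and close in strict LIFO (SGML) order."""
    stack = []
    for close, name in re.findall(r"<(/?)(\w+)[^>]*>", text):
        if not close:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack

# Overlapping spans (tag names hypothetical): <ne> opens inside <vei>
# but closes after it -- invalid SGML, but allowed for this experiment.
overlapping = "so we <vei> we met <ne> john </vei> smith </ne> yesterday"
print(is_properly_nested(overlapping))
```

A strict SGML or XML parser would reject `overlapping`, which is why submissions are not required to be parseable as SGML.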
Example: (Note that we have included times in this example; your annotations need not include times, only proper tag positioning relative to the orthography.)