Minutes from 8/30 telecon:

Attendees:
   NIST: Jon F., John G.
   ICSI: Adam J., Chuck W.
   LDC: Stephanie S., Meghan G.
   KU: John M., Christian F. (sorry, I missed the rest)
   MIT: Doug R.
   Sheffield: Thomas H.

STT Discussion:
---------------

Jon F. - The proposed task is similar to RT-05S, except that the primary
metric will be for regions of speech with up to 5 overlapping speakers
(the traditional non-overlapping speech STT WER will also be calculated),
and the MDM condition will be renamed to Central Distant Microphones
(CDM).

*** Note: the proposal listed the All Distant Microphone condition as the
primary audio condition rather than the CDM condition, but the issue was
not discussed during the telecon.

The discussion quickly moved to data; however, there appear to be enough
development resources and sufficient site participation to include the
task in the eval. Both ICSI and the CHIL sites (LIMSI, IBM, and KU) have
agreed to participate. (The CHIL site participation is pending the
manager's decision, but is very likely.)

ICSI raised the issue of not adding new training data. A significant
amount of their time was spent training on new data and they would like
to avoid this. The issue was not resolved.

Data Discussion:
---------------

Important decisions need to be made concerning the data for the
evaluation. In order to further the discussion, a conference call was
scheduled for Sept. 13, 2005 from 11:00-12:30 EDT. For that call,
everyone who might donate meetings to the data pool must send an email
to the mailing list giving details about the meetings they can donate.

The following points were brought up or discussed.

- What meeting sub domains will be included in the eval?
  John M. said the CHIL consortium is collecting data in their new
  rooms. They are collecting lectures and interactive lectures. The
  latter will likely have a dev-test produced by ELDA. NIST will
  contribute conference room meetings.

- How long will the test set be?
  The size and relative proportions of the sub domains were discussed.
  Jon F. noted there are many competing needs, so the corpus might need
  to be larger than last year.

- How long will the test excerpts be?
  There was agreement that excerpting was the only way to keep the test
  set size manageable while maximizing meeting variability. The size of
  the excerpts needs to be decided.

- Who will transcribe/annotate the data?

- What are the minimal sensor requirements?
  - Minimal sensor configurations?
  Stephanie S. asked what the minimal sensor requirements are. Jon F.
  said there is a definite minimum of head-mic'd participants and
  several centrally located distant microphones. John G. said he would
  like to see all meetings with video, but that as long as there are
  enough meetings with video to build a test subset for alternative
  tasks, meetings without video would be of use.

- What will be the future/concurrent test set uses?
  John G. said the VASE community, and John M. said the CHIL community,
  would like to leverage the RT test set for other evaluations.

  Jon F. post meeting comment: By silent acceptance, we should probably
  try to accommodate these requests. Are there any other communities
  that would like to leverage the test set? If so, let the list know
  prior to the conference call.

- Data deadlines
  John M. asked what the deadlines were for data resources. Jon F.
  replied:

  Nov 31 - Development test (or additional training) data released to
           sites.
           In order for the data to be released, it must be in a format
           that is immediately usable by researchers; this includes:
             - data structured according to the RT spec (see the eval
               plan)
             - time synchronized sensors
             - transcripts for every person talking in the meeting
             - reference STM and RTTM files for use in scoring
             - dev test RTTMs must be derived from forced alignments if
               the SPKR people accept the change to forced alignments
               (see below)

  Dec 31 - Test data available for annotation.
           To be included in the test set, the data must be immediately
           usable by the transcribers/annotators; this includes:
             - data structured according to the RT spec
             - time synchronized sensors

SPKR Discussions:
-----------------

Jon F. gave an overview of the proposed task changes from last year.
They are:
   (1) to use forced word alignments to derive the reference RTTMs,
   (2) to change the noscore collar to 0.0, and
   (3) to use DER of "all speech" (including overlap) as the primary
       metric (the non-overlapping DER will still be calculated).

Note: the SPKR proposal listed the All Distant Microphone condition as
the primary audio condition rather than the CDM condition, but the issue
was not discussed during the telecon.

Concerning forced alignment:
   Adam J. and Chuck W. asked what data would be force aligned. Jon F.
   said NIST would force align the RT-05S test set, the RT-06S test set,
   and any new dev test data for the Nov 31 delivery. NIST will use the
   LIMSI forced alignment program for BNews data.

Concerning the change to the noscore collar:
   In previous emails, the ICSI folks asked what the right collar was.
   The mailing list agreed that 0.0 probably isn't right. Jon F. said a
   study should be done to answer the question. The study should measure
   the error rate of the forced alignments and set the collar
   appropriately.

Are there sufficient resources for the task?
   It wasn't discussed, but the answer is yes.

Is there sufficient participation interest in the task?
   The answer is yes. During the meeting, ICSI and MIT declared
   interest. After the meeting, David A. van Leeuwen (TNO) said he was
   interested.

Meeting Understanding Evaluation Proposal:
-----------------------------------------

The proposal was not discussed.

Action Items:
-------------

1. Sites with the potential to donate data to the evaluation pool should
   provide the community with a list of resources they could donate.

2. Jon F. will ping previous and potential STT and SPKR sites to see if
   they intend to participate in the evaluation.

3. Programs that want to use the test set for tasks not covered by the
   RT tasks must tell the community their needs.