RT07 12/5 telecon, 11 AM EST

LDC: Stephanie Strassel, Meghan Glenn
IBM: Makis, Etienne, Jing Huang
ICSI: Adam Janin, Wooters
UKA: Mathias, Cedric, John McDonough
LIMSI: Lori Lamel
AMI: Hain

-Review open action items
  -NIST starts a discussion on whether or not coffee breaks are in-domain and therefore eligible to be in the test set.
  -CHIL to release coffee break examples
    --> Cedric did make these available in the dev data. They are not available as a /separate/ dataset, but they are clearly marked and separated from the rest of the dev data in the release. Cedric points out that not many sites asked for this data. Adam @ ICSI did request it, but wasn't able to process it -- or even read the disk on any OS.
    **Cedric will give download access to interested parties.
    **Everyone will review coffee break examples and comment next time.
  -All review examples
    **Postponed until next telecon
  -NIST publishes the Forced Aligned reference files from RT-05 and RT-06
    No comment.
  -CHIL will report on the soon-to-be-released DEV data and it will be documented on a website.
    Done
  -Adam J. to release "quick-and-dirty" scripts for processing AMI transcripts
    **Will circulate
  -Adam J. to circulate suggested AMI Dev/Train divide
    **Will just pick one and circulate
  -Speaker diarization of lecture data: JF and SB report.
    This year's lecture data is much more interactive than last year's data: enough to meet the 10% threshold, so the lecture data will be used for the SPKR task.
    --> (Note-taker got cut off for the rest of this item.)
  -Pending email discussion of close-talking mic condition: "Can manual segmentation be a primary condition for the IHM tests?"
    Yes. Adam says manual segmentation is okay. ICSI will still do segmentation, but wants the manual reference to see how much is lost in automatic segmentation. Lots of cross-talk. LL raises concern about this point.
    Sites agree that cross-talk and overlapping speech are good -- this is one motivation for making manual segmentation a primary condition. JF suggests conditionally scoring everything and then sub-scoring on segments with or without cross-talk.
  -Pending email discussion to keep MDM condition the primary condition
    All agreed: should be primary. But JF will start email discussion to confirm with others.
-New Issues:
  -SASTT task definition
    In eval plan. Not many people have had a chance to review that document yet.
    **Let's discuss this at the next meeting, and over email.
  -Evaluation Schedule discussion - no complaints so it will stand
  -Evaluation data
    -CMU delivered data today
      No comment.
  -Training data:
    CHIL: Description: 40 segments of 5 min from all over the seminar scenario -- some coffee breaks, some beginning of seminar, some end of seminar, some "normal" data as we have had in the past. This data looks exactly like the test data, so if sites do not like the coffee breaks, they should speak up now, and then there will only be ~35 segments in the test set.
    NIST: will be releasing training data before the end of the year: the full training set is 18 hours; NIST will distribute as much as is complete by then (at least 10 hours). If sites want this data, send NIST an email.
    Adam points out that the AMI meetings are not on the list of training meetings.
    **ACTION: JF will add these to the list.
  -NIST has secured a limited amount of funding for the release of LDC corpora to RT participants. The list of corpora is available in the eval plan (and also in Stephanie's email of 12/5).
    Stephanie explains the data-access process: all who are interested in the LDC data must send an email to NIST, requesting it. NIST will send LDC the list; LDC will circulate user agreements to sites. This is a no-cost membership for the duration of the evaluation, and sites can request any corpora on the lists, which include FISHER, TDT4, and RT corpora from previous years.
    JF underscores that there are LIMITED funds for this purpose, so sites are asked to please only request what they NEED. Stephanie concludes that LDC will be able to accommodate any reasonable requests.
    **ACTION: Sites send NIST email if they want this data.
-Schedule next telecon
  12/21, 11 AM EST.

Call adjourned 11:49 AM