Minutes from RT Teleconference Nov. 25, 2008
Roll call
Jon Fiscus - NIST
Travis Rose - NIST
Meghan Glenn - LDC
Andreas Stolcke - SRI/ICSI
Bin Ma
Mike Lincoln
Friedrich Faubel (friedrich.faubel@lsv.uni-saarland.de)
Dietrich Klakow (dietrich.klakow@lsv.uni-saarland.de)
Asmaa El Hannani
Gerald Friedland
John McDonough
How to handle IHM condition for AMI meetings with remote
participants. (See minutes from last telecon for details)
- Since the remote participant has a head mic and there is a
mic array in front of the speaker, there is functionally no difference
in setups, except that developers will need to know which mic array is
the remote one.
- JM: Question: How many elements are in the remote mic arrays?
- AJ: I think it’s 8.
- Action item: Can someone (Mike Lincoln) verify this?
Workshop venue
- NIST would like to find an alternative location
- GF: What are the logistical requirements?
- JF: The requirements are: The meeting will be 2-2.5 days for
40-50 people, audio/video support, suitable dining options.
- Action item for All: Does anyone have any good ideas for a new venue?
Data Resources: Evaluation Data Contributions
- JF: So far we have data from one AMIDA site and NIST. Can anyone
else collect data to increase the speaker population? Are there any
CHIL rooms still operating?
- JM: They found that the data collection setup for Lectures was
not suitable for Conference meetings, so we should avoid using those
rooms. Maybe M. Omologo has a room setup?
- AJ: The IDIAP room may still be operational. Maybe JF can contact
John Dines, Vinnie, Steve, or Jean Carlotta?
- JM: For the NIST Mark IIIs, was the problem with correlated
noise in the daughter boards solved?
- JF: Partially, but not completely. The full story is that when
we reproduced the CHIL mods to the design, most of the noise was
eliminated, but not all.
- JM: Then based on previous experience, we may have trouble
using the data.
- JF: Vince Stanford avoids the problem by using a different
source localization scheme. I will find a citation for how he does it.
- AJ: The correlated samples explain why high-pass filtering the
mic array data worked for them.
- JM: When can we get a sample of data?
- JF: Mid-January for a large set. We will get a micro corpus out
for experiments as soon as possible. NIST and LDC will collaborate.
- JF: In the spirit of increasing the speaker population for
speaker diarization, would it be feasible to use excerpts from the old
test data that were not transcribed to build a bigger test set? What
are the implications for systems?
- After a brief discussion: SPKR systems typically don’t train
models for speakers, but it still didn’t seem like a good idea.
What can be done to share processed data?
- JM: Saarland can provide the segmentations as well as the
beamformed waveforms. We will derive the segmentations from our
speaker tracking system so that we can also assign each segment to a
speaker.
- Question: Would that be software or the processed signals?
- JM: (Per the above correction, waveforms)
- Action item for JF: We’ll need to make provisional dates for this in
the schedule.
How many STT sites are there going to be?
- ICSI/AMI
- ICSI/SRI
- Saarland?
When will RT07 video data be available?
- JF: We’re working on it. We’ll have a better ETA next week.
Microphone Array Quality Assessment methods:
- Not covered. We ran out of time.
Next Call:
- Dec 4th: Is 16:00 GMT a good time?
New evaluation condition: Fixed latency proposal
- GF: The definition is good, but how will it be put into
practice? How will you allow for training/adaptation?
- JF: How you train and adapt is the essence of the challenge
- AJ: What's the right duration?
- After some discussion, the general consensus was that it depends
on the task and the application: for STT, most systems use breath
group segments, which are in the range of 10-20s, but SPKR systems need
to be quicker; 3s can be too long.
- ??(please tell me who brought this up): In their lab, they've
experimented and found that fixed thresholds are only one way to
specify the delay time. Specifying an average delay or an
exponential delay gives a system the ability to delay delivery on some
segments but then "catch up" on later segments.
- AS: Maybe instead we should add to the output format the
look ahead time used for an output and then make a metric that
thresholds the output based on latency?
- After a discussion, we decided to specify look ahead times that
are easily attained for this year and then experiment with alternative
strategies with the goal to develop new metrics and procedures for
future evals.
- Action item for JF: Rewrite the proposal with generous fixed thresholds
and with definitions for look ahead time entries in the output files.
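As a rough illustration of the delay specifications discussed above, here is a minimal Python sketch. It assumes each output entry carries a hypothetical `lookahead` field in seconds. The function names, field name, and sample values are illustrative assumptions only and are not part of the proposal or of any existing output format.

```python
from statistics import mean


def meets_fixed_threshold(latencies, limit):
    """True if every segment was delivered within `limit` seconds."""
    return all(lat <= limit for lat in latencies)


def meets_average_delay(latencies, limit):
    """True if the mean delivery delay is within `limit` seconds.

    A system may run late on some segments and "catch up" on others.
    """
    return mean(latencies) <= limit


def filter_by_latency(entries, limit):
    """Keep only output entries whose reported look ahead time is within `limit`."""
    return [e for e in entries if e["lookahead"] <= limit]


# Hypothetical per-segment delays in seconds (made-up values).
latencies = [1.0, 4.0, 1.0, 2.0]
fixed_ok = meets_fixed_threshold(latencies, 3.0)   # one segment exceeds 3s
average_ok = meets_average_delay(latencies, 3.0)   # mean delay is 2s

# Hypothetical output entries annotated with the look ahead time used.
entries = [
    {"token": "hello", "lookahead": 0.5},
    {"token": "world", "lookahead": 5.0},
]
kept = filter_by_latency(entries, 3.0)
```

With these made-up numbers, the same output fails a fixed 3s threshold but passes a 3s average-delay limit, which is the "catch up" behavior described above. Scoring only the entries kept by `filter_by_latency` is one possible reading of AS's suggestion.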
Adjourn