RT-04S Development Data Documentation
Data set updates
We have been notified of a problem with one of the meetings in the dev/test data set. For the LDC_20011116-1400 meeting, one file was included twice under different names, so LDC_20011116-1400_d06_NONE.sph and LDC_20011116-1400_d07_NONE.sph were identical. In addition, one of the distant microphone audio files for that same meeting was not included in the release. We are therefore releasing these two files here. Let us know if you have a problem downloading them.
- LDC_20011116-1400_d07_NONE.sph (36.1MB)
- LDC_20011116-1400_d08_NONE.sph (35.1MB)
See LDC 20011116-1400 meeting mapping for updated mapping information.
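The kind of duplication described above can be checked locally by comparing file checksums. The following is a minimal sketch, not a NIST tool: it assumes the audio files of one meeting sit in a single directory, and the function names `file_digest` and `find_identical_files` are made up for this illustration. It only detects byte-identical files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(directory, pattern="*.sph"):
    """Return groups of file names in `directory` with identical contents."""
    by_digest = defaultdict(list)
    for path in sorted(Path(directory).glob(pattern)):
        by_digest[file_digest(path)].append(path.name)
    # Keep only digests shared by more than one file.
    return [names for names in by_digest.values() if len(names) > 1]
```

Running `find_identical_files` on the directory holding the LDC_20011116-1400 audio should return an empty list once the two replacement files are in place.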
Transcriptions
Transcription files for the RT-04S development data are now available for download:
RT-04S Dev/Test set transcripts.
Each file in this archive is associated with a particular meeting by means of a meeting ID. We include transcripts for the two main RT-04S tasks, Speech-to-Text Transcription (STT) and Diarization (SPKR), under several microphone conditions: Multiple Distant Microphones (MDM), Single Distant Microphone (SDM) and Individual Head Microphone (IHM). Note that individual head microphones were not available from all sites for RT-02 (which constitutes the dev/test set for RT-04S): the IHM condition is therefore replaced by the Individual Personal Microphone (IPM) condition for this dev/test set.
Global Map File
Prior to scoring, both the reference and system output token strings will be transformed using a global map file (GLM). The GLM is intended to ensure that reference and hypothesis tokens which do not differ semantically are scored as correct. This is accomplished by transforming the token strings in both the reference and system output via a set of mapping rules. The GLM applies a set of rules to the system output which expands contractions to all possible expanded forms.
We have updated the GLM for RT-04S. Please use the new one for both development test and evaluation scoring. Note, however, that it may be augmented with new cases occurring in the test data when NIST performs its official scoring of the evaluation.
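To make the idea of rule-based token mapping concrete, here is a small sketch. It does not read the actual GLM file, whose format is not described here. The rules in `EXAMPLE_RULES` and the function `apply_rules` are hypothetical and serve only to show how a contraction can be expanded into several acceptable forms before scoring.

```python
# Hypothetical rules for illustration only; they are NOT taken from the
# RT-04S GLM. Each key maps to every form that should be accepted.
EXAMPLE_RULES = {
    "it's": ["it is", "it has"],
    "can't": ["cannot"],
    "o.k.": ["okay"],
}


def apply_rules(tokens, rules):
    """Map each token to its list of acceptable forms.

    Tokens are lower-cased before lookup. A token without a rule is kept
    as a single-element list, so the result has one entry per input token.
    """
    mapped = []
    for token in tokens:
        key = token.lower()
        mapped.append(rules.get(key, [key]))
    return mapped
```

For example, `apply_rules("It's o.k.".split(), EXAMPLE_RULES)` yields one list of alternatives per token, which a scorer could then match against the reference.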
Meeting ID mappings
Below is a table giving the mappings between the original site meeting names, the name the meeting was given for the RT-02 evaluation and the new name used in the RT-04S dev/test data set. Since some of the meetings in the development test set were included in the training data releases, please make sure not to use these for training if you intend to use them for development/test purposes.
| Corpus | Original site meeting ID | RT-02 meeting ID | RT-04S dev/test meeting ID |
|---|---|---|---|
| CMU | m096 | c096 | CMU_20020319-1400 |
| CMU | m097 | c097 | CMU_20020320-1500 |
| ICSI | Bmr013 | b013 | ICSI_20010208-1430 |
| ICSI | Bmr018 | b018 | ICSI_20010322-1450 |
| LDC | 20011116_1400_GT6 | l004 | LDC_20011116-1400 |
| LDC | 20011116_1500_GM7 | l003 | LDC_20011116-1500 |
| NIST | 20020214 | n004 | NIST_20020214-1148 |
| NIST | 20020305 | n003 | NIST_20020305-1007 |
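The table above can be used to keep dev/test meetings out of a training list. The sketch below is one possible way to do this. It assumes training meetings are identified by exactly one of the three names in the table, which may not hold for every training release. The helper names and the `Bmr_other` example ID are hypothetical.

```python
# Rows copied from the table above:
# (corpus, original site meeting ID, RT-02 meeting ID, RT-04S dev/test ID)
MEETING_IDS = [
    ("CMU", "m096", "c096", "CMU_20020319-1400"),
    ("CMU", "m097", "c097", "CMU_20020320-1500"),
    ("ICSI", "Bmr013", "b013", "ICSI_20010208-1430"),
    ("ICSI", "Bmr018", "b018", "ICSI_20010322-1450"),
    ("LDC", "20011116_1400_GT6", "l004", "LDC_20011116-1400"),
    ("LDC", "20011116_1500_GM7", "l003", "LDC_20011116-1500"),
    ("NIST", "20020214", "n004", "NIST_20020214-1148"),
    ("NIST", "20020305", "n003", "NIST_20020305-1007"),
]


def build_heldout_index(rows):
    """Map every known name of a dev/test meeting to its RT-04S ID."""
    index = {}
    for _corpus, original, rt02, rt04s in rows:
        for name in (original, rt02, rt04s):
            index[name] = rt04s
    return index


def drop_devtest_meetings(training_ids, rows=MEETING_IDS):
    """Return the training IDs that do not name a dev/test meeting."""
    index = build_heldout_index(rows)
    return [m for m in training_ids if m not in index]
```

For instance, `drop_devtest_meetings(["Bmr013", "Bmr_other", "m097"])` keeps only `"Bmr_other"`.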
Most centrally located distant microphones
| Meeting ID | Central distant mic |
|---|---|
| CMU_20020319-1400 | CMU_20020319-1400_d01_NONE.sph |
| CMU_20020320-1500 | CMU_20020320-1500_d01_NONE.sph |
| ICSI_20010208-1430 | ICSI_20010208-1430_d05_NONE.sph |
| ICSI_20010322-1450 | ICSI_20010322-1450_d05_NONE.sph |
| LDC_20011116-1400 | LDC_20011116-1400_d06_NONE.sph |
| LDC_20011116-1500 | LDC_20011116-1500_d07_NONE.sph |
| NIST_20020214-1148 | NIST_20020214-1148_d03_NONE.sph |
| NIST_20020305-1007 | NIST_20020305-1007_d03_NONE.sph |
Mapping between original sites, RT-02 and RT-04S filenames
The Summed.sph files were created at NIST by mixing the head or lapel microphone channels of each meeting.
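The RT-04S dev/test filenames in the tables of this section follow a regular pattern that can be split into its parts programmatically. The sketch below infers that pattern from the listed filenames only and is not an official specification: a meeting ID, a channel code (`h`, `l` or `d` followed by a two-digit index, or `hm`/`lm` for a summed mix) and a speaker label, with `NONE` where no speaker applies. The names `Rt04sFile` and `parse_rt04s_filename` are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the filenames listed in this document (an assumption):
# SITE_YYYYMMDD-HHMM_<channel>_<speaker>.sph
FILENAME_RE = re.compile(
    r"^(?P<meeting>[A-Z]+_\d{8}-\d{4})"
    r"_(?P<kind>[hld])(?P<index>\d{2}|m)"
    r"_(?P<speaker>[A-Za-z0-9]+)\.sph$"
)


class Rt04sFile(NamedTuple):
    meeting_id: str
    kind: str                # first letter of the channel code: 'h', 'l' or 'd'
    index: Optional[int]     # channel number, or None for a summed mix
    speaker: Optional[str]   # speaker label, or None when the field is NONE


def parse_rt04s_filename(name: str) -> Rt04sFile:
    """Split an RT-04S dev/test filename into its components."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an RT-04S dev/test filename: {name!r}")
    index = match["index"]
    speaker = match["speaker"]
    return Rt04sFile(
        meeting_id=match["meeting"],
        kind=match["kind"],
        index=None if index == "m" else int(index),
        speaker=None if speaker == "NONE" else speaker,
    )
```

Grouping parsed files by `meeting_id` and `kind` gives a quick way to list, for example, all `d` channels of one meeting.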
CMU 20020319-1400 meeting mapping
| Original filename | RT-02 filename | RT-04S dev/test filename |
|---|---|---|
| m096denise01.sph | c096h0.sph | CMU_20020319-1400_l01_denise.sph |
| m096chek02.sph | c096h1.sph | CMU_20020319-1400_l02_chek.sph |
| m096tbc03.sph | c096h2.sph | CMU_20020319-1400_l03_tbc.sph |
| m096juliet04.sph | c096h3.sph | CMU_20020319-1400_l04_juliet.sph |
| m096mty05.sph | c096h4.sph | CMU_20020319-1400_l05_mty.sph |
| m096crown06.sph | c096t1.sph | CMU_20020319-1400_d01_NONE.sph |
| m096wlsadh07.sph | c096h5.sph | CMU_20020319-1400_l06_wlsadh.sph |
| Summed.sph | c096hs.sph | CMU_20020319-1400_lm_NONE.sph |
CMU 20020320-1500 meeting mapping
| Original filename | RT-02 filename | RT-04S dev/test filename |
|---|---|---|
| m097denise01.sph | c097h0.sph | CMU_20020320-1500_l01_denise.sph |
| m097chek02.sph | c097h1.sph | CMU_20020320-1500_l02_chek.sph |
| m097wlsadh03.sph | c097h2.sph | CMU_20020320-1500_l03_wlsadh.sph |
| m097juliet04.sph | c097h3.sph | CMU_20020320-1500_l04_juliet.sph |
| m097crown06.sph | c097t1.sph | CMU_20020320-1500_d01_NONE.sph |
| Summed.sph | c097hs.sph | CMU_20020320-1500_lm_NONE.sph |
ICSI 20010208-1430 meeting mapping
| Original filename | RT-02 filename | RT-04S dev/test filename |
|---|---|---|
| chan1.sph | b013h0.sph | ICSI_20010208-1430_h01_mn014.sph |
| chan2.sph | b013h1.sph | ICSI_20010208-1430_h02_fe008.sph |
| chan3.sph | b013h2.sph | ICSI_20010208-1430_h03_me013.sph |
| chan4.sph | b013h3.sph | ICSI_20010208-1430_h04_me018.sph |
| chan5.sph | b013h4.sph | ICSI_20010208-1430_h05_me001.sph |
| chan6.sph | Not Used | ICSI_20010208-1430_d01_NONE.sph |
| chan7.sph | Not Used | ICSI_20010208-1430_d02_NONE.sph |
| chan8.sph | b013h5.sph | ICSI_20010208-1430_h06_me011.sph |
| chanB.sph | b013h6.sph | ICSI_20010208-1430_h07_fe016.sph |
| chanC.sph | Not Used | ICSI_20010208-1430_d03_NONE.sph |
| chanD.sph | Not Used | ICSI_20010208-1430_d04_NONE.sph |
| chanE.sph | Not Used | ICSI_20010208-1430_d05_NONE.sph |
| chanF.sph | b013t1.sph | ICSI_20010208-1430_d06_NONE.sph |
| Summed.sph | b013hs.sph | ICSI_20010208-1430_hm_NONE.sph |
The most central distant microphone is channel E (d05).
ICSI 20010322-1450 meeting mapping
| Original filename | RT-02 filename | RT-04S dev/test filename |
|---|---|---|
| chan0.sph | b018h0.sph | ICSI_20010322-1450_h01_fe008.sph |
| chan1.sph | b018h1.sph | ICSI_20010322-1450_h02_me018.sph |
| chan2.sph | b018h2.sph | ICSI_20010322-1450_h03_me013.sph |
| chan3.sph | b018h3.sph | ICSI_20010322-1450_h04_fe016.sph |
| chan4.sph | b018h4.sph | ICSI_20010322-1450_h05_me011.sph |
| chan5.sph | b018h5.sph | ICSI_20010322-1450_h06_mn017.sph |
| chan6.sph | Not Used | ICSI_20010322-1450_d01_NONE.sph |
| chan7.sph | Not Used | ICSI_20010322-1450_d02_NONE.sph |
| chan8.sph | b018h6.sph | ICSI_20010322-1450_h07_me001.sph |
| chanC.sph | Not Used | ICSI_20010322-1450_d03_NONE.sph |
| chanD.sph | Not Used | ICSI_20010322-1450_d04_NONE.sph |
| chanE.sph | Not Used | ICSI_20010322-1450_d05_NONE.sph |
| chanF.sph | b018t1.sph | ICSI_20010322-1450_d06_NONE.sph |
| Summed.sph | b018hs.sph | ICSI_20010322-1450_hm_NONE.sph |
The most central distant microphone is channel E (d05).
LDC 20011116-1400 meeting mapping
| Original filename | RT-02 filename | RT-04S dev/test filename |
|---|---|---|
| 20011116_1400_GT6_P1_left.wav | l004h0.sph | LDC_20011116-1400_l01_S200.sph |
| 20011116_1400_GT6_P1_right.wav | l004h1.sph | LDC_20011116-1400_l02_S201.sph |
| 20011116_1400_GT6_P2_left.wav | l004h2.sph | LDC_20011116-1400_l03_S202.sph |
| 20011116_1400_GT6_P3_left.wav | Not Used | LDC_20011116-1400_d01_NONE.sph |
| 20011116_1400_GT6_P3_right.wav | l004t1.sph | LDC_20011116-1400_d02_NONE.sph |
| 20011116_1400_GT6_P4_left.wav | Not Used | LDC_20011116-1400_d03_NONE.sph |
| 20011116_1400_GT6_P4_right.wav | Not Used | LDC_20011116-1400_d04_NONE.sph |
| 20011116_1400_GT6_P5_left.wav | Not Used | LDC_20011116-1400_d05_NONE.sph |
| 20011116_1400_GT6_P5_right.wav | Not Used | LDC_20011116-1400_d06_NONE.sph |
| 20011116_1400_GT6_P6_left.wav | Not Used | LDC_20011116-1400_d07_NONE.sph |
| 20011116_1400_GT6_P6_right.wav | Not Used | LDC_20011116-1400_d08_NONE.sph |
| Summed.sph | l004hs.sph | LDC_20011116-1400_hm_NONE.sph |
LDC 20011116-1500 meeting mapping
| Original filename | RT-02 filename | RT-04S dev/test filename |
|---|---|---|
| 20011116_1500_GM7_P1_left.wav | l003h0.sph | LDC_20011116-1500_l01_S300.sph |
| 20011116_1500_GM7_P1_right.wav | l003h1.sph | LDC_20011116-1500_l02_S301.sph |
| 20011116_1500_GM7_P2_left.wav | l003t1.sph | LDC_20011116-1500_d01_NONE.sph |
| 20011116_1500_GM7_P2_right.wav | Not Used | LDC_20011116-1500_d02_NONE.sph |
| 20011116_1500_GM7_P3_left.wav | Not Used | LDC_20011116-1500_d03_NONE.sph |
| 20011116_1500_GM7_P3_right.wav | Not Used | LDC_20011116-1500_d04_NONE.sph |
| 20011116_1500_GM7_P4_left.wav | Not Used | LDC_20011116-1500_d05_NONE.sph |
| 20011116_1500_GM7_P4_right.wav | Not Used | LDC_20011116-1500_d06_NONE.sph |
| 20011116_1500_GM7_P5_left.wav | Not Used | LDC_20011116-1500_d07_NONE.sph |
| 20011116_1500_GM7_P5_right.wav | Not Used | LDC_20011116-1500_d08_NONE.sph |
| 20011116_1500_GM7_P6_left.wav | l003h2.sph | LDC_20011116-1500_l03_S302.sph |
| Summed.sph | l003hs.sph | LDC_20011116-1500_hm_NONE.sph |
NIST 20020214-1148 meeting mapping
| Original filename | RT-02 filename | RT-04S dev/test filename |
|---|---|---|
| 20020214-HM_01-Subj_022-16K16b.sph | n004h0.sph | NIST_20020214-1148_h01_022.sph |
| 20020214-HM_02-Subj_002-16K16b.sph | n004h1.sph | NIST_20020214-1148_h02_002.sph |
| 20020214-HM_05-Subj_023-16K16b.sph | n004h2.sph | NIST_20020214-1148_h05_023.sph |
| 20020214-HM_06-Subj_024-16K16b.sph | n004h3.sph | NIST_20020214-1148_h06_024.sph |
| 20020214-HM_07-Subj_025-16K16b.sph | n004h4.sph | NIST_20020214-1148_h07_025.sph |
| 20020214-HM_08-Subj_005-16K16b.sph | n004h5.sph | NIST_20020214-1148_h08_005.sph |
| 20020214-LM_01-Subj_022-16K16b.sph | Not Used | NIST_20020214-1148_l01_022.sph |
| 20020214-LM_02-Subj_002-16K16b.sph | Not Used | NIST_20020214-1148_l02_002.sph |
| 20020214-LM_05-Subj_023-16K16b.sph | Not Used | NIST_20020214-1148_l05_023.sph |
| 20020214-LM_06-Subj_024-16K16b.sph | Not Used | NIST_20020214-1148_l06_024.sph |
| 20020214-LM_07-Subj_025-16K16b.sph | Not Used | NIST_20020214-1148_l07_025.sph |
| 20020214-LM_08-Subj_005-16K16b.sph | Not Used | NIST_20020214-1148_l08_005.sph |
| 20020214-OMNI_01-16K16b.sph | Not Used | NIST_20020214-1148_d01_NONE.sph |
| 20020214-OMNI_02-16K16b.sph | Not Used | NIST_20020214-1148_d02_NONE.sph |
| 20020214-OMNI_03-16K16b.sph | n004t1.sph | NIST_20020214-1148_d03_NONE.sph |
| 20020214-QUAD01_1-16K16b.sph | Not Used | NIST_20020214-1148_d04_NONE.sph |
| 20020214-QUAD01_2-16K16b.sph | Not Used | NIST_20020214-1148_d05_NONE.sph |
| 20020214-QUAD01_3-16K16b.sph | Not Used | NIST_20020214-1148_d06_NONE.sph |
| 20020214-QUAD01_4-16K16b.sph | Not Used | NIST_20020214-1148_d07_NONE.sph |
| 20020214-HM_mix-16K16b.sph | n004hs.sph | NIST_20020214-1148_hm_NONE.sph |
| 20020214-OMNI_mix-16K16b.sph | Not Used | Not Used |
NIST 20020305-1007 meeting mapping
| Original filename | RT-02 filename | RT-04S dev/test filename |
|---|---|---|
| 20020305-HM_01-Subj_002-16K16b.sph | n003h6.sph | NIST_20020305-1007_h01_002.sph |
| 20020305-HM_02-Subj_029-16K16b.sph | n003h0.sph | NIST_20020305-1007_h02_029.sph |
| 20020305-HM_04-Subj_028-16K16b.sph | n003h1.sph | NIST_20020305-1007_h04_028.sph |
| 20020305-HM_05-Subj_019-16K16b.sph | n003h2.sph | NIST_20020305-1007_h05_019.sph |
| 20020305-HM_06-Subj_005-16K16b.sph | n003h3.sph | NIST_20020305-1007_h06_005.sph |
| 20020305-HM_07-Subj_003-16K16b.sph | n003h4.sph | NIST_20020305-1007_h07_003.sph |
| 20020305-HM_08-Subj_006-16K16b.sph | n003h5.sph | NIST_20020305-1007_h08_006.sph |
| 20020305-LM_01-Subj_002-16K16b.sph | Not Used | NIST_20020305-1007_l01_002.sph |
| 20020305-LM_02-Subj_029-16K16b.sph | Not Used | NIST_20020305-1007_l02_029.sph |
| 20020305-LM_04-Subj_028-16K16b.sph | Not Used | NIST_20020305-1007_l04_028.sph |
| 20020305-LM_05-Subj_019-16K16b.sph | Not Used | NIST_20020305-1007_l05_019.sph |
| 20020305-LM_06-Subj_005-16K16b.sph | Not Used | NIST_20020305-1007_l06_005.sph |
| 20020305-LM_07-Subj_003-16K16b.sph | Not Used | NIST_20020305-1007_l07_003.sph |
| 20020305-LM_08-Subj_006-16K16b.sph | Not Used | NIST_20020305-1007_l08_006.sph |
| 20020305-OMNI_01-16K16b.sph | Not Used | NIST_20020305-1007_d01_NONE.sph |
| 20020305-OMNI_02-16K16b.sph | Not Used | NIST_20020305-1007_d02_NONE.sph |
| N/A | n003t1.sph | N/A |
| 20020305-QUAD01_1-16K16b.sph | Not Used | NIST_20020305-1007_d04_NONE.sph |
| 20020305-QUAD01_2-16K16b.sph | Not Used | NIST_20020305-1007_d05_NONE.sph |
| 20020305-QUAD01_3-16K16b.sph | Not Used | NIST_20020305-1007_d06_NONE.sph |
| 20020305-QUAD01_4-16K16b.sph | Not Used | NIST_20020305-1007_d07_NONE.sph |
| 20020305-HM_mix-16K16b.sph | n003hs.sph | NIST_20020305-1007_hm_NONE.sph |
| 20020305-OMNI_mix-16K16b.sph | Not Used | Not Used |