NIST Machine Translation Evaluation for GALE
Phase 3 / Phase 3.5
The GALE Translation evaluation will test machine translation of text and recorded speech data. The test will include language data from both Arabic and Chinese, with system performance tallied separately for each language and separately for text and recorded speech sources.
GALE contractors will be the only participants in this evaluation, and the participants must meet specific Go/No-Go levels of performance. This page provides information regarding the 2008 GALE Phase 3 / Phase 3.5 Translation evaluations.
- Evaluation plan (updated 06/18/2008)
- Data selection guidelines v2.2 (updated 01/02/2007)
- Post-editing guidelines v3.0.2 (updated 05/25/2007)
- Sequestered data
- About one third (~5k reference words) of the P2 documents in both languages are to be sequestered for Phase 3. The sequestered files are listed below:
- Although the list identifies only the snippets that were used in P2.5, all snippets within the listed document IDs are sequestered.
For example, all snippets in ABUDHABI_ABUDHNEWS_ARB_20061216_115800 (not just snippet S1) are to be sequestered.
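The document-level sequestering rule above can be sketched as follows. This is only an illustration: representing a snippet as a `(document_id, snippet_id)` pair, the function name `sequestered_snippets`, and the placeholder ID `HYPOTHETICAL_DOC_ID` are assumptions, not part of the evaluation tooling.

```python
# Assumption: each snippet is a (document_id, snippet_id) pair.
def sequestered_snippets(all_snippets, listed_snippets):
    """Return every snippet whose document ID appears in the sequestered list."""
    # Collect the document IDs named in the list, ignoring which snippet was listed.
    sequestered_docs = {doc_id for doc_id, _ in listed_snippets}
    # Keep every snippet that belongs to one of those documents.
    return [s for s in all_snippets if s[0] in sequestered_docs]


listed = [("ABUDHABI_ABUDHNEWS_ARB_20061216_115800", "S1")]
corpus = [
    ("ABUDHABI_ABUDHNEWS_ARB_20061216_115800", "S1"),
    ("ABUDHABI_ABUDHNEWS_ARB_20061216_115800", "S2"),
    ("HYPOTHETICAL_DOC_ID", "S1"),  # placeholder ID, not a real document
]
result = sequestered_snippets(corpus, listed)
```

Here snippet S2 is sequestered along with S1 because both share a listed document ID, while the snippet from the unlisted document is not.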
For the P3/P3.5 evaluation, we will use the following software:
- BBN/UMD-created Java scoring software v0.7.2. This code can be obtained here or directly from the author's website at http://www.cs.umd.edu/~snover/tercom (link updated 06/06/2008)
* Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul, "A Study of Translation Edit Rate with Targeted Human Annotation," Proceedings of Association for Machine Translation in the Americas, 2006.
- Post-editing software v1.1.2. NIST is developing the post-editing software package in Java. Our working environment is NetBeans IDE 5.0 on Mac OS X. Download either the MTPostEditor NetBeans Java project or the MTPostEditor JAR file (link updated 06/09/2008)
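The scoring software listed above implements Translation Edit Rate (TER), which divides the number of edits needed to turn a system output into a reference by the number of reference words. The sketch below is a deliberately simplified illustration and not the tercom implementation: it assumes whitespace tokenization and a single reference, and it counts only word insertions, deletions, and substitutions, omitting the phrase shifts that TER also counts as edits. The function names are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the hypothesis prefix seen so far
    # and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis, reference):
    """Edit rate without shifts: edits divided by reference length."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)
```

For instance, `simplified_ter("the cat sat", "the cat sat down")` requires one insertion against a four-word reference and so evaluates to 0.25. Because shifts are not modeled, this sketch can overestimate the edit rate relative to the actual scorer whenever words are merely reordered.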
|Nov-01-06 to Dec-22-06|P2/P2.5 evaluation epoch|
|Jun-01-07 to Jun-30-07|P3/P3.5 evaluation epoch|
Site-collected data originating within the P2/P2.5 or P3/P3.5 epochs are off-limits for training.
LDC-collected data that originate within the P3/P3.5 epoch or overlap in content with the P2.5 sequestered data are off-limits for training.
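The date-based part of these training restrictions can be sketched as a simple check. The sketch assumes the epoch end dates are inclusive and that the two-digit years refer to 2006 and 2007; the collector labels `"site"` and `"ldc"` and the function names are hypothetical. It does not cover the content-overlap condition for LDC data, which cannot be decided from dates alone.

```python
from datetime import date

# Epoch boundaries from the table above (end dates assumed inclusive).
P2_EPOCH = (date(2006, 11, 1), date(2006, 12, 22))  # P2/P2.5
P3_EPOCH = (date(2007, 6, 1), date(2007, 6, 30))    # P3/P3.5


def in_epoch(day, epoch):
    """True if the day falls within the epoch, endpoints included."""
    start, end = epoch
    return start <= day <= end


def off_limits_by_date(collector, day):
    """Apply only the date-based training restriction for a data source."""
    if collector == "site":
        # Site-collected data: excluded in both evaluation epochs.
        return in_epoch(day, P2_EPOCH) or in_epoch(day, P3_EPOCH)
    if collector == "ldc":
        # LDC-collected data: excluded by date only in the P3/P3.5 epoch.
        return in_epoch(day, P3_EPOCH)
    raise ValueError("collector must be 'site' or 'ldc'")
```

Under these assumptions, site-collected data dated December 16, 2006 is excluded from training, whereas LDC-collected data from the same day passes the date check and would still need to be checked for overlap with the P2.5 sequestered data.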
||GALE Arabic translation evaluation starts
||Translations of text and audio due at NIST
||Final scores to DARPA
||GALE Out-of-Sync Arabic and Chinese translation evaluation begins
||Translation of text due at NIST
||Translation of audio due at NIST
||Final scores to DARPA
||GALE PI meeting
Page Created: December 13, 2006
December 16, 2009