MT08
Human Assessment Results
Human assessments for OpenMT08 were implemented using a participant-volunteer model and were limited to one system submission per participant, which had to be their primary system entered in either the Constrained or Unconstrained training condition. Human assessments were offered for the Arabic-to-English, Chinese-to-English, and Urdu-to-English current tests.
Two types of human assessment were done in OpenMT08:
- Adequacy assessments:
An assessor was presented with one reference translation and one system translation at a time. The assessor rated, on a 7-point scale, how adequate the MT output was by judging how much of the pertinent information was preserved. For segments that received one of the higher scores, the assessor then also gave a more global yes/no judgment as to whether the system translation meant essentially the same as the reference translation.
- Preference assessments:
An assessor viewed one reference translation and two system translations at a time and selected the MT output deemed to be the better translation given the reference translation.
Documents included in the human assessments were selected to cover a range of attained BLEU scores. Each segment (or segment pair in the case of Preference judgments) was assessed by two independent judges.
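The two judgment types and the double-judging scheme described above can be sketched as simple records. This is only an illustrative sketch, not the tooling used for OpenMT08: the class and function names are hypothetical, the numeric direction of the 7-point scale (1 = least adequate, 7 = most adequate) is an assumption, and the cutoff for "higher scores" is left unspecified because the description does not state it.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class AdequacyJudgment:
    """One judge's adequacy rating of one system segment (hypothetical record)."""
    segment_id: str
    judge_id: str
    score: int                           # assumed range 1..7, 7 = most adequate
    same_meaning: Optional[bool] = None  # yes/no follow-up; None if not asked


@dataclass
class PreferenceJudgment:
    """One judge's choice between two system outputs (hypothetical record)."""
    segment_id: str
    judge_id: str
    system_a: str
    system_b: str
    preferred: str                       # expected to equal system_a or system_b


def mean_adequacy(judgments):
    """Average the adequacy scores of all judges for each segment."""
    by_segment = {}
    for j in judgments:
        by_segment.setdefault(j.segment_id, []).append(j.score)
    return {seg: mean(scores) for seg, scores in by_segment.items()}


def preference_wins(judgments):
    """Count how often each system was selected as the better translation."""
    return Counter(j.preferred for j in judgments)


# Made-up example data: two judges per segment, as in the description.
adequacy = [
    AdequacyJudgment("seg1", "judgeA", 6, same_meaning=True),
    AdequacyJudgment("seg1", "judgeB", 5, same_meaning=True),
    AdequacyJudgment("seg2", "judgeA", 2),
    AdequacyJudgment("seg2", "judgeB", 3),
]
preferences = [
    PreferenceJudgment("seg1", "judgeA", "sysX", "sysY", "sysX"),
    PreferenceJudgment("seg1", "judgeB", "sysX", "sysY", "sysY"),
    PreferenceJudgment("seg2", "judgeA", "sysX", "sysY", "sysX"),
]

adequacy_by_segment = mean_adequacy(adequacy)
wins = preference_wins(preferences)
```

Averaging the two judges is just one possible way to combine double-judged segments; the description does not say how the scores were aggregated.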
Contrary to the original plan, the human assessment results of OpenMT08 were not made available to the public. Below are two examples of the kinds of scores attained: