OpenHaRT 2010 Evaluation Results

Release Date: October 15th, 2010 - 14:37 EDT

The NIST Open Handwriting Recognition and Translation Evaluation (OpenHaRT) is an evaluation of image-to-text transcription and translation technologies and is open to all who find the tasks of interest. The 2010 evaluation was the first evaluation of this series and was conducted in accordance with the protocol described in the 2010 OpenHaRT evaluation plan.

Disclaimer

These results are not to be construed or represented as endorsements of any participant's system or commercial product, or as official findings on the part of NIST or the U.S. Government. Note that the results submitted by developers were generally from research systems, not commercially available products. Since OpenHaRT was an evaluation of research algorithms, the test design required local implementation by each participant. As such, participants were only required to submit their system output to NIST for uniform scoring and analysis. The systems themselves were not independently evaluated by NIST.

The data, protocols, and metrics employed in this evaluation were chosen to support research and should not be construed as indicating how well these systems would perform in applications. Changes in the data domain or in the amount of data used to build a system can greatly influence system performance, and changing the task protocols could reveal different performance strengths and weaknesses for these same systems.

The 2010 OpenHaRT evaluation was the first in what we envision to be a long series of document understanding technology evaluations. Developing a strong evaluation series will require us to learn from our evaluation methods. In 2010, some of the evaluation protocols were reported to be too restrictive for first-time participants, which led several participants to forgo submissions in many conditions. As we address these concerns, we expect the evaluation series to become more informative to both NIST and the participants. The 2010 OpenHaRT evaluation was a great learning experience for all involved, and we look forward to building on our findings.

For these reasons, this evaluation should not be interpreted as a product testing exercise, and the results should not be used to draw conclusions about which commercial products are best for a particular application.

Results Release History

Evaluation Tasks

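The three evaluation tasks, Document Image Recognition (DIR), Document Image Translation (DIT), and Document Text Translation (DTT), measure different aspects of the overall system.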

Segmentation Conditions

The two segmentation conditions, word segmentation and line segmentation, explore the relationship between a system's performance and its ability to segment the data.

Performance Measurements

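System performance on the translation tasks (DIT, DTT) is measured using TER (Translation Edit Rate), METEOR, and BLEU. Because TER is an error rate, results are presented as 100-TER so that, as with METEOR and BLEU, higher values indicate better performance.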
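For readers less familiar with BLEU, the sketch below computes corpus-level BLEU following the standard definition: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean with uniform weights and multiplied by a brevity penalty. It is an illustration under those assumptions, not the scoring tool NIST used; tokenization, case handling, and the effective-reference-length rule can differ between implementations, so it should not be expected to reproduce the BLEU column exactly. The function names `ngram_counts` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-1 scale (sketch, not NIST's scorer).

    hypotheses: one token list per segment.
    references: for each segment, a list of one or more reference token lists.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to the
        # hypothesis, preferring the shorter one on ties.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            # Clip each n-gram by the most times it occurs in any single reference
            # (Counter's |= keeps the per-key maximum).
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if min(matches) == 0:
        return 0.0  # some n-gram precision is zero, so the geometric mean is zero
    # Geometric mean of the clipped precisions with uniform weights 1/max_n.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if the output is longer than the references, else exp(1 - r/c).
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "the cat sat on the mat".split()
    print(corpus_bleu([hyp], [[hyp]]))  # 1.0 for an exact match
```

Scores from this sketch fall between 0 and 1, the same scale used in the BLEU column of the tables below.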

System performance on the transcription task (DIR) is measured using word error rate (WER), an error percentage. Results are presented as 100-WER, so higher values indicate better performance. Punctuation is filtered but is scored as is.
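The 100-WER figure can be outlined with the sketch below. It assumes whitespace tokenization and pools word errors (substitutions, deletions, and insertions from a minimum edit-distance alignment) over the whole test set before dividing by the total number of reference words. It is not NIST's scoring software, and its tokenization and punctuation handling may differ from the official scoring; the function names are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Return (edit errors, reference word count) for one segment.

    Errors are the minimum number of word substitutions, deletions and
    insertions needed to turn the reference into the hypothesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion of ref[i-1]
                          curr[j - 1] + 1,      # insertion of hyp[j-1]
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1], len(ref)


def score_100_minus_wer(pairs):
    """Corpus-level WER (percent) over (reference, hypothesis) pairs, reported as 100 - WER."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        ref_words += n
    wer = 100.0 * errors / ref_words
    return 100.0 - wer


if __name__ == "__main__":
    # One substitution in a four-word reference: WER = 25%, so 100 - WER = 75.
    print(score_100_minus_wer([("a b c d", "a b x d")]))  # 75.0
```

Because insertions count as errors, WER can exceed 100%, so 100-WER is not bounded below by zero. A score near zero means the number of errors is roughly equal to the number of reference words.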

Participants

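The table below lists the organizations and the tasks for which they registered to participate in the evaluation. Each entry uses one of the following descriptors: official (on-time submission), pilot (late submission), withdrawn, or - (not registered for that task).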

Site ID | Organization | Location | DIT (word segmentation) | DIT (line segmentation) | DIR (word segmentation) | DIR (line segmentation) | DTT
A2iA | A2iA | France | - | - | official | - | -
APPTEK | Applications Technology, Inc. | USA | withdrawn | withdrawn | - | - | official
IfNREGIM | Institute for Communications Technology, Braunschweig Technical University | Germany | - | - | pilot | withdrawn | -
UPV-PRHLT | Pattern Recognition and Human Language Technology Group, Universitat Politècnica de València | Spain | withdrawn | withdrawn | official | official | withdrawn
tubitak | The Scientific and Technological Research Council of Turkey | Turkey | pilot | withdrawn | pilot | withdrawn | pilot
uob | University of Balamand | Lebanon | pilot | withdrawn | pilot | withdrawn | withdrawn

Evaluation Results

The tables and graphs below give the overall results for each task and segmentation condition over the entire data set. The official results section contains the results for on-time submissions while the pilot results section contains the results for late submissions. Cross-site comparisons are limited to the primary systems. The contrastive systems are only compared against the primary system from the same site for the same task and segmentation condition.
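The comparison rule above can be sketched in code, using the official 100-WER scores for the DIR task with word segmentation from the tables below. The sketch assumes that system IDs follow a SITE.SYSTEM.N pattern in which SYSTEM is either `primary` or a contrastive label such as `c1`; the meaning of the trailing number is not stated on this page. The helper names are hypothetical.

```python
# Official 100-WER scores, DIR task with word segmentation (copied from the tables below).
RESULTS = {
    "A2iA.primary.1": 62.3053,
    "A2iA.c4.1": 62.2244,
    "A2iA.c2.1": 61.3029,
    "A2iA.c1.1": 54.0070,
    "A2iA.c3.1": 53.8251,
    "A2iA.c0.1": 44.9447,
    "UPV-PRHLT.c1.1": 51.0620,
    "UPV-PRHLT.primary.1": 48.5132,
}


def parse_id(system_id):
    """Split an assumed 'SITE.SYSTEM.N' ID into (site, system)."""
    site, system, _ = system_id.rsplit(".", 2)
    return site, system


def cross_site_ranking(results):
    """Cross-site comparison: rank only primary systems, higher score first."""
    primaries = {sid: s for sid, s in results.items() if parse_id(sid)[1] == "primary"}
    return sorted(primaries, key=primaries.get, reverse=True)


def contrastive_vs_primary(results):
    """Within-site comparison: each contrastive score minus its own site's primary score."""
    primary_score = {parse_id(sid)[0]: s for sid, s in results.items()
                     if parse_id(sid)[1] == "primary"}
    deltas = {}
    for sid, s in results.items():
        site, system = parse_id(sid)
        if system != "primary" and site in primary_score:
            deltas[sid] = s - primary_score[site]
    return deltas


if __name__ == "__main__":
    print(cross_site_ranking(RESULTS))
    for sid, delta in contrastive_vs_primary(RESULTS).items():
        print(f"{sid}: {delta:+.4f} vs. primary")
```

Keeping the two functions separate mirrors the rule in the text: contrastive systems never enter the cross-site ranking and are only measured against their own site's primary system.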


Official Results


Results for Document Text Translation Task

ID 100-TER METEOR BLEU
APPTEK.primary.1 43.7502 0.6079 0.2485
tubitak.primary.1 36.2983 0.5760 0.2372

Results for APPTEK Document Text Translation Task

ID 100-TER METEOR BLEU
APPTEK.primary.1 43.7502 0.6079 0.2485
APPTEK.c1.1 43.7392 0.6091 0.2444

Results for tubitak Document Text Translation Task

ID 100-TER METEOR BLEU
tubitak.c2.1 42.4548 0.5723 0.2528
tubitak.c1.1 42.4294 0.5733 0.2543
tubitak.primary.1 36.2983 0.5760 0.2372


Results for Document Image Recognition Task with Word Segmentation

ID 100-WER
A2iA.primary.1 62.3053
UPV-PRHLT.primary.1 48.5132

Results for A2iA Document Image Recognition Task with Word Segmentation

ID 100-WER
A2iA.primary.1 62.3053
A2iA.c4.1 62.2244
A2iA.c2.1 61.3029
A2iA.c1.1 54.0070
A2iA.c3.1 53.8251
A2iA.c0.1 44.9447

Results for UPV-PRHLT Document Image Recognition Task with Word Segmentation

ID 100-WER
UPV-PRHLT.c1.1 51.0620
UPV-PRHLT.primary.1 48.5132


Results for Document Image Recognition Task with Line Segmentation

ID 100-WER
UPV-PRHLT.primary.1 52.5418

Results for UPV-PRHLT Document Image Recognition Task with Line Segmentation

ID 100-WER
UPV-PRHLT.c1.1 52.5418
UPV-PRHLT.primary.1 52.5418


Pilot Results


Results for Document Image Translation Task with Word Segmentation

ID 100-TER METEOR BLEU
tubitak.primary.1 15.8386 0.2629 0.0498
uob.primary.1 7.4699 0.1637 0.0181

Results for tubitak Document Image Translation Task with Word Segmentation

ID 100-TER METEOR BLEU
tubitak.primary.1 15.8386 0.2629 0.0498


Results for Document Image Recognition Task with Word Segmentation

ID 100-WER
tubitak.primary.1 29.0510
uob.primary.1 28.6038
IfNREGIM.primary.2 0.4448

Results for IfNREGIM Document Image Recognition Task with Word Segmentation

ID 100-WER
IfNREGIM.primary.2 0.4448

Results for tubitak Document Image Recognition Task with Word Segmentation

ID 100-WER
tubitak.primary.1 29.0510
