Correlation Results

Current Conditions

  • Human Assessment Type: Preferences, Pair-wise comparison across systems
  • Target Language: English
  • Correlation Level: system
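As a rough illustration of what a system-level correlation involves, the sketch below computes the three coefficients reported on this page from one metric score and one human score per system. It is only a minimal sketch: the function names and the toy scores are hypothetical, and the Kendall variant shown (tau-a, no tie correction) is an assumption, since the page does not say which variant was used.

```python
import math

def pearson(x, y):
    """Pearson's r between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a (assumed variant): (concordant - discordant) / all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

# Hypothetical data: one metric score and one human preference score per system.
metric_scores = [0.21, 0.25, 0.30, 0.34, 0.40]
human_scores = [0.35, 0.41, 0.39, 0.52, 0.60]
print(spearman(metric_scores, human_scores),
      kendall_tau_a(metric_scores, human_scores),
      pearson(metric_scores, human_scores))
```

A metric whose score falls as quality rises would give negative values under this computation, which is how the negative rows for metrics such as TER in the tables can be read.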

Subdivisions

By track:

Ranking

Single Reference Track
Rank | Metric Name | Spearman's Rho | 95% confidence interval | Kendall's Tau | 95% confidence interval | Pearson's R | 95% confidence interval
1 | SEPIA2 | 0.6812 | (0.5511, 0.7790) | 0.4967 | (0.3218, 0.6389) | 0.6292 | (0.4844, 0.7404)
2 | CDer | -0.6797 | (-0.7779, -0.5492) | -0.4983 | (-0.6401, -0.3236) | -0.6270 | (-0.7388, -0.4816)
3 | ULCh | 0.5959 | (0.4425, 0.7154) | 0.4186 | (0.2304, 0.5766) | 0.6062 | (0.4554, 0.7231)
4 | TER-v0.7.25 | -0.6485 | (-0.7548, -0.5090) | -0.4674 | (-0.6158, -0.2871) | -0.5904 | (-0.7112, -0.4357)
5 | DP-Orp | 0.5353 | (0.3681, 0.6690) | 0.3813 | (0.1880, 0.5462) | 0.5667 | (0.4064, 0.6931)
6 | NIST-v11b | 0.6866 | (0.5581, 0.7829) | 0.5008 | (0.3266, 0.6421) | 0.6269 | (0.4815, 0.7387)
7 | ATEC4 | 0.6035 | (0.4520, 0.7211) | 0.4324 | (0.2463, 0.5877) | 0.5557 | (0.3929, 0.6847)
8 | ATEC1 | 0.6065 | (0.4558, 0.7234) | 0.4419 | (0.2573, 0.5954) | 0.5619 | (0.4006, 0.6895)
9 | mBLEU | 0.4672 | (0.2869, 0.6156) | 0.3424 | (0.1444, 0.5140) | 0.2646 | (0.0596, 0.4481)
10 | SNR | 0.6157 | (0.4673, 0.7303) | 0.4329 | (0.2469, 0.5881) | 0.5853 | (0.4293, 0.7073)
11 | 4-GRR | 0.6701 | (0.5367, 0.7708) | 0.4896 | (0.3133, 0.6333) | 0.6195 | (0.4722, 0.7332)
12 | ATEC2 | 0.6102 | (0.4604, 0.7262) | 0.4426 | (0.2581, 0.5959) | 0.5662 | (0.4057, 0.6927)
13 | SEPIA1 | 0.7015 | (0.5777, 0.7938) | 0.5146 | (0.3431, 0.6529) | 0.6387 | (0.4964, 0.7475)
14 | ULCopt | 0.6105 | (0.4609, 0.7264) | 0.4288 | (0.2422, 0.5848) | 0.5908 | (0.4362, 0.7115)
15 | mTER | -0.4300 | (-0.5858, -0.2436) | -0.3210 | (-0.4961, -0.1209) | -0.1440 | (-0.3420, 0.0663)
16 | EDPM | 0.6816 | (0.5516, 0.7792) | 0.4927 | (0.3169, 0.6357) | 0.6284 | (0.4834, 0.7398)
17 | BLEU-4 | 0.6863 | (0.5578, 0.7827) | 0.5013 | (0.3272, 0.6425) | 0.6270 | (0.4816, 0.7388)
18 | METEOR-v0.6 | 0.6914 | (0.5644, 0.7864) | 0.5105 | (0.3382, 0.6497) | 0.6446 | (0.5040, 0.7519)
19 | RTE-MT | 0.6316 | (0.4875, 0.7423) | 0.4477 | (0.2641, 0.6000) | 0.5766 | (0.4186, 0.7007)
20 | BadgerLite | 0.5513 | (0.3875, 0.6813) | 0.3900 | (0.1978, 0.5533) | 0.5727 | (0.4138, 0.6977)
21 | METEOR-ranking | 0.6982 | (0.5733, 0.7914) | 0.5167 | (0.3456, 0.6545) | 0.6551 | (0.5174, 0.7597)
22 | LET | 0.6577 | (0.5207, 0.7616) | 0.4758 | (0.2970, 0.6224) | 0.6032 | (0.4517, 0.7209)
23 | DP-Or | 0.6349 | (0.4917, 0.7447) | 0.4432 | (0.2589, 0.5964) | 0.6680 | (0.5340, 0.7692)
24 | ATEC3 | 0.6424 | (0.5012, 0.7503) | 0.4667 | (0.2863, 0.6152) | 0.5893 | (0.4344, 0.7104)
25 | BLEU-v12 | 0.6935 | (0.5671, 0.7879) | 0.5006 | (0.3264, 0.6420) | 0.6366 | (0.4938, 0.7460)
26 | BEwT-E | 0.5680 | (0.4081, 0.6942) | 0.4068 | (0.2170, 0.5671) | 0.4512 | (0.2682, 0.6028)
27 | RTE | 0.5898 | (0.4350, 0.7108) | 0.4303 | (0.2439, 0.5861) | 0.5482 | (0.3837, 0.6789)
28 | DR-Or | 0.5518 | (0.3882, 0.6817) | 0.3865 | (0.1939, 0.5505) | 0.5627 | (0.4015, 0.6901)
29 | BleuSP | 0.7105 | (0.5895, 0.8004) | 0.5208 | (0.3505, 0.6577) | 0.6370 | (0.4943, 0.7463)
30 | SVM-Rank | 0.6947 | (0.5687, 0.7888) | 0.4988 | (0.3242, 0.6405) | 0.6023 | (0.4506, 0.7202)
31 | BLEU-1 | 0.6869 | (0.5585, 0.7831) | 0.4955 | (0.3203, 0.6380) | 0.6329 | (0.4891, 0.7432)
32 | Bleu-sbp | 0.6835 | (0.5541, 0.7806) | 0.5019 | (0.3279, 0.6429) | 0.6267 | (0.4812, 0.7386)
33 | invWer | -0.6480 | (-0.7544, -0.5083) | -0.4661 | (-0.6147, -0.2856) | -0.5789 | (-0.7025, -0.4215)
34 | BLEU-v11b | 0.6811 | (0.5510, 0.7789) | 0.4958 | (0.3207, 0.6382) | 0.6246 | (0.4785, 0.7370)
35 | SR-Or | 0.5598 | (0.3980, 0.6879) | 0.3868 | (0.1942, 0.5507) | 0.5719 | (0.4128, 0.6971)
36 | Badger | 0.5795 | (0.4222, 0.7029) | 0.4206 | (0.2328, 0.5782) | 0.6076 | (0.4571, 0.7242)
37 | Meteor-v0.7 | 0.6933 | (0.5669, 0.7878) | 0.5126 | (0.3407, 0.6513) | 0.6405 | (0.4988, 0.7489)
38 | MaxSim | 0.6379 | (0.4954, 0.7469) | 0.4543 | (0.2718, 0.6053) | 0.6436 | (0.5027, 0.7512)
39 | TERp | -0.7481 | (-0.8274, -0.6394) | -0.5631 | (-0.6904, -0.4020) | -0.6501 | (-0.7560, -0.5111)

39 metrics (including 7 baseline metrics)
89 data points (total number of systems used)

Multiple References Track
Rank | Metric Name | Spearman's Rho | 95% confidence interval | Kendall's Tau | 95% confidence interval | Pearson's R | 95% confidence interval
1 | SEPIA2 | 0.7375 | (0.5871, 0.8387) | 0.5556 | (0.3404, 0.7154) | 0.6805 | (0.5067, 0.8012)
2 | CDer | -0.7258 | (-0.8310, -0.5703) | -0.5461 | (-0.7087, -0.3284) | -0.6800 | (-0.8008, -0.5060)
3 | ULCh | 0.2288 | (-0.0389, 0.4658) | 0.1785 | (-0.0912, 0.4237) | 0.0932 | (-0.1765, 0.3498)
4 | TER-v0.7.25 | -0.6758 | (-0.7980, -0.5002) | -0.4976 | (-0.6740, -0.2677) | -0.6376 | (-0.7723, -0.4481)
5 | DP-Orp | 0.2070 | (-0.0617, 0.4477) | 0.1549 | (-0.1151, 0.4036) | 0.0249 | (-0.2420, 0.2883)
6 | NIST-v11b | 0.7127 | (0.5517, 0.8225) | 0.5147 | (0.2888, 0.6863) | 0.6789 | (0.5044, 0.8001)
7 | ATEC4 | 0.6895 | (0.5192, 0.8072) | 0.5273 | (0.3046, 0.6953) | 0.6544 | (0.4708, 0.7836)
8 | ATEC1 | 0.6869 | (0.5156, 0.8055) | 0.5205 | (0.2961, 0.6905) | 0.6554 | (0.4722, 0.7843)
9 | SNR | 0.1953 | (-0.0738, 0.4379) | 0.1421 | (-0.1280, 0.3926) | 0.0505 | (-0.2177, 0.3117)
10 | mBLEU | 0.6030 | (0.4020, 0.7485) | 0.4432 | (0.2017, 0.6340) | 0.5263 | (0.3034, 0.6946)
11 | 4-GRR | 0.6786 | (0.5040, 0.7999) | 0.4976 | (0.2677, 0.6740) | 0.6266 | (0.4333, 0.7648)
12 | ATEC2 | 0.6900 | (0.5198, 0.8075) | 0.5232 | (0.2995, 0.6924) | 0.6550 | (0.4717, 0.7841)
13 | SEPIA1 | 0.7718 | (0.6369, 0.8608) | 0.5865 | (0.3804, 0.7371) | 0.7161 | (0.5565, 0.8247)
14 | ULCopt | 0.2154 | (-0.0530, 0.4547) | 0.1596 | (-0.1104, 0.4076) | 0.0638 | (-0.2050, 0.3236)
15 | EDPM | 0.7522 | (0.6083, 0.8483) | 0.5609 | (0.3473, 0.7192) | 0.6871 | (0.5159, 0.8056)
16 | mTER | -0.5875 | (-0.7378, -0.3817) | -0.4343 | (-0.6274, -0.1911) | -0.4103 | (-0.6093, -0.1628)
17 | BLEU-4 | 0.7437 | (0.5960, 0.8427) | 0.5564 | (0.3415, 0.7160) | 0.6860 | (0.5143, 0.8048)
18 | METEOR-v0.6 | 0.7214 | (0.5640, 0.8282) | 0.5448 | (0.3267, 0.7078) | 0.6760 | (0.5005, 0.7982)
19 | BadgerLite | 0.3877 | (0.1364, 0.5921) | 0.2539 | (-0.0123, 0.4864) | 0.3632 | (0.1083, 0.5733)
20 | METEOR-ranking | 0.7182 | (0.5595, 0.8261) | 0.5461 | (0.3284, 0.7087) | 0.6830 | (0.5101, 0.8028)
21 | LET | 0.7201 | (0.5622, 0.8274) | 0.5434 | (0.3250, 0.7068) | 0.6840 | (0.5115, 0.8035)
22 | DP-Or | 0.2370 | (-0.0302, 0.4726) | 0.1808 | (-0.0888, 0.4257) | 0.1703 | (-0.0995, 0.4167)
23 | ATEC3 | 0.6732 | (0.4966, 0.7963) | 0.5111 | (0.2844, 0.6837) | 0.6603 | (0.4789, 0.7876)
24 | BLEU-v12 | 0.7571 | (0.6154, 0.8514) | 0.5623 | (0.3491, 0.7202) | 0.6846 | (0.5123, 0.8039)
25 | BEwT-E | 0.6797 | (0.5056, 0.8006) | 0.5017 | (0.2727, 0.6769) | 0.6183 | (0.4223, 0.7591)
26 | DR-Or | 0.2183 | (-0.0499, 0.4571) | 0.1563 | (-0.1137, 0.4048) | 0.0976 | (-0.1722, 0.3537)
27 | BleuSP | 0.7605 | (0.6203, 0.8536) | 0.5811 | (0.3734, 0.7334) | 0.7030 | (0.5380, 0.8161)
28 | SVM-Rank | 0.7286 | (0.5744, 0.8329) | 0.5529 | (0.3370, 0.7135) | 0.7067 | (0.5432, 0.8185)
29 | BLEU-1 | 0.6784 | (0.5038, 0.7998) | 0.5052 | (0.2771, 0.6795) | 0.6364 | (0.4465, 0.7715)
30 | Bleu-sbp | 0.7314 | (0.5783, 0.8347) | 0.5421 | (0.3233, 0.7059) | 0.6837 | (0.5112, 0.8033)
31 | invWer | -0.7136 | (-0.8231, -0.5529) | -0.5273 | (-0.6953, -0.3046) | -0.6701 | (-0.7942, -0.4923)
32 | BLEU-v11b | 0.7433 | (0.5954, 0.8425) | 0.5529 | (0.3370, 0.7135) | 0.6766 | (0.5013, 0.7986)
33 | SR-Or | 0.2274 | (-0.0403, 0.4647) | 0.1732 | (-0.0966, 0.4192) | 0.1075 | (-0.1625, 0.3624)
34 | Badger | 0.3892 | (0.1381, 0.5932) | 0.2687 | (0.0036, 0.4985) | 0.3138 | (0.0529, 0.5346)
35 | Meteor-v0.7 | 0.7312 | (0.5780, 0.8346) | 0.5529 | (0.3370, 0.7135) | 0.6809 | (0.5073, 0.8015)
36 | MaxSim | 0.2693 | (0.0043, 0.4990) | 0.2013 | (-0.0676, 0.4430) | 0.1758 | (-0.0938, 0.4215)
37 | TERp | -0.7685 | (-0.8587, -0.6320) | -0.5798 | (-0.7324, -0.3717) | -0.7050 | (-0.8174, -0.5408)

37 metrics (including 7 baseline metrics)
55 data points (total number of systems used)
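
The page does not say how the 95% confidence intervals were computed. The reported numbers do, however, appear consistent with a Fisher z-transformation using a standard error of 1/sqrt(n - 3), where n is the number of data points given for each track (89 and 55). The sketch below, in which the function name `fisher_ci` is hypothetical and the method itself is an inference from the figures rather than something the source states, recomputes the interval for the first row of each track.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% interval for a correlation r over n data points.

    Assumed method (not confirmed by the source): transform r with atanh,
    step +/- z_crit / sqrt(n - 3) on that scale, then map back with tanh.
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# SEPIA2, Spearman's rho, Single Reference Track (89 data points).
single_lo, single_hi = fisher_ci(0.6812, 89)
# SEPIA2, Spearman's rho, Multiple References Track (55 data points).
multi_lo, multi_hi = fisher_ci(0.7375, 55)

print(f"single reference:    ({single_lo:.4f}, {single_hi:.4f})")
print(f"multiple references: ({multi_lo:.4f}, {multi_hi:.4f})")
```

Because the inputs here are the rounded table values, the recomputed bounds should agree with the reported intervals to about three decimal places rather than exactly. The same half-width applies to every coefficient within a track, which is why the intervals in the Multiple References Track (fewer systems) are wider.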