Correlation Results

Current Conditions

  • Human Assessment Type: Fluency, 5-point scale
  • Target Language: French
  • Correlation Level: system

Subdivisions

By track:

Ranking

Single Reference Track
Rank  Metric Name     Spearman's Rho (95% CI)     Kendall's Tau (95% CI)      Pearson's R (95% CI)
1     BadgerLite      0.6179 (0.1545, 0.8584)     0.4286 (-0.1072, 0.7715)    0.6304 (0.1745, 0.8637)
2     METEOR-ranking  0.5321 (0.0273, 0.8207)     0.3714 (-0.1739, 0.7424)    0.5472 (0.0486, 0.8275)
3     LET             0.4357 (-0.0986, 0.7750)    0.3143 (-0.2360, 0.7119)    0.5285 (0.0223, 0.8190)
4     SEPIA2          0.4857 (-0.0354, 0.7991)    0.3524 (-0.1951, 0.7324)    0.5972 (0.1224, 0.8496)
5     CDer            -0.4500 (-0.7820, 0.0809)   -0.3333 (-0.7223, 0.2158)   -0.5900 (-0.8464, -0.1113)
6     ATEC3           0.2750 (-0.2762, 0.6901)    0.2190 (-0.3303, 0.6575)    0.4367 (-0.0974, 0.7755)
7     TER-v0.7.25     -0.4429 (-0.7785, 0.0898)   -0.3143 (-0.7119, 0.2360)   -0.6165 (-0.8579, -0.1524)
8     BLEU-v12        0.6893 (0.2736, 0.8880)     0.5238 (0.0158, 0.8169)     0.6276 (0.1699, 0.8625)
9     BleuSP          0.6214 (0.1601, 0.8599)     0.4476 (-0.0839, 0.7808)    0.6385 (0.1876, 0.8671)
10    NIST-v11b       0.4821 (-0.0400, 0.7975)    0.3524 (-0.1951, 0.7324)    0.5988 (0.1249, 0.8503)
11    SVM-Rank        0.6107 (0.1433, 0.8554)     0.4476 (-0.0839, 0.7808)    0.6246 (0.1652, 0.8613)
12    BLEU-1          0.4464 (-0.0854, 0.7803)    0.3333 (-0.2158, 0.7223)    0.6008 (0.1278, 0.8511)
13    ATEC4           0.2750 (-0.2762, 0.6901)    0.2190 (-0.3303, 0.6575)    0.4541 (-0.0758, 0.7840)
14    Bleu-sbp        0.6143 (0.1489, 0.8569)     0.4286 (-0.1072, 0.7715)    0.5929 (0.1158, 0.8477)
15    ATEC1           0.2929 (-0.2581, 0.7001)    0.2381 (-0.3123, 0.6688)    0.4516 (-0.0789, 0.7828)
16    invWer          -0.4536 (-0.7837, 0.0765)   -0.3524 (-0.7324, 0.1951)   -0.6227 (-0.8605, -0.1622)
17    SNR             0.4821 (-0.0400, 0.7975)    0.3524 (-0.1951, 0.7324)    0.6310 (0.1755, 0.8640)
18    mBLEU           0.6893 (0.2736, 0.8880)     0.5238 (0.0158, 0.8169)     0.6292 (0.1725, 0.8632)
19    4-GRR           0.5286 (0.0224, 0.8191)     0.3524 (-0.1951, 0.7324)    0.6118 (0.1449, 0.8558)
20    BLEU-v11b       0.6679 (0.2365, 0.8793)     0.4857 (-0.0354, 0.7991)    0.6124 (0.1459, 0.8561)
21    Badger          0.6179 (0.1545, 0.8584)     0.4286 (-0.1072, 0.7715)    0.6372 (0.1855, 0.8666)
22    ATEC2           0.2929 (-0.2581, 0.7001)    0.2381 (-0.3123, 0.6688)    0.4521 (-0.0783, 0.7830)
23    SEPIA1          0.5571 (0.0628, 0.8320)     0.3905 (-0.1522, 0.7523)    0.5871 (0.1070, 0.8452)
24    Meteor-v0.7     0.6179 (0.1545, 0.8584)     0.4476 (-0.0839, 0.7808)    0.5832 (0.1012, 0.8435)
25    MaxSim          0.3464 (-0.2016, 0.7293)    0.2571 (-0.2938, 0.6799)    0.5014 (-0.0146, 0.8065)
26    mTER            -0.4429 (-0.7785, 0.0898)   -0.3143 (-0.7119, 0.2360)   -0.6160 (-0.8577, -0.1516)
27    BLEU-4          0.6607 (0.2244, 0.8764)     0.4857 (-0.0354, 0.7991)    0.6129 (0.1467, 0.8563)
28    METEOR-v0.6     0.4393 (-0.0942, 0.7768)    0.3333 (-0.2158, 0.7223)    0.5592 (0.0658, 0.8329)
29    TERp            -0.4736 (-0.7934, 0.0510)   -0.3445 (-0.7283, 0.2037)   -0.6079 (-0.8542, -0.1389)

29 metrics (including 7 baseline metrics)
15 data points (total number of systems used)

Multiple References Track
Rank  Metric Name     Spearman's Rho (95% CI)     Kendall's Tau (95% CI)      Pearson's R (95% CI)
1     BadgerLite      0.3846 (-0.2111, 0.7720)    0.2821 (-0.3184, 0.7210)    0.7475 (0.3342, 0.9197)
2     METEOR-ranking  0.5000 (-0.0704, 0.8240)    0.4103 (-0.1818, 0.7840)    0.8309 (0.5163, 0.9479)
3     LET             0.5117 (-0.0547, 0.8290)    0.3484 (-0.2507, 0.7545)    0.8641 (0.5977, 0.9587)
4     SEPIA2          0.5330 (-0.0255, 0.8379)    0.3590 (-0.2394, 0.7597)    0.8758 (0.6278, 0.9624)
5     CDer            -0.4615 (-0.8072, 0.1200)   -0.3590 (-0.7597, 0.2394)   -0.8470 (-0.9531, -0.5549)
6     ATEC3           0.4066 (-0.1861, 0.7823)    0.3590 (-0.2394, 0.7597)    0.7415 (0.3220, 0.9176)
7     TER-v0.7.25     -0.4505 (-0.8024, 0.1336)   -0.3333 (-0.7471, 0.2666)   -0.8614 (-0.9578, -0.5908)
8     BLEU-v12        0.4890 (-0.0848, 0.8193)    0.3333 (-0.2666, 0.7471)    0.8663 (0.6033, 0.9594)
9     BleuSP          0.5385 (-0.0178, 0.8402)    0.4103 (-0.1818, 0.7840)    0.8744 (0.6241, 0.9619)
10    NIST-v11b       0.4670 (-0.1131, 0.8097)    0.3333 (-0.2666, 0.7471)    0.8658 (0.6020, 0.9592)
11    SVM-Rank        0.5385 (-0.0178, 0.8402)    0.4615 (-0.1200, 0.8072)    0.8258 (0.5042, 0.9462)
12    BLEU-1          0.4539 (-0.1294, 0.8039)    0.2968 (-0.3039, 0.7286)    0.8526 (0.5689, 0.9550)
13    Bleu-sbp        0.4890 (-0.0848, 0.8193)    0.3333 (-0.2666, 0.7471)    0.8566 (0.5788, 0.9563)
14    ATEC4           0.4066 (-0.1861, 0.7823)    0.3590 (-0.2394, 0.7597)    0.7370 (0.3131, 0.9160)
15    invWer          -0.4505 (-0.8024, 0.1336)   -0.3333 (-0.7471, 0.2666)   -0.8640 (-0.9586, -0.5975)
16    ATEC1           0.3571 (-0.2413, 0.7588)    0.3333 (-0.2666, 0.7471)    0.7303 (0.3000, 0.9136)
17    SNR             0.5714 (0.0298, 0.8536)     0.4872 (-0.0872, 0.8185)    0.7947 (0.4336, 0.9359)
18    mBLEU           0.7033 (0.2487, 0.9040)     0.5385 (-0.0178, 0.8402)    0.8589 (0.5844, 0.9570)
19    BLEU-v11b       0.4890 (-0.0848, 0.8193)    0.3333 (-0.2666, 0.7471)    0.8611 (0.5901, 0.9577)
20    4-GRR           0.4615 (-0.1200, 0.8072)    0.3590 (-0.2394, 0.7597)    0.8239 (0.4998, 0.9456)
21    Badger          0.4286 (-0.1603, 0.7924)    0.3077 (-0.2930, 0.7342)    0.7629 (0.3656, 0.9250)
22    ATEC2           0.3571 (-0.2413, 0.7588)    0.3333 (-0.2666, 0.7471)    0.7301 (0.2997, 0.9136)
23    SEPIA1          0.4890 (-0.0848, 0.8193)    0.3333 (-0.2666, 0.7471)    0.8596 (0.5862, 0.9572)
24    Meteor-v0.7     0.5330 (-0.0255, 0.8379)    0.4359 (-0.1515, 0.7958)    0.8471 (0.5552, 0.9532)
25    MaxSim          0.2802 (-0.3202, 0.7200)    0.2564 (-0.3430, 0.7075)    0.6023 (0.0767, 0.8659)
26    mTER            -0.6209 (-0.8732, -0.1062)  -0.4615 (-0.8072, 0.1200)   -0.9069 (-0.9721, -0.7115)
27    BLEU-4          0.4890 (-0.0848, 0.8193)    0.3333 (-0.2666, 0.7471)    0.8578 (0.5818, 0.9566)
28    METEOR-v0.6     0.4560 (-0.1268, 0.8048)    0.3846 (-0.2111, 0.7720)    0.8175 (0.4850, 0.9435)
29    TERp            -0.4539 (-0.8039, 0.1294)   -0.3742 (-0.7670, 0.2227)   -0.8585 (-0.9569, -0.5836)

29 metrics (including 7 baseline metrics)
13 data points (total number of systems used)
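
Note on the confidence intervals: this page does not state how the 95% intervals were computed. The tabulated bounds appear consistent with the standard Fisher z-transformation applied to each coefficient, using the number of data points reported for each track (15 for the single-reference track, 13 for the multiple-reference track). The sketch below is written under that assumption and is not taken from the evaluation's own tooling; the function name `fisher_ci` and the use of 1.959964 as the two-sided 95% normal quantile are illustrative choices.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation r over n points (assumed method).

    Uses the Fisher z-transformation: z = atanh(r), with standard
    error 1 / sqrt(n - 3), mapped back to the r scale with tanh.
    """
    if n <= 3:
        raise ValueError("Fisher interval needs more than 3 data points")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Single-reference track, BadgerLite, Pearson's R = 0.6304 with 15 systems.
low, high = fisher_ci(0.6304, 15)
print(f"({low:.4f}, {high:.4f})")

# Multiple-references track, BadgerLite, Pearson's R = 0.7475 with 13 systems.
low, high = fisher_ci(0.7475, 13)
print(f"({low:.4f}, {high:.4f})")
```

Because the input coefficients are themselves rounded to four decimals, the recomputed bounds can differ from the tabulated ones in the last digit. With only 13 to 15 systems the intervals are wide, which is why many of them include zero.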