
Correlation Results

Current Conditions

  • Human Assessment Type: Adequacy, 5-point scale
  • Target Language: French
  • Correlation Level: system
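At the system level, each system contributes one pair of numbers: its metric score and its human adequacy score. The tables below therefore correlate 15 pairs in the single reference track and 13 in the multiple references track. The page does not publish the per-system scores, and it does not say which Kendall variant was used. The Python sketch below (standard library only) shows one way to compute the three coefficients from two such lists. It assumes that Spearman's rho is Pearson's r computed on average ranks, and it uses Kendall's tau-b, which equals tau-a when there are no ties. The function names are illustrative, not part of any published tool.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r between two equal-length lists of per-system scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b; identical to tau-a when neither list contains ties."""
    n = len(x)
    n_pairs = n * (n - 1) // 2
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            sx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            sy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if sx == 0:
                tied_x += 1
            if sy == 0:
                tied_y += 1
            if sx * sy > 0:
                concordant += 1
            elif sx * sy < 0:
                discordant += 1
    return (concordant - discordant) / math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
```

Call each function with one metric's per-system scores and the matching per-system human adequacy scores. The three results then correspond to one row of the tables. Error-style metrics, whose scores fall as quality rises, come out negative.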


Ranking

Single Reference Track
Rank  Metric          Spearman's rho              Kendall's tau               Pearson's r
                        Value 95% CI                Value 95% CI                Value 95% CI
   1  BadgerLite       0.7250 (0.3384, 0.9022)     0.5810 (0.0978, 0.8425)     0.7744 (0.4345, 0.9212)
   2  METEOR-ranking   0.8321 (0.5575, 0.9426)     0.6762 (0.2508, 0.8827)     0.7716 (0.4288, 0.9202)
   3  LET              0.7786 (0.4430, 0.9228)     0.6190 (0.1564, 0.8589)     0.8326 (0.5586, 0.9428)
   4  SEPIA2           0.7786 (0.4430, 0.9228)     0.6190 (0.1564, 0.8589)     0.8367 (0.5679, 0.9443)
   5  CDer            -0.7429 (-0.9092, -0.3722)  -0.6000 (-0.8508, -0.1267)  -0.8795 (-0.9595, -0.6684)
   6  ATEC3            0.6500 (0.2065, 0.8719)     0.4857 (-0.0354, 0.7991)    0.6955 (0.2847, 0.8905)
   7  TER-v0.7.25     -0.6571 (-0.8749, -0.2184)  -0.5048 (-0.8081, 0.0101)   -0.8976 (-0.9658, -0.7133)
   8  BLEU-v12         0.8536 (0.6065, 0.9503)     0.7143 (0.3186, 0.8980)     0.8035 (0.4949, 0.9321)
   9  BleuSP           0.8643 (0.6317, 0.9541)     0.7524 (0.3907, 0.9128)     0.8390 (0.5729, 0.9451)
  10  NIST-v11b        0.7393 (0.3654, 0.9078)     0.6190 (0.1564, 0.8589)     0.8446 (0.5859, 0.9471)
  11  SVM-Rank         0.8643 (0.6317, 0.9541)     0.7524 (0.3907, 0.9128)     0.8416 (0.5790, 0.9460)
  12  BLEU-1           0.7714 (0.4285, 0.9201)     0.6381 (0.1869, 0.8670)     0.8664 (0.6367, 0.9549)
  13  ATEC4            0.6500 (0.2065, 0.8719)     0.4857 (-0.0354, 0.7991)    0.7018 (0.2960, 0.8930)
  14  Bleu-sbp         0.8107 (0.5104, 0.9348)     0.6571 (0.2184, 0.8749)     0.7893 (0.4651, 0.9268)
  15  ATEC1            0.6607 (0.2244, 0.8764)     0.5048 (-0.0101, 0.8081)    0.6981 (0.2892, 0.8915)
  16  invWer          -0.6750 (-0.8822, -0.2487)  -0.5429 (-0.8256, -0.0424)  -0.9012 (-0.9670, -0.7226)
  17  SNR              0.7964 (0.4800, 0.9295)     0.6190 (0.1564, 0.8589)     0.9103 (0.7458, 0.9702)
  18  mBLEU            0.7750 (0.4357, 0.9214)     0.6381 (0.1869, 0.8670)     0.7948 (0.4766, 0.9289)
  19  4-GRR            0.7929 (0.4725, 0.9282)     0.6571 (0.2184, 0.8749)     0.8222 (0.5353, 0.9390)
  20  BLEU-v11b        0.8321 (0.5575, 0.9426)     0.6762 (0.2508, 0.8827)     0.7990 (0.4854, 0.9304)
  21  Badger           0.7250 (0.3384, 0.9022)     0.5810 (0.0978, 0.8425)     0.7898 (0.4662, 0.9270)
  22  ATEC2            0.6607 (0.2244, 0.8764)     0.5048 (-0.0101, 0.8081)    0.6986 (0.2902, 0.8917)
  23  SEPIA1           0.8000 (0.4875, 0.9308)     0.6571 (0.2184, 0.8749)     0.7968 (0.4808, 0.9296)
  24  Meteor-v0.7      0.8536 (0.6065, 0.9503)     0.7143 (0.3186, 0.8980)     0.7754 (0.4365, 0.9216)
  25  MaxSim           0.6500 (0.2065, 0.8719)     0.4857 (-0.0354, 0.7991)    0.6944 (0.2827, 0.8901)
  26  mTER            -0.6536 (-0.8734, -0.2124)  -0.5429 (-0.8256, -0.0424)  -0.8948 (-0.9648, -0.7064)
  27  BLEU-4           0.8464 (0.5900, 0.9478)     0.6762 (0.2508, 0.8827)     0.7988 (0.4849, 0.9303)
  28  METEOR-v0.6      0.7750 (0.4357, 0.9214)     0.6000 (0.1267, 0.8508)     0.8124 (0.5140, 0.9354)
  29  TERp            -0.7846 (-0.9251, -0.4554)  -0.6507 (-0.8722, -0.2077)  -0.8651 (-0.9544, -0.6336)

29 metrics (including 7 baseline metrics)
15 data points (total number of systems used)
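The Rank column does not follow the descending order of any of the three coefficients. For example, in the single reference track BadgerLite is listed first with rho = 0.7250, while BleuSP and SVM-Rank reach 0.8643. The page does not explain how Rank was assigned. Five metrics (CDer, TER-v0.7.25, invWer, mTER, TERp) have negative coefficients in both tracks, so comparing their strength with the others means comparing absolute values. A minimal Python sketch that orders metrics by |rho| follows. It uses a hand-copied subset of the single reference values above, and `rank_by_magnitude` is a hypothetical helper.

```python
# Spearman's rho, single reference track: a subset of rows copied from the table.
SINGLE_REF_RHO: dict[str, float] = {
    "BadgerLite": 0.7250,
    "METEOR-ranking": 0.8321,
    "CDer": -0.7429,
    "BLEU-v12": 0.8536,
    "BleuSP": 0.8643,
    "SVM-Rank": 0.8643,
    "TERp": -0.7846,
}


def rank_by_magnitude(coeffs: dict[str, float]) -> list[tuple[str, float]]:
    """Order metrics by |coefficient|, strongest first.

    sorted() stays stable with reverse=True, so exact ties keep input order.
    """
    return sorted(coeffs.items(), key=lambda item: abs(item[1]), reverse=True)


for position, (name, rho) in enumerate(rank_by_magnitude(SINGLE_REF_RHO), start=1):
    print(f"{position:>2}  {name:<15}{rho:+.4f}")
```

The sign is kept in the printout so that error-style metrics stay recognizable after sorting.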

Multiple References Track
Rank  Metric          Spearman's rho              Kendall's tau               Pearson's r
                        Value 95% CI                Value 95% CI                Value 95% CI
   1  BadgerLite       0.5714 (0.0298, 0.8536)     0.4103 (-0.1818, 0.7840)    0.8277 (0.5088, 0.9469)
   2  METEOR-ranking   0.8077 (0.4625, 0.9402)     0.6410 (0.1392, 0.8809)     0.9096 (0.7190, 0.9730)
   3  LET              0.6327 (0.1255, 0.8777)     0.4774 (-0.0998, 0.8142)    0.9109 (0.7226, 0.9734)
   4  SEPIA2           0.6538 (0.1608, 0.8857)     0.4872 (-0.0872, 0.8185)    0.9194 (0.7465, 0.9760)
   5  CDer            -0.7033 (-0.9040, -0.2487)  -0.5385 (-0.8402, 0.0178)   -0.9332 (-0.9802, -0.7867)
   6  ATEC3            0.7033 (0.2487, 0.9040)     0.5385 (-0.0178, 0.8402)    0.8504 (0.5633, 0.9542)
   7  TER-v0.7.25     -0.6703 (-0.8919, -0.1892)  -0.5128 (-0.8294, 0.0532)   -0.9283 (-0.9787, -0.7722)
   8  BLEU-v12         0.6099 (0.0887, 0.8689)     0.4615 (-0.1200, 0.8072)    0.9021 (0.6981, 0.9706)
   9  BleuSP           0.7582 (0.3560, 0.9234)     0.5897 (0.0574, 0.8610)     0.9321 (0.7834, 0.9798)
  10  NIST-v11b        0.6429 (0.1423, 0.8816)     0.5128 (-0.0532, 0.8294)    0.9243 (0.7608, 0.9775)
  11  SVM-Rank         0.8022 (0.4502, 0.9384)     0.6410 (0.1392, 0.8809)     0.9235 (0.7584, 0.9772)
  12  BLEU-1           0.6135 (0.0944, 0.8703)     0.4774 (-0.0998, 0.8142)    0.9122 (0.7262, 0.9738)
  13  Bleu-sbp         0.6099 (0.0887, 0.8689)     0.4615 (-0.1200, 0.8072)    0.9054 (0.7073, 0.9717)
  14  ATEC4            0.7033 (0.2487, 0.9040)     0.5385 (-0.0178, 0.8402)    0.8424 (0.5439, 0.9517)
  15  invWer          -0.6703 (-0.8919, -0.1892)  -0.5128 (-0.8294, 0.0532)   -0.9289 (-0.9789, -0.7743)
  16  ATEC1            0.6648 (0.1797, 0.8899)     0.5128 (-0.0532, 0.8294)    0.8332 (0.5217, 0.9487)
  17  SNR              0.8297 (0.5134, 0.9475)     0.6667 (0.1828, 0.8905)     0.8807 (0.6405, 0.9639)
  18  mBLEU            0.6538 (0.1608, 0.8857)     0.5641 (0.0190, 0.8507)     0.8595 (0.5860, 0.9572)
  19  BLEU-v11b        0.6099 (0.0887, 0.8689)     0.4615 (-0.1200, 0.8072)    0.9050 (0.7062, 0.9716)
  20  4-GRR            0.7033 (0.2487, 0.9040)     0.5385 (-0.0178, 0.8402)    0.9138 (0.7309, 0.9743)
  21  Badger           0.6154 (0.0974, 0.8710)     0.4359 (-0.1515, 0.7958)    0.8470 (0.5549, 0.9531)
  22  ATEC2            0.6648 (0.1797, 0.8899)     0.5128 (-0.0532, 0.8294)    0.8331 (0.5215, 0.9486)
  23  SEPIA1           0.6099 (0.0887, 0.8689)     0.4615 (-0.1200, 0.8072)    0.9072 (0.7121, 0.9722)
  24  Meteor-v0.7      0.7967 (0.4380, 0.9366)     0.6154 (0.0974, 0.8710)     0.9033 (0.7013, 0.9710)
  25  MaxSim           0.5769 (0.0380, 0.8559)     0.4359 (-0.1515, 0.7958)    0.7290 (0.2974, 0.9132)
  26  mTER            -0.5879 (-0.8602, -0.0546)  -0.4872 (-0.8185, 0.0872)   -0.9365 (-0.9812, -0.7967)
  27  BLEU-4           0.6099 (0.0887, 0.8689)     0.4615 (-0.1200, 0.8072)    0.9078 (0.7140, 0.9724)
  28  METEOR-v0.6      0.7637 (0.3673, 0.9253)     0.6154 (0.0974, 0.8710)     0.9161 (0.7374, 0.9750)
  29  TERp            -0.7620 (-0.9247, -0.3638)  -0.6065 (-0.8676, -0.0833)  -0.9224 (-0.9769, -0.7551)

29 metrics (including 7 baseline metrics)
13 data points (total number of systems used)
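The page does not say how the 95% confidence intervals were computed. One method reproduces the tabulated bounds for the rows spot-checked here (BadgerLite in both tracks, including its Kendall column): Fisher's z-transform, z = atanh(r), with a standard error of 1/sqrt(n - 3), where n is the number of systems (15 or 13). Treat this as an inference, not the documented method. Applying the same transform to Kendall's tau is a heuristic, because the Fisher standard error is derived for Pearson's r. A minimal sketch follows; `fisher_ci` is a hypothetical helper name.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a correlation coefficient.

    Assumption: Fisher z-transform with standard error 1/sqrt(n - 3);
    the results page does not document the method it used.
    """
    if n <= 3:
        raise ValueError("need more than 3 data points")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# BadgerLite, Spearman's rho: single reference (n = 15) and multiple references (n = 13).
for r, n in [(0.7250, 15), (0.5714, 13)]:
    low, high = fisher_ci(r, n)
    print(f"r={r:.4f} n={n}: ({low:.4f}, {high:.4f})")
```

The smaller n in the multiple references track widens the interval for the same coefficient. Because the table shows coefficients rounded to four decimals, feeding those rounded values back in can shift a bound in its last digit for some rows.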