Correlation Results

Current Conditions

  • Human Assessment Type: Adequacy, 7-point scale, straight average
  • Target Language: English
  • Correlation Level: system
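
As an illustration of what a system-level correlation under these conditions might look like, the sketch below averages the adequacy judgments for each system (the "straight average") and then correlates those averages with one metric score per system. The system names, judgments, and metric scores are hypothetical toy data, and the helper names (`pearson`, `average_ranks`, `spearman`) are assumptions made for this example; the page does not state how its values were actually computed.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson's r between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as Pearson's r on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: 7-point adequacy judgments per system, one metric score per system.
human_judgments = {
    "sysA": [6, 5, 7],
    "sysB": [3, 4, 2],
    "sysC": [5, 5, 4],
    "sysD": [1, 2, 2],
}
metric_scores = {"sysA": 0.41, "sysB": 0.22, "sysC": 0.30, "sysD": 0.25}

systems = sorted(human_judgments)
human = [mean(human_judgments[s]) for s in systems]  # straight average per system
metric = [metric_scores[s] for s in systems]

print("Pearson's r:   ", round(pearson(human, metric), 4))
print("Spearman's rho:", round(spearman(human, metric), 4))
```

Each system contributes exactly one data point, which is why the number of data points reported under each table equals the number of systems.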

Subdivisions

By track:

Ranking

Single Reference Track
Rank | Metric Name | Spearman's Rho | 95% confidence interval | Kendall's Tau | 95% confidence interval | Pearson's R | 95% confidence interval
--- | --- | --- | --- | --- | --- | --- | ---
1 | SEPIA2 | 0.8620 | (0.7967, 0.9073) | 0.6742 | (0.5420, 0.7738) | 0.8501 | (0.7799, 0.8992)
2 | CDer | -0.9037 | (-0.9359, -0.8567) | -0.7360 | (-0.8187, -0.6232) | -0.8805 | (-0.9201, -0.8232)
3 | ULCh | 0.5422 | (0.3765, 0.6743) | 0.4070 | (0.2172, 0.5672) | 0.5439 | (0.3785, 0.6756)
4 | TER-v0.7.25 | -0.8877 | (-0.9250, -0.8336) | -0.7133 | (-0.8024, -0.5932) | -0.8542 | (-0.9020, -0.7857)
5 | DP-Orp | 0.4712 | (0.2916, 0.6188) | 0.3718 | (0.1773, 0.5384) | 0.4962 | (0.3212, 0.6385)
6 | NIST-v11b | 0.8775 | (0.8190, 0.9180) | 0.6915 | (0.5646, 0.7865) | 0.8534 | (0.7845, 0.9014)
7 | ATEC4 | 0.8315 | (0.7537, 0.8863) | 0.6353 | (0.4922, 0.7450) | 0.8099 | (0.7237, 0.8712)
8 | ATEC1 | 0.8375 | (0.7622, 0.8905) | 0.6444 | (0.5037, 0.7517) | 0.8148 | (0.7306, 0.8747)
9 | mBLEU | 0.6909 | (0.5637, 0.7860) | 0.5694 | (0.4098, 0.6952) | 0.5470 | (0.3822, 0.6780)
10 | SNR | 0.5617 | (0.4003, 0.6893) | 0.4244 | (0.2371, 0.5813) | 0.5189 | (0.3483, 0.6562)
11 | 4-GRR | 0.8550 | (0.7868, 0.9025) | 0.6680 | (0.5341, 0.7693) | 0.8304 | (0.7523, 0.8855)
12 | ATEC2 | 0.8380 | (0.7629, 0.8908) | 0.6466 | (0.5065, 0.7534) | 0.8157 | (0.7318, 0.8753)
13 | SEPIA1 | 0.8689 | (0.8066, 0.9121) | 0.6849 | (0.5559, 0.7817) | 0.8481 | (0.7771, 0.8978)
14 | ULCopt | 0.5591 | (0.3970, 0.6873) | 0.4244 | (0.2371, 0.5813) | 0.5285 | (0.3599, 0.6637)
15 | mTER | -0.6841 | (-0.7811, -0.5549) | -0.5495 | (-0.6800, -0.3854) | -0.4223 | (-0.5796, -0.2347)
16 | EDPM | 0.8797 | (0.8220, 0.9195) | 0.6874 | (0.5592, 0.7835) | 0.8510 | (0.7811, 0.8998)
17 | BLEU-4 | 0.8423 | (0.7689, 0.8937) | 0.6512 | (0.5124, 0.7568) | 0.8221 | (0.7407, 0.8798)
18 | METEOR-v0.6 | 0.8876 | (0.8335, 0.9249) | 0.7002 | (0.5759, 0.7928) | 0.8701 | (0.8082, 0.9129)
19 | RTE-MT | 0.6960 | (0.5705, 0.7898) | 0.5271 | (0.3581, 0.6626) | 0.7004 | (0.5761, 0.7930)
20 | BadgerLite | 0.6902 | (0.5629, 0.7856) | 0.4990 | (0.3244, 0.6407) | 0.7212 | (0.6036, 0.8081)
21 | METEOR-ranking | 0.8906 | (0.8376, 0.9269) | 0.7074 | (0.5853, 0.7981) | 0.8729 | (0.8123, 0.9148)
22 | LET | 0.8793 | (0.8214, 0.9192) | 0.6823 | (0.5526, 0.7798) | 0.8547 | (0.7864, 0.9023)
23 | DP-Or | 0.5826 | (0.4260, 0.7053) | 0.4501 | (0.2668, 0.6019) | 0.6339 | (0.4904, 0.7440)
24 | ATEC3 | 0.8740 | (0.8139, 0.9156) | 0.6830 | (0.5535, 0.7803) | 0.8505 | (0.7804, 0.8994)
25 | BLEU-v12 | 0.8567 | (0.7892, 0.9037) | 0.6709 | (0.5378, 0.7714) | 0.8335 | (0.7566, 0.8877)
26 | BEwT-E | 0.7776 | (0.6794, 0.8485) | 0.6272 | (0.4818, 0.7389) | 0.7376 | (0.6254, 0.8199)
27 | RTE | 0.6226 | (0.4761, 0.7355) | 0.4545 | (0.2720, 0.6055) | 0.6427 | (0.5016, 0.7505)
28 | DR-Or | 0.5009 | (0.3267, 0.6422) | 0.3821 | (0.1889, 0.5469) | 0.5284 | (0.3598, 0.6637)
29 | BleuSP | 0.8465 | (0.7748, 0.8967) | 0.6542 | (0.5163, 0.7591) | 0.8250 | (0.7447, 0.8818)
30 | SVM-Rank | 0.8845 | (0.8289, 0.9228) | 0.7038 | (0.5806, 0.7955) | 0.8495 | (0.7791, 0.8988)
31 | BLEU-1 | 0.8608 | (0.7951, 0.9066) | 0.6699 | (0.5365, 0.7706) | 0.8453 | (0.7731, 0.8959)
32 | Bleu-sbp | 0.8679 | (0.8052, 0.9114) | 0.6793 | (0.5486, 0.7775) | 0.8382 | (0.7631, 0.8909)
33 | invWer | -0.8921 | (-0.9280, -0.8399) | -0.7222 | (-0.8088, -0.6049) | -0.8530 | (-0.9012, -0.7841)
34 | BLEU-v11b | 0.8480 | (0.7769, 0.8977) | 0.6554 | (0.5179, 0.7600) | 0.8239 | (0.7432, 0.8810)
35 | SR-Or | 0.5096 | (0.3371, 0.6490) | 0.3783 | (0.1846, 0.5438) | 0.5371 | (0.3702, 0.6704)
36 | Badger | 0.6629 | (0.5274, 0.7655) | 0.4918 | (0.3159, 0.6350) | 0.7061 | (0.5837, 0.7972)
37 | Meteor-v0.7 | 0.8968 | (0.8466, 0.9311) | 0.7125 | (0.5920, 0.8018) | 0.8745 | (0.8146, 0.9159)
38 | MaxSim | 0.5879 | (0.4326, 0.7093) | 0.4499 | (0.2667, 0.6018) | 0.6172 | (0.4693, 0.7314)
39 | TERp | -0.8685 | (-0.9118, -0.8060) | -0.6803 | (-0.7783, -0.5499) | -0.8493 | (-0.8986, -0.7788)

39 metrics (including 7 baseline metrics)
89 data points (total number of systems used)
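
The 95% confidence intervals in this track appear consistent with the standard Fisher z-transformation interval using n = 89 data points, although the page does not say which method was used. The sketch below checks that assumption against the first Spearman's Rho value in the Single Reference Track (0.8620); `fisher_ci` is a name introduced here for illustration.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation r from n data points,
    using the Fisher z-transformation (an assumed method, not confirmed
    by the page)."""
    z = math.atanh(r)                  # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)        # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# SEPIA2, Spearman's Rho, 89 systems: the table lists (0.7967, 0.9073).
low, high = fisher_ci(0.8620, 89)
print(f"({low:.4f}, {high:.4f})")
```

The result should agree with the tabulated interval to about three decimal places; small differences in the last digit are expected because the input coefficient is already rounded to four places.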

Multiple References Track
Rank | Metric Name | Spearman's Rho | 95% confidence interval | Kendall's Tau | 95% confidence interval | Pearson's R | 95% confidence interval
--- | --- | --- | --- | --- | --- | --- | ---
1 | SEPIA2 | 0.9307 | (0.8836, 0.9592) | 0.7764 | (0.6437, 0.8638) | 0.9158 | (0.8593, 0.9502)
2 | CDer | -0.9167 | (-0.9508, -0.8607) | -0.7616 | (-0.8543, -0.6220) | -0.9038 | (-0.9430, -0.8398)
3 | ULCh | 0.1258 | (-0.1443, 0.3785) | 0.2135 | (-0.0549, 0.4531) | -0.0402 | (-0.3023, 0.2275)
4 | TER-v0.7.25 | -0.8952 | (-0.9378, -0.8262) | -0.7239 | (-0.8298, -0.5676) | -0.8676 | (-0.9209, -0.7824)
5 | DP-Orp | 0.0848 | (-0.1846, 0.3424) | 0.1886 | (-0.0807, 0.4323) | -0.1381 | (-0.3891, 0.1321)
6 | NIST-v11b | 0.9314 | (0.8847, 0.9596) | 0.7747 | (0.6411, 0.8627) | 0.9117 | (0.8526, 0.9478)
7 | ATEC4 | 0.9001 | (0.8339, 0.9407) | 0.7428 | (0.5946, 0.8421) | 0.8937 | (0.8237, 0.9369)
8 | ATEC1 | 0.9017 | (0.8366, 0.9417) | 0.7441 | (0.5966, 0.8430) | 0.8966 | (0.8284, 0.9386)
9 | SNR | 0.0628 | (-0.2059, 0.3228) | 0.1313 | (-0.1388, 0.3833) | -0.0558 | (-0.3164, 0.2127)
10 | mBLEU | 0.8526 | (0.7590, 0.9117) | 0.7019 | (0.5365, 0.8154) | 0.7786 | (0.6469, 0.8652)
11 | 4-GRR | 0.8356 | (0.7328, 0.9012) | 0.6727 | (0.4959, 0.7960) | 0.8127 | (0.6978, 0.8868)
12 | ATEC2 | 0.9021 | (0.8371, 0.9420) | 0.7495 | (0.6044, 0.8465) | 0.8940 | (0.8242, 0.9370)
13 | SEPIA1 | 0.9031 | (0.8388, 0.9426) | 0.7508 | (0.6063, 0.8474) | 0.9004 | (0.8344, 0.9409)
14 | ULCopt | 0.1146 | (-0.1554, 0.3687) | 0.1892 | (-0.0801, 0.4328) | -0.0485 | (-0.3098, 0.2196)
15 | EDPM | 0.9218 | (0.8690, 0.9538) | 0.7684 | (0.6318, 0.8586) | 0.8975 | (0.8298, 0.9392)
16 | mTER | -0.8610 | (-0.9169, -0.7721) | -0.6633 | (-0.7896, -0.4830) | -0.7058 | (-0.8180, -0.5421)
17 | BLEU-4 | 0.8909 | (0.8192, 0.9351) | 0.7369 | (0.5863, 0.8383) | 0.8690 | (0.7846, 0.9218)
18 | METEOR-v0.6 | 0.8457 | (0.7482, 0.9074) | 0.6741 | (0.4978, 0.7969) | 0.8477 | (0.7514, 0.9086)
19 | BadgerLite | 0.2450 | (-0.0217, 0.4792) | 0.1596 | (-0.1104, 0.4076) | 0.3042 | (0.0423, 0.5269)
20 | METEOR-ranking | 0.8444 | (0.7463, 0.9066) | 0.6700 | (0.4922, 0.7942) | 0.8490 | (0.7533, 0.9094)
21 | LET | 0.9335 | (0.8882, 0.9609) | 0.7778 | (0.6457, 0.8647) | 0.9201 | (0.8663, 0.9528)
22 | DP-Or | 0.1219 | (-0.1482, 0.3750) | 0.1983 | (-0.0707, 0.4404) | 0.0775 | (-0.1917, 0.3359)
23 | ATEC3 | 0.9341 | (0.8892, 0.9612) | 0.7832 | (0.6537, 0.8681) | 0.9234 | (0.8717, 0.9548)
24 | BLEU-v12 | 0.9008 | (0.8351, 0.9412) | 0.7481 | (0.6024, 0.8456) | 0.8695 | (0.7853, 0.9221)
25 | BEwT-E | 0.9240 | (0.8727, 0.9552) | 0.7764 | (0.6437, 0.8638) | 0.8988 | (0.8319, 0.9400)
26 | DR-Or | 0.1207 | (-0.1494, 0.3740) | 0.1967 | (-0.0724, 0.4391) | -0.0073 | (-0.2721, 0.2585)
27 | BleuSP | 0.8360 | (0.7334, 0.9014) | 0.6646 | (0.4848, 0.7905) | 0.8488 | (0.7531, 0.9093)
28 | SVM-Rank | 0.8247 | (0.7161, 0.8944) | 0.6660 | (0.4867, 0.7915) | 0.8442 | (0.7460, 0.9065)
29 | BLEU-1 | 0.9086 | (0.8477, 0.9459) | 0.7329 | (0.5805, 0.8357) | 0.8852 | (0.8102, 0.9317)
30 | Bleu-sbp | 0.9113 | (0.8519, 0.9475) | 0.7522 | (0.6083, 0.8482) | 0.8807 | (0.8030, 0.9289)
31 | invWer | -0.9120 | (-0.9479, -0.8531) | -0.7508 | (-0.8474, -0.6063) | -0.8992 | (-0.9402, -0.8324)
32 | BLEU-v11b | 0.8967 | (0.8285, 0.9387) | 0.7360 | (0.5849, 0.8377) | 0.8661 | (0.7801, 0.9200)
33 | SR-Or | 0.1238 | (-0.1463, 0.3767) | 0.2082 | (-0.0604, 0.4487) | -0.0036 | (-0.2687, 0.2619)
34 | Badger | 0.1551 | (-0.1149, 0.4038) | 0.0936 | (-0.1761, 0.3502) | 0.1857 | (-0.0837, 0.4298)
35 | Meteor-v0.7 | 0.8830 | (0.8067, 0.9304) | 0.7172 | (0.5580, 0.8254) | 0.8739 | (0.7922, 0.9248)
36 | MaxSim | 0.1565 | (-0.1135, 0.4050) | 0.2229 | (-0.0451, 0.4609) | 0.1250 | (-0.1451, 0.3778)
37 | TERp | -0.8659 | (-0.9199, -0.7798) | -0.7118 | (-0.8219, -0.5504) | -0.8657 | (-0.9198, -0.7795)

37 metrics (including 7 baseline metrics)
55 data points (total number of systems used)