Correlation Results

Current Conditions

  • Human Assessment Type: Adequacy, Yes-No qualitative question, proportion of Yes assigned
  • Target Language: English
  • Correlation Level: system
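At the system correlation level, each MT system contributes one human score (here, the proportion of its adequacy judgments answered Yes) and one metric score. The sketch below shows that aggregation. It assumes the raw judgments arrive as (system_id, answer) pairs; that layout and the function name are hypothetical, not the evaluation's actual data format.

```python
from collections import defaultdict


def yes_proportion_per_system(judgments):
    """Collapse Yes/No adequacy answers into one score per system.

    judgments: iterable of (system_id, answer) pairs, where answer is
    "Yes" or "No" (a hypothetical input layout).
    Returns {system_id: fraction of answers that are "Yes"}.
    """
    yes = defaultdict(int)
    total = defaultdict(int)
    for system_id, answer in judgments:
        total[system_id] += 1
        if answer == "Yes":
            yes[system_id] += 1
    return {s: yes[s] / total[s] for s in total}


# Illustrative, made-up judgments for two systems
judgments = [("sysA", "Yes"), ("sysA", "No"), ("sysA", "Yes"),
             ("sysB", "No"), ("sysB", "Yes")]
print(yes_proportion_per_system(judgments))
```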

Subdivisions

By track:

Ranking

Single Reference Track
Rank  Metric          Spearman's rho               Kendall's tau                Pearson's r
                        Value  95% CI                Value  95% CI                Value  95% CI
   1  SEPIA2           0.8308  (0.7528, 0.8858)     0.6336  (0.4900, 0.7437)     0.8102  (0.7242, 0.8714)
   2  CDer            -0.8390  (-0.8915, -0.7642)  -0.6423  (-0.7502, -0.5011)  -0.8061  (-0.8686, -0.7185)
   3  ULCh             0.6389  (0.4968, 0.7477)     0.4707  (0.2910, 0.6183)     0.6454  (0.5051, 0.7525)
   4  TER-v0.7.25     -0.8158  (-0.8754, -0.7319)  -0.6140  (-0.7291, -0.4653)  -0.7709  (-0.8437, -0.6703)
   5  DP-Orp           0.5829  (0.4263, 0.7055)     0.4365  (0.2510, 0.5910)     0.6146  (0.4660, 0.7295)
   6  NIST-v11b        0.8384  (0.7634, 0.8910)     0.6398  (0.4978, 0.7483)     0.8060  (0.7183, 0.8685)
   7  ATEC4            0.7581  (0.6530, 0.8346)     0.5529  (0.3895, 0.6826)     0.7338  (0.6204, 0.8172)
   8  ATEC1            0.7647  (0.6619, 0.8393)     0.5660  (0.4056, 0.6926)     0.7395  (0.6280, 0.8213)
   9  mBLEU            0.6093  (0.4593, 0.7255)     0.4589  (0.2771, 0.6089)     0.4245  (0.2372, 0.5814)
  10  SNR              0.6479  (0.5082, 0.7544)     0.4758  (0.2970, 0.6224)     0.6097  (0.4598, 0.7258)
  11  4-GRR            0.8158  (0.7319, 0.8754)     0.6117  (0.4623, 0.7273)     0.8010  (0.7115, 0.8650)
  12  ATEC2            0.7646  (0.6617, 0.8392)     0.5631  (0.4020, 0.6904)     0.7426  (0.6321, 0.8235)
  13  SEPIA1           0.8459  (0.7739, 0.8962)     0.6495  (0.5102, 0.7555)     0.8216  (0.7400, 0.8794)
  14  ULCopt           0.6439  (0.5031, 0.7514)     0.4738  (0.2946, 0.6208)     0.6225  (0.4759, 0.7354)
  15  mTER            -0.5751  (-0.6996, -0.4168)  -0.4288  (-0.5848, -0.2422)  -0.2544  (-0.4394, -0.0488)
  16  EDPM             0.8509  (0.7810, 0.8997)     0.6490  (0.5096, 0.7552)     0.8192  (0.7366, 0.8777)
  17  BLEU-4           0.8289  (0.7501, 0.8844)     0.6260  (0.4803, 0.7380)     0.8059  (0.7182, 0.8684)
  18  METEOR-v0.6      0.8528  (0.7837, 0.9010)     0.6617  (0.5260, 0.7646)     0.8222  (0.7408, 0.8798)
  19  RTE-MT           0.7798  (0.6823, 0.8500)     0.5805  (0.4234, 0.7037)     0.7519  (0.6445, 0.8301)
  20  BadgerLite       0.7416  (0.6307, 0.8228)     0.5422  (0.3765, 0.6743)     0.7646  (0.6617, 0.8392)
  21  METEOR-ranking   0.8572  (0.7899, 0.9041)     0.6679  (0.5339, 0.7691)     0.8292  (0.7506, 0.8847)
  22  LET              0.8339  (0.7571, 0.8879)     0.6367  (0.4939, 0.7460)     0.7847  (0.6890, 0.8535)
  23  DP-Or            0.6626  (0.5271, 0.7653)     0.4913  (0.3152, 0.6346)     0.7057  (0.5832, 0.7969)
  24  ATEC3            0.7891  (0.6951, 0.8566)     0.5766  (0.4186, 0.7007)     0.7641  (0.6611, 0.8389)
  25  BLEU-v12         0.8483  (0.7774, 0.8979)     0.6467  (0.5067, 0.7535)     0.8179  (0.7348, 0.8768)
  26  BEwT-E           0.7067  (0.5844, 0.7975)     0.5305  (0.3622, 0.6652)     0.6270  (0.4817, 0.7388)
  27  RTE              0.7386  (0.6268, 0.8206)     0.5550  (0.3920, 0.6842)     0.7217  (0.6043, 0.8085)
  28  DR-Or            0.5941  (0.4404, 0.7141)     0.4417  (0.2571, 0.5952)     0.6237  (0.4774, 0.7363)
  29  BleuSP           0.8478  (0.7767, 0.8976)     0.6495  (0.5102, 0.7555)     0.8138  (0.7291, 0.8739)
  30  SVM-Rank         0.8471  (0.7757, 0.8971)     0.6439  (0.5030, 0.7514)     0.7921  (0.6992, 0.8587)
  31  BLEU-1           0.8381  (0.7630, 0.8909)     0.6345  (0.4911, 0.7444)     0.8105  (0.7246, 0.8717)
  32  Bleu-sbp         0.8407  (0.7667, 0.8927)     0.6387  (0.4965, 0.7476)     0.8122  (0.7269, 0.8728)
  33  invWer          -0.8165  (-0.8758, -0.7329)  -0.6158  (-0.7304, -0.4674)  -0.7546  (-0.8321, -0.6481)
  34  BLEU-v11b        0.8340  (0.7573, 0.8880)     0.6318  (0.4877, 0.7423)     0.8069  (0.7196, 0.8691)
  35  SR-Or            0.6138  (0.4650, 0.7289)     0.4430  (0.2586, 0.5963)     0.6301  (0.4855, 0.7411)
  36  Badger           0.7393  (0.6277, 0.8211)     0.5381  (0.3715, 0.6712)     0.7718  (0.6715, 0.8444)
  37  Meteor-v0.7      0.8554  (0.7874, 0.9028)     0.6658  (0.5312, 0.7676)     0.8173  (0.7340, 0.8764)
  38  MaxSim           0.6678  (0.5338, 0.7691)     0.4973  (0.3224, 0.6393)     0.6872  (0.5589, 0.7833)
  39  TERp            -0.8729  (-0.9148, -0.8123)  -0.6919  (-0.7868, -0.5650)  -0.8256  (-0.8822, -0.7456)

39 metrics (including 7 baseline metrics)
89 data points (total number of systems used)
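The reported 95% confidence intervals appear consistent with a Fisher z-transform of each coefficient, using a standard error of 1/sqrt(n - 3), where n is the number of systems (89 here). This is a reading of the numbers, not a documented method. For example, SEPIA2's Pearson's r of 0.8102 yields roughly (0.7241, 0.8714) against the reported (0.7242, 0.8714), so the two agree to within the last digit, and its Spearman and Kendall intervals also match under the same assumption. The function name below is illustrative.

```python
import math


def fisher_ci(coef, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation coefficient (assumed method).

    Maps the coefficient to z = atanh(coef), adds +/- z_crit standard
    errors with SE = 1 / sqrt(n - 3), then maps back with tanh.
    """
    z = math.atanh(coef)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# SEPIA2, Pearson's r, Single Reference Track (89 systems)
low, high = fisher_ci(0.8102, 89)
print(f"({low:.4f}, {high:.4f})")
```

Because the transform is symmetric in z rather than in the coefficient, intervals near +/-1 are lopsided. That is why, for instance, CDer's interval (-0.8915, -0.7642) extends farther toward zero than away from it.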

Multiple References Track
Rank  Metric          Spearman's rho               Kendall's tau                Pearson's r
                        Value  95% CI                Value  95% CI                Value  95% CI
   1  SEPIA2           0.8646  (0.7777, 0.9191)     0.6831  (0.5102, 0.8029)     0.8167  (0.7039, 0.8893)
   2  CDer            -0.8575  (-0.9147, -0.7666)  -0.6763  (-0.7984, -0.5009)  -0.8355  (-0.9011, -0.7325)
   3  ULCh             0.3269  (0.0674, 0.5449)     0.2789  (0.0147, 0.5067)     0.1795  (-0.0901, 0.4246)
   4  TER-v0.7.25     -0.7917  (-0.8735, -0.6663)  -0.5941  (-0.7424, -0.3904)  -0.7653  (-0.8567, -0.6273)
   5  DP-Orp           0.3060  (0.0443, 0.5284)     0.2554  (-0.0106, 0.4877)    0.1128  (-0.1572, 0.3671)
   6  NIST-v11b        0.8569  (0.7657, 0.9144)     0.6718  (0.4947, 0.7954)     0.8279  (0.7209, 0.8963)
   7  ATEC4            0.7912  (0.6656, 0.8732)     0.6184  (0.4224, 0.7591)     0.7920  (0.6668, 0.8737)
   8  ATEC1            0.7899  (0.6638, 0.8724)     0.6251  (0.4314, 0.7638)     0.7931  (0.6685, 0.8744)
   9  SNR              0.2585  (-0.0073, 0.4902)    0.2021  (-0.0668, 0.4436)    0.1407  (-0.1294, 0.3914)
  10  mBLEU            0.7046  (0.5403, 0.8172)     0.5209  (0.2966, 0.6907)     0.5970  (0.3941, 0.7444)
  11  4-GRR            0.8092  (0.6927, 0.8846)     0.6211  (0.4260, 0.7610)     0.8016  (0.6811, 0.8798)
  12  ATEC2            0.7914  (0.6659, 0.8733)     0.6224  (0.4278, 0.7619)     0.7905  (0.6646, 0.8728)
  13  SEPIA1           0.8826  (0.8061, 0.9301)     0.7167  (0.5574, 0.8251)     0.8623  (0.7740, 0.9176)
  14  ULCopt           0.3133  (0.0524, 0.5342)     0.2519  (-0.0143, 0.4848)    0.1586  (-0.1114, 0.4067)
  15  EDPM             0.8747  (0.7935, 0.9253)     0.6911  (0.5215, 0.8082)     0.8352  (0.7322, 0.9009)
  16  mTER            -0.6653  (-0.7910, -0.4858)  -0.4581  (-0.6450, -0.2194)  -0.4446  (-0.6350, -0.2033)
  17  BLEU-4           0.8664  (0.7805, 0.9202)     0.6813  (0.5077, 0.8017)     0.8454  (0.7478, 0.9072)
  18  METEOR-v0.6      0.8663  (0.7804, 0.9201)     0.6804  (0.5065, 0.8011)     0.8445  (0.7465, 0.9067)
  19  BadgerLite       0.4309  (0.1869, 0.6247)     0.2937  (0.0308, 0.5186)     0.5427  (0.3240, 0.7063)
  20  METEOR-ranking   0.8616  (0.7729, 0.9172)     0.6736  (0.4972, 0.7966)     0.8508  (0.7562, 0.9106)
  21  LET              0.8406  (0.7404, 0.9042)     0.6628  (0.4824, 0.7893)     0.7920  (0.6669, 0.8737)
  22  DP-Or            0.3235  (0.0636, 0.5422)     0.2679  (0.0028, 0.4978)     0.2503  (-0.0161, 0.4835)
  23  ATEC3            0.7784  (0.6466, 0.8651)     0.6049  (0.4045, 0.7499)     0.7823  (0.6524, 0.8676)
  24  BLEU-v12         0.8711  (0.7879, 0.9231)     0.6817  (0.5083, 0.8020)     0.8367  (0.7344, 0.9018)
  25  BEwT-E           0.7919  (0.6667, 0.8737)     0.6049  (0.4045, 0.7499)     0.7355  (0.5842, 0.8374)
  26  DR-Or            0.3207  (0.0606, 0.5401)     0.2621  (-0.0034, 0.4931)    0.1968  (-0.0722, 0.4392)
  27  BleuSP           0.8643  (0.7773, 0.9189)     0.6790  (0.5046, 0.8002)     0.8567  (0.7654, 0.9142)
  28  SVM-Rank         0.8545  (0.7619, 0.9128)     0.6858  (0.5140, 0.8047)     0.8597  (0.7700, 0.9161)
  29  BLEU-1           0.8141  (0.7000, 0.8877)     0.6341  (0.4434, 0.7699)     0.7750  (0.6416, 0.8629)
  30  Bleu-sbp         0.8686  (0.7839, 0.9215)     0.6858  (0.5140, 0.8047)     0.8442  (0.7459, 0.9065)
  31  invWer          -0.8349  (-0.9007, -0.7316)  -0.6292  (-0.7665, -0.4368)  -0.8091  (-0.8845, -0.6925)
  32  BLEU-v11b        0.8664  (0.7805, 0.9202)     0.6750  (0.4990, 0.7975)     0.8318  (0.7269, 0.8988)
  33  SR-Or            0.3260  (0.0664, 0.5442)     0.2791  (0.0149, 0.5068)     0.1846  (-0.0849, 0.4289)
  34  Badger           0.4002  (0.1509, 0.6016)     0.2708  (0.0059, 0.5002)     0.4745  (0.2393, 0.6571)
  35  Meteor-v0.7      0.8763  (0.7961, 0.9263)     0.7019  (0.5365, 0.8154)     0.8485  (0.7527, 0.9092)
  36  MaxSim           0.3568  (0.1011, 0.5683)     0.2802  (0.0161, 0.5078)     0.2887  (0.0254, 0.5146)
  37  TERp            -0.8761  (-0.9261, -0.7957)  -0.7019  (-0.8154, -0.5365)  -0.8609  (-0.9168, -0.7719)

37 metrics (including 7 baseline metrics)
55 data points (total number of systems used)
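Each row reports three coefficients computed between one metric's per-system scores and the per-system human scores. The dependency-free sketch below computes all three. It makes two assumptions the page does not confirm: Spearman's rho is taken as Pearson's r on average ranks, and Kendall's tau is taken as the tau-b variant, which corrects for ties. The input values are made up for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as Pearson's r on average ranks (assumed variant)."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b over all pairs (assumed variant)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: excluded from both denominators
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom


metric = [0.21, 0.25, 0.30, 0.28, 0.35]  # hypothetical metric scores
human = [0.40, 0.45, 0.55, 0.60, 0.70]   # hypothetical proportion of Yes
print(pearson(metric, human), spearman(metric, human), kendall_tau_b(metric, human))
```

Error-rate metrics such as TER-v0.7.25, TERp, mTER and CDer give lower scores to better output. Their correlations with a higher-is-better human score therefore come out negative, so strength is best compared across rows by absolute value.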