Correlation Results

Current Conditions

  • Human Assessment Type: Preferences, Pair-wise comparison across systems
  • Target Language: English
  • Correlation Level: document

Subdivisions

By track:

Ranking

Single Reference Track
| Rank | Metric Name | Spearman's Rho | 95% confidence interval | Kendall's Tau | 95% confidence interval | Pearson's R | 95% confidence interval |
|---|---|---|---|---|---|---|---|
| 1 | SEPIA2 | 0.4533 | (0.4185, 0.4868) | 0.3099 | (0.2706, 0.3482) | 0.4406 | (0.4054, 0.4746) |
| 2 | CDer | -0.4552 | (-0.4886, -0.4204) | -0.3102 | (-0.3485, -0.2708) | -0.4581 | (-0.4914, -0.4235) |
| 3 | ULCh | 0.4177 | (0.3816, 0.4525) | 0.2791 | (0.2390, 0.3182) | 0.4699 | (0.4357, 0.5027) |
| 4 | TER-v0.7.25 | -0.4282 | (-0.4626, -0.3924) | -0.2922 | (-0.3310, -0.2524) | -0.4310 | (-0.4653, -0.3953) |
| 5 | DP-Orp | 0.3388 | (0.3002, 0.3762) | 0.2273 | (0.1861, 0.2676) | 0.3964 | (0.3596, 0.4320) |
| 6 | NIST-v11b | 0.4395 | (0.4042, 0.4735) | 0.2999 | (0.2603, 0.3385) | 0.4410 | (0.4058, 0.4750) |
| 7 | ATEC4 | 0.3827 | (0.3454, 0.4188) | 0.2602 | (0.2197, 0.2998) | 0.3768 | (0.3393, 0.4131) |
| 8 | ATEC1 | 0.3817 | (0.3444, 0.4178) | 0.2594 | (0.2189, 0.2990) | 0.3757 | (0.3382, 0.4120) |
| 9 | mBLEU | 0.1985 | (0.1569, 0.2394) | 0.1365 | (0.0941, 0.1784) | 0.1733 | (0.1314, 0.2147) |
| 10 | SNR | 0.4220 | (0.3861, 0.4567) | 0.2836 | (0.2436, 0.3226) | 0.4340 | (0.3985, 0.4683) |
| 11 | 4-GRR | 0.4441 | (0.4090, 0.4780) | 0.3037 | (0.2642, 0.3422) | 0.4541 | (0.4193, 0.4875) |
| 12 | ATEC2 | 0.3874 | (0.3503, 0.4233) | 0.2635 | (0.2230, 0.3030) | 0.3799 | (0.3425, 0.4160) |
| 13 | SEPIA1 | 0.4626 | (0.4282, 0.4957) | 0.3169 | (0.2778, 0.3551) | 0.4581 | (0.4235, 0.4914) |
| 14 | ULCopt | 0.4169 | (0.3808, 0.4518) | 0.2778 | (0.2377, 0.3170) | 0.4412 | (0.4059, 0.4751) |
| 15 | mTER | -0.1714 | (-0.2127, -0.1294) | -0.1192 | (-0.1613, -0.0767) | -0.0800 | (-0.1225, -0.0372) |
| 16 | EDPM | 0.4422 | (0.4070, 0.4761) | 0.3020 | (0.2625, 0.3405) | 0.4515 | (0.4167, 0.4851) |
| 17 | BLEU-4 | 0.4373 | (0.4019, 0.4714) | 0.2989 | (0.2592, 0.3375) | 0.4451 | (0.4100, 0.4789) |
| 18 | METEOR-v0.6 | 0.4698 | (0.4357, 0.5027) | 0.3228 | (0.2837, 0.3607) | 0.4706 | (0.4365, 0.5034) |
| 19 | RTE-MT | 0.3973 | (0.3605, 0.4329) | 0.2725 | (0.2323, 0.3118) | 0.3992 | (0.3625, 0.4347) |
| 20 | BadgerLite | 0.3595 | (0.3215, 0.3963) | 0.2482 | (0.2074, 0.2880) | 0.3653 | (0.3275, 0.4020) |
| 21 | METEOR-ranking | 0.4719 | (0.4378, 0.5046) | 0.3231 | (0.2841, 0.3611) | 0.4783 | (0.4444, 0.5107) |
| 22 | LET | 0.4306 | (0.3950, 0.4650) | 0.2932 | (0.2535, 0.3320) | 0.4349 | (0.3994, 0.4691) |
| 23 | DP-Or | 0.4341 | (0.3985, 0.4683) | 0.2911 | (0.2513, 0.3299) | 0.4881 | (0.4547, 0.5202) |
| 24 | ATEC3 | 0.3897 | (0.3527, 0.4256) | 0.2658 | (0.2254, 0.3053) | 0.3742 | (0.3367, 0.4106) |
| 25 | BLEU-v12 | 0.4414 | (0.4061, 0.4753) | 0.3016 | (0.2620, 0.3401) | 0.4486 | (0.4137, 0.4823) |
| 26 | BEwT-E | 0.3012 | (0.2617, 0.3398) | 0.2045 | (0.1630, 0.2453) | 0.2826 | (0.2426, 0.3217) |
| 27 | RTE | 0.3668 | (0.3291, 0.4034) | 0.2520 | (0.2113, 0.2918) | 0.3657 | (0.3279, 0.4023) |
| 28 | DR-Or | 0.3804 | (0.3431, 0.4165) | 0.2550 | (0.2144, 0.2948) | 0.4273 | (0.3916, 0.4618) |
| 29 | BleuSP | 0.4468 | (0.4117, 0.4805) | 0.3054 | (0.2659, 0.3438) | 0.4501 | (0.4152, 0.4837) |
| 30 | SVM-Rank | 0.4376 | (0.4022, 0.4716) | 0.2987 | (0.2591, 0.3374) | 0.4243 | (0.3884, 0.4589) |
| 31 | BLEU-1 | 0.4390 | (0.4037, 0.4731) | 0.2991 | (0.2595, 0.3378) | 0.4419 | (0.4066, 0.4758) |
| 32 | Bleu-sbp | 0.4375 | (0.4021, 0.4716) | 0.2989 | (0.2592, 0.3375) | 0.4443 | (0.4092, 0.4781) |
| 33 | invWer | -0.4253 | (-0.4599, -0.3895) | -0.2896 | (-0.3285, -0.2498) | -0.4238 | (-0.4584, -0.3879) |
| 34 | BLEU-v11b | 0.4356 | (0.4001, 0.4697) | 0.2977 | (0.2581, 0.3364) | 0.4425 | (0.4072, 0.4764) |
| 35 | SR-Or | 0.3967 | (0.3599, 0.4323) | 0.2657 | (0.2254, 0.3052) | 0.4442 | (0.4091, 0.4780) |
| 36 | Badger | 0.3761 | (0.3386, 0.4124) | 0.2591 | (0.2186, 0.2987) | 0.3844 | (0.3472, 0.4204) |
| 37 | Meteor-v0.7 | 0.4639 | (0.4295, 0.4969) | 0.3174 | (0.2782, 0.3555) | 0.4675 | (0.4333, 0.5004) |
| 38 | MaxSim | 0.4275 | (0.3917, 0.4620) | 0.2851 | (0.2452, 0.3241) | 0.4725 | (0.4385, 0.5052) |
| 39 | TERp | -0.4494 | (-0.4830, -0.4144) | -0.3068 | (-0.3452, -0.2674) | -0.4529 | (-0.4864, -0.4181) |

39 metrics (including 7 baseline metrics)
2083 data points (total number of documents used)
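
The page does not say how the 95% confidence intervals were obtained. The reported bounds are consistent with the standard Fisher z-transformation applied to each coefficient with the listed number of data points, so the sketch below uses that method as an assumption rather than as the documented procedure. The function name `fisher_ci` and the default critical value are illustrative choices.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a correlation coefficient.

    r      -- the observed coefficient (assumed to lie strictly in (-1, 1))
    n      -- number of paired observations (must exceed 3)
    z_crit -- two-sided normal critical value; 1.959964 gives ~95% coverage
    """
    z = math.atanh(r)                         # Fisher z-transform of r
    half_width = z_crit / math.sqrt(n - 3)    # z_crit times the standard error
    return math.tanh(z - half_width), math.tanh(z + half_width)

# SEPIA2, Spearman's Rho, Single Reference Track: 0.4533 over 2083 documents
lo, hi = fisher_ci(0.4533, 2083)
print(f"({lo:.4f}, {hi:.4f})")  # prints (0.4185, 0.4868)
```

Under this assumption the width of an interval depends only on the coefficient and the number of documents, which would explain why the same transformation appears to fit the Spearman, Kendall, and Pearson columns alike.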

Multiple References Track
| Rank | Metric Name | Spearman's Rho | 95% confidence interval | Kendall's Tau | 95% confidence interval | Pearson's R | 95% confidence interval |
|---|---|---|---|---|---|---|---|
| 1 | SEPIA2 | 0.3358 | (0.2866, 0.3833) | 0.2292 | (0.1769, 0.2802) | 0.3408 | (0.2917, 0.3880) |
| 2 | CDer | -0.3540 | (-0.4007, -0.3054) | -0.2404 | (-0.2910, -0.1884) | -0.3753 | (-0.4211, -0.3275) |
| 3 | ULCh | 0.0457 | (-0.0088, 0.0999) | 0.0352 | (-0.0193, 0.0895) | 0.0690 | (0.0146, 0.1230) |
| 4 | TER-v0.7.25 | -0.3193 | (-0.3674, -0.2695) | -0.2170 | (-0.2683, -0.1645) | -0.3338 | (-0.3813, -0.2844) |
| 5 | DP-Orp | 0.0303 | (-0.0242, 0.0847) | 0.0207 | (-0.0338, 0.0751) | 0.0119 | (-0.0426, 0.0664) |
| 6 | NIST-v11b | 0.3313 | (0.2819, 0.3789) | 0.2251 | (0.1727, 0.2761) | 0.3451 | (0.2962, 0.3922) |
| 7 | ATEC4 | 0.3109 | (0.2609, 0.3593) | 0.2124 | (0.1598, 0.2639) | 0.3190 | (0.2692, 0.3671) |
| 8 | ATEC1 | 0.3145 | (0.2645, 0.3627) | 0.2145 | (0.1619, 0.2659) | 0.3195 | (0.2698, 0.3676) |
| 9 | SNR | 0.0888 | (0.0345, 0.1426) | 0.0633 | (0.0088, 0.1174) | 0.0430 | (-0.0115, 0.0972) |
| 10 | mBLEU | 0.2715 | (0.2203, 0.3212) | 0.1838 | (0.1306, 0.2359) | 0.3006 | (0.2502, 0.3493) |
| 11 | 4-GRR | 0.3202 | (0.2704, 0.3682) | 0.2187 | (0.1662, 0.2699) | 0.3422 | (0.2932, 0.3894) |
| 12 | ATEC2 | 0.3140 | (0.2640, 0.3623) | 0.2145 | (0.1619, 0.2659) | 0.3197 | (0.2699, 0.3678) |
| 13 | SEPIA1 | 0.3562 | (0.3077, 0.4029) | 0.2441 | (0.1922, 0.2947) | 0.3768 | (0.3290, 0.4226) |
| 14 | ULCopt | 0.0640 | (0.0096, 0.1181) | 0.0475 | (-0.0070, 0.1017) | 0.0500 | (-0.0045, 0.1042) |
| 15 | EDPM | 0.3392 | (0.2901, 0.3866) | 0.2314 | (0.1792, 0.2823) | 0.3625 | (0.3143, 0.4089) |
| 16 | mTER | -0.2466 | (-0.2971, -0.1948) | -0.1655 | (-0.2180, -0.1121) | -0.2279 | (-0.2790, -0.1756) |
| 17 | BLEU-4 | 0.3206 | (0.2709, 0.3686) | 0.2191 | (0.1666, 0.2703) | 0.3450 | (0.2961, 0.3921) |
| 18 | METEOR-v0.6 | 0.3804 | (0.3328, 0.4261) | 0.2613 | (0.2098, 0.3113) | 0.3858 | (0.3384, 0.4312) |
| 19 | BadgerLite | 0.2152 | (0.1626, 0.2665) | 0.1480 | (0.0943, 0.2009) | 0.1787 | (0.1255, 0.2310) |
| 20 | METEOR-ranking | 0.3562 | (0.3077, 0.4029) | 0.2435 | (0.1916, 0.2941) | 0.3793 | (0.3317, 0.4250) |
| 21 | LET | 0.3331 | (0.2837, 0.3806) | 0.2276 | (0.1753, 0.2787) | 0.3448 | (0.2959, 0.3919) |
| 22 | DP-Or | 0.0892 | (0.0349, 0.1430) | 0.0641 | (0.0097, 0.1182) | 0.1176 | (0.0635, 0.1710) |
| 23 | ATEC3 | 0.3162 | (0.2664, 0.3644) | 0.2153 | (0.1628, 0.2667) | 0.3151 | (0.2652, 0.3633) |
| 24 | BLEU-v12 | 0.3283 | (0.2789, 0.3761) | 0.2241 | (0.1717, 0.2752) | 0.3487 | (0.2999, 0.3957) |
| 25 | BEwT-E | 0.3411 | (0.2920, 0.3883) | 0.2307 | (0.1785, 0.2816) | 0.3578 | (0.3094, 0.4044) |
| 26 | DR-Or | 0.0730 | (0.0186, 0.1270) | 0.0512 | (-0.0033, 0.1054) | 0.0651 | (0.0107, 0.1192) |
| 27 | BleuSP | 0.3531 | (0.3045, 0.3999) | 0.2425 | (0.1906, 0.2931) | 0.3770 | (0.3293, 0.4228) |
| 28 | SVM-Rank | 0.3857 | (0.3383, 0.4311) | 0.2649 | (0.2135, 0.3148) | 0.3959 | (0.3489, 0.4408) |
| 29 | BLEU-1 | 0.2968 | (0.2463, 0.3457) | 0.2016 | (0.1487, 0.2532) | 0.3132 | (0.2632, 0.3615) |
| 30 | Bleu-sbp | 0.3248 | (0.2752, 0.3727) | 0.2217 | (0.1693, 0.2729) | 0.3474 | (0.2986, 0.3944) |
| 31 | invWer | -0.3525 | (-0.3993, -0.3039) | -0.2396 | (-0.2902, -0.1875) | -0.3649 | (-0.4112, -0.3168) |
| 32 | BLEU-v11b | 0.3217 | (0.2720, 0.3697) | 0.2199 | (0.1674, 0.2711) | 0.3431 | (0.2941, 0.3903) |
| 33 | SR-Or | 0.0697 | (0.0153, 0.1237) | 0.0496 | (-0.0049, 0.1038) | 0.0880 | (0.0337, 0.1418) |
| 34 | Badger | 0.1912 | (0.1381, 0.2431) | 0.1311 | (0.0772, 0.1843) | 0.1737 | (0.1204, 0.2261) |
| 35 | Meteor-v0.7 | 0.3584 | (0.3100, 0.4050) | 0.2446 | (0.1926, 0.2951) | 0.3739 | (0.3260, 0.4198) |
| 36 | MaxSim | 0.0629 | (0.0084, 0.1170) | 0.0459 | (-0.0086, 0.1001) | 0.0939 | (0.0397, 0.1476) |
| 37 | TERp | -0.3294 | (-0.3771, -0.2800) | -0.2243 | (-0.2754, -0.1719) | -0.3375 | (-0.3849, -0.2883) |

37 metrics (including 7 baseline metrics)
1295 data points (total number of documents used)
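
For readers who want to reproduce this kind of analysis, the following is a minimal sketch of the three coefficients reported in both tables, computed over paired per-document values. It is not the evaluation's own code. The function names, the choice of the tau-b variant of Kendall's Tau, and the sample data are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson's R between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's Rho: Pearson's R computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's Tau (tau-b variant), using an O(n^2) pair comparison."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue          # tied in both variables: ignored
            elif dx == 0:
                ties_x += 1       # tied in x only
            elif dy == 0:
                ties_y += 1       # tied in y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt((untied + ties_x) * (untied + ties_y))

# Hypothetical per-document values, not taken from the evaluation data.
metric_scores = [0.31, 0.45, 0.28, 0.52, 0.40]
human_scores = [2, 4, 1, 3, 5]

rho = spearman(metric_scores, human_scores)
tau = kendall_tau_b(metric_scores, human_scores)
r = pearson(metric_scores, human_scores)
print(f"rho={rho:.4f} tau={tau:.4f}")  # prints rho=0.6000 tau=0.4000
```

Error-rate metrics such as TER assign lower scores to better output, which is the likely reason metrics of that family appear with negative coefficients in the tables.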