Correlation Results

Current Conditions

  • Human Assessment Type: Preferences, Pair-wise comparison across systems
  • Target Language: English
  • Correlation Level: segment

Subdivisions

By track:

Ranking

Single Reference Track
Rank  Metric Name     Spearman's Rho                    Kendall's Tau                     Pearson's R
                      Value    95% confidence interval  Value    95% confidence interval  Value    95% confidence interval
1     SEPIA2           0.3359  (0.3165, 0.3549)          0.2397  (0.2192, 0.2600)          0.3058  (0.2861, 0.3253)
2     CDer            -0.3414  (-0.3604, -0.3222)       -0.2430  (-0.2632, -0.2225)       -0.3162  (-0.3356, -0.2966)
3     ULCh             0.3205  (0.3010, 0.3398)          0.2201  (0.1994, 0.2406)          0.3537  (0.3346, 0.3724)
4     TER-v0.7.25     -0.2994  (-0.3190, -0.2796)       -0.2133  (-0.2339, -0.1925)       -0.2530  (-0.2732, -0.2327)
5     DP-Orp           0.2245  (0.2038, 0.2450)          0.1556  (0.1344, 0.1766)          0.2497  (0.2293, 0.2699)
6     NIST-v11b        0.3318  (0.3124, 0.3509)          0.2364  (0.2159, 0.2568)          0.3218  (0.3023, 0.3411)
7     ATEC4            0.3272  (0.3077, 0.3464)          0.2329  (0.2123, 0.2532)          0.3045  (0.2848, 0.3240)
8     ATEC1            0.3270  (0.3076, 0.3462)          0.2327  (0.2121, 0.2530)          0.3049  (0.2851, 0.3244)
9     mBLEU            0.1067  (0.0853, 0.1281)          0.0755  (0.0540, 0.0970)          0.1126  (0.0911, 0.1339)
10    SNR              0.3002  (0.2804, 0.3198)          0.2058  (0.1850, 0.2264)          0.3341  (0.3147, 0.3532)
11    4-GRR            0.3107  (0.2911, 0.3302)          0.2205  (0.1998, 0.2410)          0.2719  (0.2518, 0.2919)
12    ATEC2            0.3285  (0.3091, 0.3477)          0.2339  (0.2134, 0.2543)          0.3049  (0.2851, 0.3244)
13    SEPIA1           0.3370  (0.3177, 0.3561)          0.2400  (0.2195, 0.2603)          0.3019  (0.2820, 0.3214)
14    ULCopt           0.3127  (0.2930, 0.3321)          0.2126  (0.1918, 0.2332)          0.3435  (0.3242, 0.3624)
15    mTER            -0.0935  (-0.1149, -0.0720)       -0.0672  (-0.0888, -0.0457)       -0.0864  (-0.1078, -0.0649)
16    EDPM             0.3256  (0.3061, 0.3448)          0.2312  (0.2106, 0.2516)          0.3102  (0.2905, 0.3296)
17    BLEU-4           0.2878  (0.2678, 0.3075)          0.2041  (0.1833, 0.2248)          0.2567  (0.2363, 0.2768)
18    METEOR-v0.6      0.3543  (0.3352, 0.3731)          0.2520  (0.2316, 0.2721)          0.3373  (0.3180, 0.3563)
19    RTE-MT           0.3022  (0.2824, 0.3217)          0.2136  (0.1929, 0.2342)          0.2956  (0.2757, 0.3152)
20    BadgerLite       0.2209  (0.2002, 0.2413)          0.1560  (0.1348, 0.1770)          0.1944  (0.1734, 0.2151)
21    METEOR-ranking   0.3585  (0.3394, 0.3772)          0.2550  (0.2346, 0.2751)          0.3240  (0.3045, 0.3432)
22    LET              0.3356  (0.3162, 0.3547)          0.2389  (0.2184, 0.2592)          0.3218  (0.3023, 0.3411)
23    DP-Or            0.3064  (0.2867, 0.3259)          0.2120  (0.1912, 0.2326)          0.3217  (0.3022, 0.3410)
24    ATEC3            0.3263  (0.3068, 0.3455)          0.2324  (0.2119, 0.2528)          0.3017  (0.2819, 0.3212)
25    BLEU-v12         0.2413  (0.2208, 0.2616)          0.1841  (0.1631, 0.2049)          0.2597  (0.2394, 0.2798)
26    BEwT-E           0.2112  (0.1904, 0.2318)          0.1526  (0.1314, 0.1737)          0.2053  (0.1845, 0.2259)
27    RTE              0.2771  (0.2570, 0.2970)          0.1957  (0.1748, 0.2165)          0.2660  (0.2458, 0.2860)
28    DR-Or            0.2712  (0.2510, 0.2912)          0.1883  (0.1673, 0.2091)          0.2860  (0.2660, 0.3057)
29    BleuSP           0.3208  (0.3013, 0.3401)          0.2277  (0.2071, 0.2481)          0.2929  (0.2730, 0.3126)
30    SVM-Rank         0.3195  (0.2999, 0.3388)          0.2275  (0.2069, 0.2480)          0.3034  (0.2836, 0.3229)
31    BLEU-1           0.3208  (0.3012, 0.3401)          0.2287  (0.2081, 0.2491)          0.3139  (0.2943, 0.3333)
32    Bleu-sbp         0.2334  (0.2128, 0.2537)          0.1786  (0.1575, 0.1994)          0.2514  (0.2311, 0.2716)
33    invWer          -0.3042  (-0.3237, -0.2845)       -0.2168  (-0.2374, -0.1961)       -0.2544  (-0.2746, -0.2341)
34    BLEU-v11b        0.2334  (0.2128, 0.2537)          0.1786  (0.1575, 0.1994)          0.2514  (0.2311, 0.2716)
35    SR-Or            0.2863  (0.2663, 0.3060)          0.2041  (0.1832, 0.2247)          0.2817  (0.2617, 0.3015)
36    Badger           0.2170  (0.1963, 0.2376)          0.1535  (0.1323, 0.1746)          0.2006  (0.1797, 0.2213)
37    Meteor-v0.7      0.3551  (0.3361, 0.3739)          0.2526  (0.2322, 0.2727)          0.3409  (0.3216, 0.3599)
38    MaxSim           0.3285  (0.3091, 0.3477)          0.2267  (0.2061, 0.2471)          0.3583  (0.3393, 0.3770)
39    TERp            -0.3597  (-0.3784, -0.3407)       -0.2569  (-0.2770, -0.2366)       -0.3403  (-0.3593, -0.3210)

39 metrics (including 7 baseline metrics)
8198 data points (total number of segments used)
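
For readers who want to compute the same three kinds of statistic on their own segment-level scores, the sketch below shows one plain-Python way to do it. It is an illustration only: the page does not say which software or which Kendall variant was used, so the tie-aware tau-b is an assumption, and the function names and toy scores are hypothetical. It also shows why a metric where lower is better (an error rate such as TER, presumably) produces coefficients of the opposite sign.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b (assumed variant): pair agreement, corrected for ties."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counts toward neither term
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + only_y_tied) * (untied + only_x_tied)
    )


# Hypothetical toy data: one metric score and one human score per segment.
metric_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
human_scores = [1, 2, 3, 5, 4]
# An error-rate style metric orders the same segments the other way round.
error_rates = [-s for s in metric_scores]

print(pearson(metric_scores, human_scores))
print(spearman(metric_scores, human_scores))
print(kendall_tau_b(metric_scores, human_scores))
print(spearman(error_rates, human_scores))
```

For large segment counts, a library routine would be a more practical choice than the quadratic pair loop in `kendall_tau_b`.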

Multiple References Track
Rank  Metric Name     Spearman's Rho                    Kendall's Tau                     Pearson's R
                      Value    95% confidence interval  Value    95% confidence interval  Value    95% confidence interval
1     SEPIA2           0.3853  (0.3616, 0.4084)          0.2775  (0.2519, 0.3027)          0.3372  (0.3126, 0.3614)
2     CDer            -0.3830  (-0.4062, -0.3592)       -0.2756  (-0.3008, -0.2500)       -0.3448  (-0.3688, -0.3203)
3     ULCh             0.1043  (0.0771, 0.1315)          0.0754  (0.0480, 0.1027)          0.1589  (0.1319, 0.1856)
4     TER-v0.7.25     -0.3002  (-0.3250, -0.2749)       -0.2150  (-0.2411, -0.1886)       -0.2373  (-0.2631, -0.2112)
5     DP-Orp           0.0677  (0.0403, 0.0950)          0.0482  (0.0207, 0.0756)          0.0826  (0.0552, 0.1099)
6     NIST-v11b        0.3582  (0.3340, 0.3819)          0.2576  (0.2318, 0.2831)          0.3350  (0.3103, 0.3592)
7     ATEC4            0.3925  (0.3690, 0.4155)          0.2825  (0.2570, 0.3076)          0.3623  (0.3382, 0.3860)
8     ATEC1            0.3906  (0.3670, 0.4136)          0.2816  (0.2561, 0.3068)          0.3602  (0.3360, 0.3839)
9     SNR              0.0821  (0.0547, 0.1094)          0.0600  (0.0325, 0.0873)          0.0878  (0.0604, 0.1150)
10    mBLEU            0.2112  (0.1847, 0.2373)          0.1502  (0.1232, 0.1770)          0.2006  (0.1741, 0.2269)
11    4-GRR            0.3093  (0.2842, 0.3340)          0.2207  (0.1944, 0.2468)          0.2495  (0.2235, 0.2751)
12    ATEC2            0.3923  (0.3688, 0.4153)          0.2827  (0.2572, 0.3078)          0.3597  (0.3355, 0.3834)
13    SEPIA1           0.3785  (0.3547, 0.4018)          0.2723  (0.2467, 0.2976)          0.3581  (0.3339, 0.3819)
14    ULCopt           0.0918  (0.0644, 0.1190)          0.0671  (0.0397, 0.0944)          0.1301  (0.1030, 0.1570)
15    EDPM             0.3639  (0.3398, 0.3875)          0.2614  (0.2356, 0.2868)          0.3422  (0.3177, 0.3662)
16    mTER            -0.1744  (-0.2009, -0.1476)       -0.1247  (-0.1517, -0.0975)       -0.1469  (-0.1737, -0.1199)
17    BLEU-4           0.3082  (0.2831, 0.3329)          0.2210  (0.1947, 0.2470)          0.2943  (0.2690, 0.3192)
18    METEOR-v0.6      0.3885  (0.3649, 0.4116)          0.2793  (0.2538, 0.3045)          0.3680  (0.3439, 0.3915)
19    BadgerLite       0.1797  (0.1530, 0.2062)          0.1283  (0.1011, 0.1552)          0.1283  (0.1012, 0.1553)
20    METEOR-ranking   0.3930  (0.3695, 0.4160)          0.2827  (0.2572, 0.3079)          0.3573  (0.3331, 0.3811)
21    LET              0.4092  (0.3860, 0.4318)          0.2953  (0.2700, 0.3202)          0.3814  (0.3577, 0.4047)
22    DP-Or            0.1240  (0.0968, 0.1509)          0.0897  (0.0624, 0.1169)          0.1674  (0.1406, 0.1940)
23    ATEC3            0.3936  (0.3701, 0.4166)          0.2840  (0.2585, 0.3090)          0.3583  (0.3341, 0.3820)
24    BLEU-v12         0.2617  (0.2359, 0.2871)          0.1992  (0.1727, 0.2255)          0.2635  (0.2377, 0.2889)
25    BEwT-E           0.3012  (0.2760, 0.3260)          0.2174  (0.1911, 0.2435)          0.2947  (0.2693, 0.3196)
26    DR-Or            0.0852  (0.0579, 0.1125)          0.0618  (0.0344, 0.0892)          0.1417  (0.1147, 0.1686)
27    BleuSP           0.3723  (0.3484, 0.3958)          0.2681  (0.2424, 0.2934)          0.3600  (0.3358, 0.3837)
28    SVM-Rank         0.4043  (0.3810, 0.4271)          0.2911  (0.2658, 0.3161)          0.3849  (0.3612, 0.4081)
29    BLEU-1           0.3403  (0.3158, 0.3644)          0.2450  (0.2190, 0.2707)          0.3116  (0.2866, 0.3362)
30    Bleu-sbp         0.2472  (0.2212, 0.2729)          0.1885  (0.1619, 0.2149)          0.2492  (0.2232, 0.2748)
31    invWer          -0.3531  (-0.3769, -0.3288)       -0.2545  (-0.2801, -0.2286)       -0.2804  (-0.3056, -0.2549)
32    BLEU-v11b        0.2472  (0.2212, 0.2729)          0.1885  (0.1619, 0.2149)          0.2492  (0.2232, 0.2748)
33    SR-Or            0.1145  (0.0873, 0.1415)          0.0837  (0.0563, 0.1110)          0.1522  (0.1252, 0.1790)
34    Badger           0.1689  (0.1421, 0.1955)          0.1201  (0.0929, 0.1471)          0.1398  (0.1127, 0.1666)
35    Meteor-v0.7      0.3982  (0.3748, 0.4210)          0.2867  (0.2613, 0.3118)          0.3733  (0.3494, 0.3967)
36    MaxSim           0.1247  (0.0975, 0.1517)          0.0904  (0.0630, 0.1176)          0.1504  (0.1234, 0.1772)
37    TERp            -0.3883  (-0.4114, -0.3647)       -0.2788  (-0.3039, -0.2532)       -0.3707  (-0.3942, -0.3468)

37 metrics (including 7 baseline metrics)
5080 data points (total number of segments used)
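
The page does not state how the 95% confidence intervals were derived. One common approach for a correlation coefficient is the Fisher z-transformation, and the sketch below assumes that method purely for illustration; `fisher_ci` is a hypothetical helper, not something taken from the evaluation itself. Strictly, the transformation is defined for Pearson's r, and applying it unchanged to Spearman's rho or Kendall's tau is only an approximation. The example uses the Spearman value listed for SEPIA2 in the Single Reference Track (0.3359 over 8198 segments), so the output can be compared against the interval printed in that row.

```python
from math import atanh, sqrt, tanh


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate two-sided confidence interval for a correlation r
    estimated from n paired observations (Fisher z-transformation).

    z_crit is the standard normal quantile; 1.959964 gives a 95% interval.
    """
    z = atanh(r)                # map r onto an approximately normal scale
    se = 1.0 / sqrt(n - 3)      # standard error on the transformed scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Values taken from the SEPIA2 row of the Single Reference Track.
low, high = fisher_ci(0.3359, 8198)
print(f"({low:.4f}, {high:.4f})")
```

Because the standard error shrinks with the square root of the sample size, the same coefficient would get a wider interval on the 5080 segments of the Multiple References Track than on the 8198 segments of the Single Reference Track.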