Models evaluated on text-only HLE questions
Score (confidence interval)
18.57 (+1.57 / -1.57)
13.97 (+1.40 / -1.40)
11.10 (+1.26 / -1.26)
8.61 (+1.13 / -1.13)
8.57
8.44 (+1.12 / -1.12)
8.35 (+1.11 / -1.11)
7.05 (+1.03 / -1.03)
6.67 (+1.00 / -1.00)
6.58
5.53 (+0.92 / -0.92)
5.23 (+0.90 / -0.90)
5.15 (+0.89 / -0.89)
4.89 (+0.87 / -0.87)
4.85 (+0.86 / -0.86)
4.81
4.73 (+0.85 / -0.85)
4.60 (+0.84 / -0.84)
4.43 (+0.83 / -0.83)
4.05 (+0.79 / -0.79)
3.97
3.84 (+0.77 / -0.77)
2.62 (+0.64 / -0.64)
Rank (UB): 1 + the number of models whose lower CI bound exceeds this model’s upper CI bound.
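As an illustration of the Rank (UB) rule, here is a minimal sketch in Python. It assumes that each model's interval is score ± the listed half-width (all intervals shown are symmetric). The function name `rank_upper_bound` is hypothetical. Because model names are not part of this listing, the sketch works on (score, half-width) pairs only. It uses the first seven scored entries above and skips entries that list no interval.

```python
from typing import List, Tuple


def rank_upper_bound(rows: List[Tuple[float, float]]) -> List[int]:
    """Return Rank (UB) for each (score, ci_half_width) row.

    Assumes symmetric intervals: lower = score - half, upper = score + half.
    Rank (UB) = 1 + number of models whose lower CI bound is strictly
    greater than this model's upper CI bound.
    """
    lowers = [score - half for score, half in rows]
    ranks = []
    for score, half in rows:
        upper = score + half
        # Count models that sit clearly above this one (non-overlapping CIs)
        ranks.append(1 + sum(1 for lo in lowers if lo > upper))
    return ranks


# First seven entries with intervals from the listing above: (score, half-width)
rows = [(18.57, 1.57), (13.97, 1.40), (11.10, 1.26), (8.61, 1.13),
        (8.44, 1.12), (8.35, 1.11), (7.05, 1.03)]
print(rank_upper_bound(rows))  # [1, 2, 3, 4, 4, 4, 4]
```

Under this rule, the models scoring 8.61, 8.44, 8.35 and 7.05 all share rank 4. Their intervals overlap one another, and only the top three models have lower bounds above their upper bounds.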