Humanity's Last Exam
Rank  Model                                          Accuracy (%)  95% CI (±)  Calibration Error (%)
1     o1 (December 2024)                             8.81          ±1.07       92.79
2     Gemini 2.0 Flash Thinking (January 2025)       7.22          ±0.98       90.58
3     Gemini 2.0 Pro Experimental (February 2025)    7.07          ±0.97       92.98
4     Llama 3.2 90B Vision Instruct                  5.52          ±0.86       88.61
5     Gemini-1.5-Pro-002                             5.22          ±0.84       93.04
6     Gemini 2.0 Flash Experimental (December 2024)  5.19          ±0.84       95.08
7     Gemini 2.0 Flash                               5.07          ±0.83       90.81
8     Claude 3.5 Sonnet (October 2024)               4.78          ±0.80       88.53
9     Qwen2-VL-72B-Instruct                          4.67          ±0.80       86.48
10    Gemini 2.0 Flash-Lite (February 2025)          4.56          ±0.79       89.40
11    Claude 3 Opus                                  4.19          ±0.76       85.06
12    Gemini-1.5-Flash-002                           4.15          ±0.75       88.66
13    GPT-4o (November 2024)                         3.07          ±0.65       92.27
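The page does not say how the 95% confidence column is computed. A common choice for an accuracy estimate is the normal-approximation (Wald) binomial interval, p ± 1.96 · sqrt(p(1 − p) / n), where n is the number of questions graded. The sketch below assumes that formula; the question count n is not stated on this page, and the helper names `wald_half_width` and `implied_n` are hypothetical, chosen only for illustration. Back-solving n from a few of the listed rows is one way to check whether the assumption is plausible.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.96


def wald_half_width(accuracy_pct: float, n: int) -> float:
    """Half-width, in percentage points, of a Wald (normal-approximation) 95% CI.

    Assumption: the leaderboard's ± column uses this formula; the page does not say so.
    """
    p = accuracy_pct / 100.0
    return 100.0 * Z_95 * math.sqrt(p * (1.0 - p) / n)


def implied_n(accuracy_pct: float, half_width_pct: float) -> float:
    """Question count n that would produce the given half-width under the Wald formula."""
    p = accuracy_pct / 100.0
    h = half_width_pct / 100.0
    return (Z_95 ** 2) * p * (1.0 - p) / (h ** 2)


# (accuracy %, ± half-width) pairs copied from the leaderboard rows above.
rows = {
    "o1 (December 2024)": (8.81, 1.07),
    "Llama 3.2 90B Vision Instruct": (5.52, 0.86),
    "GPT-4o (November 2024)": (3.07, 0.65),
}

if __name__ == "__main__":
    for name, (acc, hw) in rows.items():
        print(f"{name}: implied n is about {implied_n(acc, hw):.0f}")
```

For these three rows the implied n lands near 2,700 in each case (the spread comes from the two-decimal rounding of the published values). That is consistent with a Wald interval over a fixed question set of roughly that size, but the page itself confirms neither the formula nor the count.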