Humanity's Last Exam
| Rank | Model | Accuracy (%) | 95% confidence interval (percentage points) |
|------|-------|--------------|---------------------------------------------|
| 1 | o1 (December 2024) | 9.07 | +1.03 / -1.03 |
| 2 | Gemini 2.0 Flash Thinking (December 2024) | 6.23 | +0.86 / -0.86 |
| 3 | Gemini 2.0 Flash Experimental (December 2024) | 5.37 | +0.81 / -0.81 |
| 4 | Llama 3.2 90B Vision Instruct | 5.17 | +0.79 / -0.79 |
| 5 | Gemini-1.5-Pro-002 | 4.97 | +0.78 / -0.78 |
| 6 | Pixtral Large (November 2024) | 4.37 | +0.73 / -0.73 |
| 6 | Qwen2-VL-72B-Instruct | 4.37 | +0.73 / -0.73 |
| 8 | Claude 3.5 Sonnet (October 2024) | 4.27 | +0.72 / -0.72 |
| 9 | Molmo 72B | 4.18 | +0.72 / -0.72 |
| 10 | Claude 3 Opus | 4.00 | +0.70 / -0.70 |
| 11 | Gemini-1.5-Flash-002 | 3.77 | +0.68 / -0.68 |
| 12 | GPT-4o (November 2024) | 3.30 | +0.64 / -0.64 |
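
The leaderboard does not say how the 95% confidence intervals were computed. As a rough sketch only, a symmetric interval of this kind can be obtained from a normal-approximation (Wald) interval for a proportion. The function name `wald_half_width` and the question count `ASSUMED_N = 3000` are assumptions made for illustration and do not come from the leaderboard; the count was picked because it happens to give half-widths close to the listed ones.

```python
import math


def wald_half_width(accuracy_pct: float, n_questions: int, z: float = 1.96) -> float:
    """Half-width, in percentage points, of a normal-approximation (Wald) interval."""
    p = accuracy_pct / 100.0
    standard_error = math.sqrt(p * (1.0 - p) / n_questions)
    return 100.0 * z * standard_error


# Hypothetical question count; the leaderboard does not state it.
ASSUMED_N = 3000

for model, acc in [("o1 (December 2024)", 9.07), ("GPT-4o (November 2024)", 3.30)]:
    print(f"{model}: {acc:.2f} +/- {wald_half_width(acc, ASSUMED_N):.2f}")
```

If the actual evaluation used a different number of questions or a different interval method (for example a bootstrap), the half-widths would differ accordingly.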