Frontier Leaderboards
© 2025 Scale AI. All rights reserved.
Humanity's Last Exam Text Only (Preview)
Models evaluated on text-only HLE questions
Last updated: April 10, 2025
Performance Comparison
Rank  Model                                          Score
1                                                    18.57 ±1.57
2                                                    13.97 ±1.40
3                                                    11.10 ±1.26
4                                                    8.61 ±1.13
4     DeepSeek-R1                                    8.57 ±1.13
4                                                    8.44 ±1.12
4                                                    8.35 ±1.11
4                                                    7.05 ±1.03
4                                                    6.67 ±1.00
4                                                    6.58 ±1.00
8                                                    6.24 ±0.97
8     Llama 3.2 90B Vision Instruct                  5.53 ±0.92
8                                                    5.23 ±0.90
8                                                    5.15 ±0.89
9     Gemini 2.0 Flash Experimental (December 2024)  4.89 ±0.87
9                                                    4.89 ±0.87
9                                                    4.85 ±0.86
9                                                    4.81 ±0.86
10    Qwen2-VL-72B-Instruct                          4.73 ±0.85
11                                                   4.60 ±0.84
12                                                   4.43 ±0.83
12    o1-mini*                                       4.05 ±0.79
12    Claude 3 Opus                                  3.97 ±0.79
12    Gemini-1.5-Flash-002                           3.84 ±0.77
22                                                   2.62 ±0.64