Frontier Leaderboards
© 2025 Scale AI. All rights reserved.
Humanity's Last Exam Text Only (Preview)
Models evaluated on text-only HLE questions
Last updated: April 10, 2025
Performance Comparison
Rank  Score          Model
1     18.57 ± 1.57
2     13.97 ± 1.40
3     11.10 ± 1.26
4      8.61 ± 1.13
4      8.57 ± 1.13   DeepSeek-R1
4      8.44 ± 1.12
4      8.35 ± 1.11
4      7.05 ± 1.03
4      6.67 ± 1.00
4      6.58 ± 1.00
8      6.24 ± 0.97
8      5.53 ± 0.92   Llama 3.2 90B Vision Instruct
8      5.23 ± 0.90
8      5.15 ± 0.89
9      4.89 ± 0.87   Gemini 2.0 Flash Experimental (December 2024)
9      4.89 ± 0.87
9      4.85 ± 0.86
9      4.81 ± 0.86
10     4.73 ± 0.85   Qwen2-VL-72B-Instruct
11     4.60 ± 0.84
12     4.43 ± 0.83
12     4.05 ± 0.79   o1-mini*
12     3.97 ± 0.79   Claude 3 Opus
12     3.84 ± 0.77   Gemini-1.5-Flash-002
22     2.62 ± 0.64
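The tied ranks (several models at 4, 8, 9 and 12, then a jump to 22) appear to follow an interval-overlap rule: a model's rank is one plus the number of models whose lower bound (score minus the ± value) is strictly above its upper bound (score plus the ± value). The page does not state this rule. It is an inference, and the sketch below only checks that it reproduces every listed rank. It assumes the ± value is a symmetric half-width around the score. The names `Entry`, `parse`, `rank_by_interval` and `ROWS` are hypothetical. Scores are kept in integer hundredths so that boundary cases compare exactly; for example, 6.58 - 1.00 and 4.73 + 0.85 both equal 5.58.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    score: int   # score in hundredths (18.57 -> 1857) so comparisons are exact
    margin: int  # the "±" half-width, also in hundredths


def parse(text: str) -> Entry:
    """Parse a leaderboard cell such as '18.57±1.57' into integer hundredths."""
    score, margin = text.split("±")
    return Entry(round(float(score) * 100), round(float(margin) * 100))


def rank_by_interval(entries: list[Entry]) -> list[int]:
    """Assumed (inferred, not documented) rule: rank = 1 + number of entries
    whose lower bound is strictly greater than this entry's upper bound.

    Entries whose intervals overlap are not separated, so ranks can tie.
    """
    ranks = []
    for e in entries:
        upper = e.score + e.margin
        better = sum(1 for o in entries if o.score - o.margin > upper)
        ranks.append(1 + better)
    return ranks


# (listed rank, model name if it survived extraction, score cell), in page order
ROWS = [
    (1, None, "18.57±1.57"),
    (2, None, "13.97±1.40"),
    (3, None, "11.10±1.26"),
    (4, None, "8.61±1.13"),
    (4, "DeepSeek-R1", "8.57±1.13"),
    (4, None, "8.44±1.12"),
    (4, None, "8.35±1.11"),
    (4, None, "7.05±1.03"),
    (4, None, "6.67±1.00"),
    (4, None, "6.58±1.00"),
    (8, None, "6.24±0.97"),
    (8, "Llama 3.2 90B Vision Instruct", "5.53±0.92"),
    (8, None, "5.23±0.90"),
    (8, None, "5.15±0.89"),
    (9, "Gemini 2.0 Flash Experimental (December 2024)", "4.89±0.87"),
    (9, None, "4.89±0.87"),
    (9, None, "4.85±0.86"),
    (9, None, "4.81±0.86"),
    (10, "Qwen2-VL-72B-Instruct", "4.73±0.85"),
    (11, None, "4.60±0.84"),
    (12, None, "4.43±0.83"),
    (12, "o1-mini*", "4.05±0.79"),
    (12, "Claude 3 Opus", "3.97±0.79"),
    (12, "Gemini-1.5-Flash-002", "3.84±0.77"),
    (22, None, "2.62±0.64"),
]

if __name__ == "__main__":
    computed = rank_by_interval([parse(cell) for _, _, cell in ROWS])
    # Print the listed rank next to the rank the inferred rule produces.
    for (listed, name, cell), rank in zip(ROWS, computed):
        print(f"listed {listed:>2}  computed {rank:>2}  {cell:<11} {name or ''}")
```

The strict inequality matters. With `>=`, the 4.73 ± 0.85 row would be ranked 11 instead of the listed 10, because its upper bound exactly equals the lower bound of the 6.58 ± 1.00 row.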