Humanity's Last Exam Text Only (Preview)

Models evaluated on text-only HLE questions

Last updated: April 10, 2025

Performance Comparison

Rank  Model                                          Score
1                                                    18.57±1.57
2                                                    13.97±1.40
3                                                    11.10±1.26
4                                                    8.61±1.13
4     DeepSeek-R1                                    8.57±1.13
4                                                    8.44±1.12
4                                                    8.35±1.11
4                                                    7.05±1.03
4                                                    6.67±1.00
4                                                    6.58±1.00
8                                                    6.24±0.97
8     Llama 3.2 90B Vision Instruct                  5.53±0.92
8                                                    5.23±0.90
8                                                    5.15±0.89
9     Gemini 2.0 Flash Experimental (December 2024)  4.89±0.87
9                                                    4.89±0.87
9                                                    4.85±0.86
9                                                    4.81±0.86
10    Qwen2-VL-72B-Instruct                          4.73±0.85
11                                                   4.60±0.84
12                                                   4.43±0.83
12    o1-mini*                                       4.05±0.79
12    Claude 3 Opus                                  3.97±0.79
12    Gemini-1.5-Flash-002                           3.84±0.77
22                                                   2.62±0.64