Humanity's Last Exam (Text Only)

Models evaluated on text-only HLE questions

Last updated: April 1, 2025

Performance Comparison

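| Rank | Model | Score |
|------|-------|-------|
| 1 | | 18.57 ± 1.57 |
| 2 | | 13.97 ± 1.40 |
| 3 | | 11.10 ± 1.26 |
| 4 | | 8.61 ± 1.13 |
| 4 | DeepSeek-R1 | 8.57 ± 1.13 |
| 4 | | 8.44 ± 1.12 |
| 4 | | 8.35 ± 1.11 |
| 4 | | 7.05 ± 1.03 |
| 4 | | 6.67 ± 1.00 |
| 4 | | 6.58 ± 1.00 |
| 8 | Llama 3.2 90B Vision Instruct | 5.53 ± 0.92 |
| 8 | | 5.23 ± 0.90 |
| 8 | | 5.15 ± 0.89 |
| 9 | Gemini 2.0 Flash Experimental (December 2024) | 4.89 ± 0.87 |
| 9 | | 4.89 ± 0.87 |
| 9 | | 4.85 ± 0.86 |
| 9 | | 4.81 ± 0.86 |
| 10 | Qwen2-VL-72B-Instruct | 4.73 ± 0.85 |
| 11 | | 4.60 ± 0.84 |
| 11 | | 4.43 ± 0.83 |
| 11 | o1-mini* | 4.05 ± 0.79 |
| 11 | Claude 3 Opus | 3.97 ± 0.79 |
| 11 | Gemini-1.5-Flash-002 | 3.84 ± 0.77 |
| 21 | | 2.62 ± 0.64 |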
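The page does not say how ranks are assigned, but the tied ranks are consistent with one simple rule: a model's rank is 1 plus the number of models whose lower bound (score minus the ± value) is strictly greater than its upper bound (score plus the ± value). The sketch below assumes that rule and treats each ± value as a symmetric interval half-width; neither assumption is confirmed by the source. It then recomputes the rank column from the listed scores. The function name `ci_ranks` is hypothetical. `Decimal` is used so that boundary ties (for example 6.58 - 1.00 versus 4.73 + 0.85) compare exactly instead of being perturbed by floating-point rounding.

```python
from decimal import Decimal

# (score, +/- half-width) pairs copied from the table above, in listed order.
ENTRIES = [
    ("18.57", "1.57"), ("13.97", "1.40"), ("11.10", "1.26"),
    ("8.61", "1.13"), ("8.57", "1.13"), ("8.44", "1.12"), ("8.35", "1.11"),
    ("7.05", "1.03"), ("6.67", "1.00"), ("6.58", "1.00"),
    ("5.53", "0.92"), ("5.23", "0.90"), ("5.15", "0.89"),
    ("4.89", "0.87"), ("4.89", "0.87"), ("4.85", "0.86"), ("4.81", "0.86"),
    ("4.73", "0.85"), ("4.60", "0.84"), ("4.43", "0.83"),
    ("4.05", "0.79"), ("3.97", "0.79"), ("3.84", "0.77"), ("2.62", "0.64"),
]


def ci_ranks(entries):
    """Hypothetical rank rule (an assumption, not stated on the page):
    rank = 1 + number of models whose lower bound is strictly above this
    model's upper bound, i.e. models that are separated from it."""
    # Exact (lower, upper) bounds for each model.
    bounds = [(Decimal(s) - Decimal(h), Decimal(s) + Decimal(h)) for s, h in entries]
    # For each model, count the others whose whole interval lies above its interval.
    return [1 + sum(1 for lo, _ in bounds if lo > hi) for _, hi in bounds]


print(ci_ranks(ENTRIES))
# [1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 8, 8, 8, 9, 9, 9, 9, 10, 11, 11, 11, 11, 11, 21]
```

The recomputed list matches the Rank column entry for entry. This shows only that the assumed rule fits the published numbers, not that it is the method behind the leaderboard.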