Humanity's Last Exam (Preview)

Challenging LLMs at the frontier of human knowledge

Last updated: April 10, 2025

Performance Comparison

Rank  Model                                          Score
1                                                    18.81±1.47
2                                                    9.15±1.09
2                                                    8.93±1.08
2                                                    8.81±1.07
2                                                    7.22±0.98
3                                                    7.07±0.97
5                                                    6.78±0.95
5                                                    6.41±0.92
5     Llama 3.2 90B Vision Instruct                  5.52±0.86
7                                                    5.22±0.84
7     Gemini 2.0 Flash Experimental (December 2024)  5.19±0.84
7                                                    5.07±0.83
7                                                    5.04±0.83
8                                                    4.78±0.80
9     Qwen2-VL-72B-Instruct                          4.67±0.80
9                                                    4.63±0.79
9                                                    4.56±0.79
9     Claude 3 Opus                                  4.19±0.76
9     Gemini-1.5-Flash-002                           4.15±0.75
9                                                    3.96±0.74
18                                                   3.07±0.65