Scale AI SEAL

Humanity's Last Exam

Challenging LLMs at the frontier of human knowledge

Last updated: April 30, 2025

Performance Comparison

Rank   Accuracy (%)    Calib. error (%)
1      20.32 ± 1.58    34
1      19.20 ± 1.54    39
1      18.16 ± 1.51    71
1      18.08 ± 1.51    57
5      14.28 ± 1.37    59
5      12.08 ± 1.28    80
7       8.12 ± 1.07    82
7       8.04 ± 1.07    80
7       7.96 ± 1.06    83
7       6.56 ± 0.97    82
7       5.68 ± 0.91    83
10      5.44 ± 0.89    85
10      5.40 ± 0.89    89
11      4.60 ± 0.82    88
12      4.40 ± 0.80    80
12      4.08 ± 0.78    84
14      3.64 ± 0.73    82
16      2.72 ± 0.64    89
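The page does not say how the ± figures or the calibration error are computed. A minimal sketch, under two labeled assumptions: (1) the ± value is a normal-approximation (Wald) 95% interval half-width, z * sqrt(p(1 - p) / n), over n = 2,500 questions (an assumed question count, not stated on this page); with those assumptions the formula reproduces the rows checked above, for example 20.32 gives 1.58, 8.12 gives 1.07 and 2.72 gives 0.64. (2) The calibration error is a binned root-mean-square gap between stated confidence and accuracy, the style of metric described for HLE; the `bin_size` of 100 and both function names are hypothetical.

```python
import math
from typing import Sequence


def ci95_half_width(accuracy_pct: float, n: int, z: float = 1.96) -> float:
    """Wald 95% CI half-width for an accuracy, in percentage points.

    Assumption: the leaderboard's ± column uses this normal approximation.
    """
    p = accuracy_pct / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)


def rms_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    bin_size: int = 100,  # hypothetical bin size, not taken from the page
) -> float:
    """Binned RMS calibration error as a fraction in [0, 1].

    Predictions are sorted by stated confidence and split into consecutive
    bins of `bin_size` items (the last bin may be smaller). Each bin's squared
    gap between mean confidence and accuracy is weighted by its share of the
    data, and the square root of the weighted sum is returned.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("need at least one prediction")

    pairs = sorted(zip(confidences, correct), key=lambda t: t[0])
    weighted_sq = 0.0
    for start in range(0, total, bin_size):
        chunk = pairs[start:start + bin_size]
        mean_conf = sum(c for c, _ in chunk) / len(chunk)
        accuracy = sum(1 for _, ok in chunk if ok) / len(chunk)
        weighted_sq += (len(chunk) / total) * (mean_conf - accuracy) ** 2
    return math.sqrt(weighted_sq)


if __name__ == "__main__":
    # Recompute a few ± values from the table under the assumed n = 2,500.
    for acc in (20.32, 8.12, 2.72):
        print(f"{acc:.2f} ± {ci95_half_width(acc, 2500):.2f}")
```

Run as a script, this prints `20.32 ± 1.58`, `8.12 ± 1.07` and `2.72 ± 0.64`. Multiplying `rms_calibration_error` by 100 puts it on the same percentage scale as the calibration column; a model that answers with 100% confidence but is right only 25% of the time would score 75.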