Humanity's Last Exam

Challenging LLMs at the frontier of human knowledge

Last updated: April 30, 2025

Performance Comparison

Rank   Score (%)       Calib Err (%)
1      20.32 ± 1.58    34
1      19.20 ± 1.54    39
1      18.16 ± 1.51    71
1      18.08 ± 1.51    57
1      17.80 ± 1.50    70
6      14.28 ± 1.37    59
6      12.08 ± 1.28    80
7      10.96 ± 1.22    82
7      10.72 ± 1.21    73
10      8.12 ± 1.07    82
10      8.04 ± 1.07    80
10      7.96 ± 1.06    83
10      7.76 ± 1.05    75
10      6.68 ± 0.98    74
10      6.56 ± 0.97    82
10      5.68 ± 0.91    83
14      5.52 ± 0.90    76
14      5.44 ± 0.89    85
14      5.40 ± 0.89    89
16      4.60 ± 0.82    88
16      4.52 ± 0.81    77
17      4.40 ± 0.80    80
17      4.08 ± 0.78    84
20      3.64 ± 0.73    82
23      2.72 ± 0.64    89
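
The ± values listed with each score are numerically consistent with a 95% normal-approximation (Wald) binomial interval over 2,500 questions. The page does not state how the intervals were computed, so both the formula and the question count are assumptions here. A minimal sketch that reproduces a few of the listed margins under those assumptions (the function name `wald_half_width` is hypothetical):

```python
import math


def wald_half_width(score_pct: float, n_questions: int, z: float = 1.96) -> float:
    """Half-width, in percentage points, of a normal-approximation binomial interval."""
    p = score_pct / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_questions)


# Assumption: 2,500 questions; the page itself does not give this number.
N_QUESTIONS = 2500

for score in (20.32, 10.96, 2.72):
    print(f"{score:.2f} ± {wald_half_width(score, N_QUESTIONS):.2f}")
# prints:
# 20.32 ± 1.58
# 10.96 ± 1.22
# 2.72 ± 0.64
```

Under the same assumptions, the margin shrinks as a score moves away from 50%, which is why the lowest-scoring entries carry the narrowest intervals.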