© 2025 Scale AI. All rights reserved.
Humanity's Last Exam
Challenging LLMs at the frontier of human knowledge
Last updated: April 30, 2025
Performance Comparison
Rank  Accuracy (%)   Calibration error (%)
1     20.32 ± 1.58   34
1     19.20 ± 1.54   39
1     18.16 ± 1.51   71
1     18.08 ± 1.51   57
5     14.28 ± 1.37   59
5     12.08 ± 1.28   80
7      8.12 ± 1.07   82
7      8.04 ± 1.07   80
7      7.96 ± 1.06   83
7      6.56 ± 0.97   82
7      5.68 ± 0.91   83
10     5.44 ± 0.89   85
10     5.40 ± 0.89   89
11     4.60 ± 0.82   88
12     4.40 ± 0.80   80
12     4.08 ± 0.78   84
14     3.64 ± 0.73   82
16     2.72 ± 0.64   89
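The page does not say how the ± values are computed. Each one matches, to two decimals, the half-width of a normal-approximation 95% interval on a binomial proportion, 1.96·sqrt(p(1 − p)/n), if one assumes n = 2,500 scored questions. The short Python sketch below checks this. The sample size n, the z value of 1.96, and the helper name ci_half_width are assumptions made for illustration, not values or code published with the leaderboard.

```python
import math

# Hypothetical helper: half-width of a normal-approximation (Wald) confidence
# interval for an accuracy given in percent. ASSUMPTIONS: n = 2500 scored
# questions and z = 1.96 (two-sided 95%); neither is stated on the page.
def ci_half_width(acc_pct: float, n: int = 2500, z: float = 1.96) -> float:
    p = acc_pct / 100.0                  # accuracy as a proportion
    se = math.sqrt(p * (1.0 - p) / n)    # standard error of a binomial proportion
    return 100.0 * z * se                # convert back to percentage points

# (accuracy %, reported ± value) for each leaderboard row listed above
ROWS = [
    (20.32, 1.58), (19.20, 1.54), (18.16, 1.51), (18.08, 1.51),
    (14.28, 1.37), (12.08, 1.28), (8.12, 1.07), (8.04, 1.07),
    (7.96, 1.06), (6.56, 0.97), (5.68, 0.91), (5.44, 0.89),
    (5.40, 0.89), (4.60, 0.82), (4.40, 0.80), (4.08, 0.78),
    (3.64, 0.73), (2.72, 0.64),
]

if __name__ == "__main__":
    for acc, reported in ROWS:
        recomputed = round(ci_half_width(acc), 2)
        # Implied number of correct answers, again assuming n = 2500
        correct = round(acc / 100 * 2500)
        print(f"{acc:5.2f}%  reported ±{reported:.2f}  "
              f"recomputed ±{recomputed:.2f}  ~{correct} correct")
```

Run as a script, it prints each reported ± next to the recomputed value. Under the same assumption, every accuracy is a whole number of correct answers out of 2,500 (20.32% corresponds to 508), which makes the n = 2,500 reading plausible but does not confirm it.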