© 2025 Scale AI. All rights reserved.
Humanity's Last Exam
Challenging LLMs at the frontier of human knowledge
Last updated: April 30, 2025
Performance Comparison
Rank  Accuracy (%)    Calib Err (%)
1     20.32 ± 1.58    34
1     19.20 ± 1.54    39
1     18.16 ± 1.51    71
1     18.08 ± 1.51    57
1     17.80 ± 1.50    70
6     14.28 ± 1.37    59
6     12.08 ± 1.28    80
7     10.96 ± 1.22    82
7     10.72 ± 1.21    73
10     8.12 ± 1.07    82
10     8.04 ± 1.07    80
10     7.96 ± 1.06    83
10     7.76 ± 1.05    75
10     6.68 ± 0.98    74
10     6.56 ± 0.97    82
10     5.68 ± 0.91    83
14     5.52 ± 0.90    76
14     5.44 ± 0.89    85
14     5.40 ± 0.89    89
16     4.60 ± 0.82    88
16     4.52 ± 0.81    77
17     4.40 ± 0.80    80
17     4.08 ± 0.78    84
20     3.64 ± 0.73    82
23     2.72 ± 0.64    89