Frontier Leaderboards
© 2025 Scale AI. All rights reserved.
Math
Deprecated (as of January 2025)
Last updated: March 20, 2025
Performance Comparison
Rank  Model                          Score
1     Claude 3.5 Sonnet (June 2024)  96.60±1.02
2     GPT-4o (August 2024)           95.68±1.15
3     Llama 3.1 405B Instruct        95.60±1.16
4     Claude 3 Opus                  95.19±1.21
5     GPT-4 Turbo Preview            95.10±1.22
6     GPT-4o (May 2024)              94.85±1.25
7     Gemini 1.5 Pro (August 2024)   94.69±1.27
8     Mistral Large 2                93.94±1.35
9     Claude 3 Sonnet                93.28±1.41
10    Gemini 1.5 Pro (May 2024)      92.28±1.51
11    Gemini 1.5 Pro (April 2024)    90.54±1.65
12    Llama 3 70B Instruct           90.12±1.69
12    Gemini 1.5 Flash               90.12±1.69
14    Mistral Large                  87.47±1.87
15    Gemini 1.0 Pro                 79.83±2.27
16    CodeLlama 34B Instruct         37.51±2.73
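
The rank values listed here (two entries at 12, then 14) look consistent with standard competition ranking on the score, where equal scores share a rank and the following rank is skipped. The page does not state how ranks are assigned, so the sketch below is an assumption for illustration only; the function name competition_ranks is hypothetical, and the scores are copied from the list above without their ± margins.

```python
def competition_ranks(scores):
    """Return 1-based ranks where ties share a rank and the next rank is skipped.

    Assumption: rank = 1 + number of strictly higher scores.
    This is a guess at the scheme, not the leaderboard's documented method.
    """
    return [1 + sum(1 for other in scores if other > s) for s in scores]


# Scores copied from the leaderboard, in listed order (margins omitted).
scores = [
    96.60, 95.68, 95.60, 95.19, 95.10, 94.85, 94.69, 93.94,
    93.28, 92.28, 90.54, 90.12, 90.12, 87.47, 79.83, 37.51,
]

ranks = competition_ranks(scores)
print(ranks)
```

Under this assumption, the two models scoring 90.12 both receive rank 12 and the next model receives rank 14, matching the listing.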