Frontier Leaderboards
2025 Scale AI. All rights reserved.
Japanese
Deprecated (as of March 2025)
Last updated: March 27, 2025
Performance Comparison
Rank  Model                             Score
1     o1-preview                        1118.00 ±43.00
2     Gemini 2.0 Pro (December 2024)    1117.00 ±36.00
3     o3-mini                           1104.00 ±31.00
4     Gemini 1.5 Pro (August 27, 2024)  1093.00 ±44.00
5     o1 (December 2024)                1086.00 ±30.00
6     Gemini 1.5 Pro (November 2024)    1084.00 ±30.00
7     GPT-4o (August 2024)              1077.00 ±42.00
8     Claude 3.5 Sonnet (June 2024)     1069.00 ±36.00
9     Gemini Pro Flash 2                1064.00 ±33.00
10    GPT-4 (November 2024)              978.00 ±27.00
11    Mistral Large 2                    967.00 ±41.00
12    Gemini 1.5 Flash                   966.00 ±56.00
13    Aya Expanse 32B                    960.00 ±32.00
14    Llama 3.1 405B Instruct            895.00 ±83.00
15    Gemma 2 27B                        892.00 ±42.00
16    Llama 3.3 70B Instruct             778.00 ±35.00
17    Aya 23 35B*                        760.00 ±48.00