Frontier Leaderboards
© 2025 Scale AI. All rights reserved.
Chinese
Deprecated (as of March 2025)
Last updated: March 20, 2025
Performance Comparison
Rank  Model                                     Score
1     o1 (December 2024)                        1165.00 ±30.00
2     o3-mini                                   1156.00 ±32.00
3     o1-preview                                1120.00 ±44.00
4     Gemini 1.5 Pro (August 27, 2024)          1120.00 ±35.00
5     Gemini 2.0 Pro (December 2024)            1117.00 ±33.00
6     Gemini Pro Flash 2                        1115.00 ±28.00
7     Gemini 1.5 Pro (November 2024)            1077.00 ±28.00
8     Gemini 2.0 Flash Thinking (January 2025)  1060.00 ±33.00
9     DeepSeek R1                               1052.00 ±32.00
10    DeepSeek V3                               1031.00 ±26.00
11    GPT-4o (August 2024)                      1029.00 ±30.00
12    Gemini 1.5 Flash                          1015.00 ±53.00
13    Mistral Large 2                           1006.00 ±34.00
14    DeepSeek V2 Chat                           996.00 ±24.00
15    GPT-4 (November 2024)                      985.00 ±28.00
16    Aya Expanse 32B                            967.00 ±29.00
17    Gemma 2 27B                                966.00 ±29.00
18    Claude 3.5 Sonnet (June 2024)              930.00 ±42.00
19    Qwen 2 72B Instruct                        902.00 ±37.00
20    Llama 3.3 70B Instruct                     883.00 ±33.00
21    Yi 1.5 34B Chat                            780.00 ±41.00
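
Many adjacent entries differ by less than their ± margins, so small rank gaps may not be meaningful. The following is a minimal sketch for checking whether two entries' ranges overlap. It assumes that "±" denotes a symmetric interval around the score, which the page does not state explicitly. The helper names (`interval`, `intervals_overlap`) are hypothetical, and only a few rows from the table are included for illustration.

```python
# Scores and margins copied from the leaderboard table.
# Assumption: "±" is a symmetric interval around the score.
scores = {
    "o1 (December 2024)": (1165.0, 30.0),
    "o3-mini": (1156.0, 32.0),
    "o1-preview": (1120.0, 44.0),
    "Yi 1.5 34B Chat": (780.0, 41.0),
}


def interval(name):
    """Return (low, high) bounds for a model under the symmetric-interval assumption."""
    score, margin = scores[name]
    return score - margin, score + margin


def intervals_overlap(a, b):
    """True if the two models' intervals share at least one point."""
    low_a, high_a = interval(a)
    low_b, high_b = interval(b)
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    for a, b in [("o1 (December 2024)", "o3-mini"),
                 ("o1 (December 2024)", "Yi 1.5 34B Chat")]:
        print(a, "vs", b, "overlap:", intervals_overlap(a, b))
```

Under this assumption, overlapping ranges suggest the ordering between two models is not clearly separated, while non-overlapping ranges suggest a more robust gap. Overlap is only a rough heuristic, not a formal significance test.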