Frontier Leaderboards
© 2025 Scale AI. All rights reserved.
Chinese
Deprecated (as of March 2025)
Last updated: March 20, 2025
Performance Comparison
Rank  Model                                     Score
1     o1 (December 2024)                        1165.00±30.00
2     o3-mini                                   1156.00±32.00
3     o1-preview                                1120.00±44.00
4     Gemini 1.5 Pro (August 27, 2024)          1120.00±35.00
5     Gemini 2.0 Pro (December 2024)            1117.00±33.00
6     Gemini Pro Flash 2                        1115.00±28.00
7     Gemini 1.5 Pro (November 2024)            1077.00±28.00
8     Gemini 2.0 Flash Thinking (January 2025)  1060.00±33.00
9     DeepSeek R1                               1052.00±32.00
10    DeepSeek V3                               1031.00±26.00
11    GPT-4o (August 2024)                      1029.00±30.00
12    Gemini 1.5 Flash                          1015.00±53.00
13    Mistral Large 2                           1006.00±34.00
14    DeepSeek V2 Chat                          996.00±24.00
15    GPT-4 (November 2024)                     985.00±28.00
16    Aya Expanse 32B                           967.00±29.00
17    Gemma 2 27B                               966.00±29.00
18    Claude 3.5 Sonnet (June 2024)             930.00±42.00
19    Qwen 2 72B Instruct                       902.00±37.00
20    Llama 3.3 70B Instruct                    883.00±33.00
21    Yi 1.5 34B Chat                           780.00±41.00