Frontier Leaderboards
2025 Scale AI. All rights reserved.
Korean
Deprecated (as of March 2025)
Last updated: March 27, 2025
Performance Comparison
Rank  Model                              Score
1     o1-preview                         66.43 ±5.47
2     Claude 3.7 Sonnet (February 2025)  64.93 ±5.51
3     GPT-4o (May 2024)                  64.58 ±5.52
4     GPT-4.5 Preview (February 2025)    63.76 ±5.56
5     GPT-4 Turbo Preview                60.76 ±5.64
6     Gemini 1.5 Pro (August 27, 2024)   60.28 ±5.66
7     GPT-4o (August 2024)               59.93 ±5.67
8     Claude 3.5 Sonnet (June 2024)      59.38 ±5.67
9     Gemini 2.0 Flash                   55.60 ±5.73
10    Claude 3 Sonnet                    54.17 ±5.78
11    Claude 3 Opus                      52.78 ±5.77
12    GPT-4o mini                        51.74 ±5.77
13    GPT-4                              51.39 ±5.77
14    Mistral Large 2                    50.35 ±5.78
15    Llama 3.1 405B Instruct            50.35 ±5.78
16    Gemini 1.5 Pro (May 2024)          40.42 ±5.68
17    Llama 3.1 70B Instruct             37.23 ±5.60
18    Command R+                         30.21 ±5.30
19    Llama 3.1 8B Instruct              17.42 ±4.39
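The page does not say how the ± figures are computed. Assuming each one is the half-width of a symmetric uncertainty interval around the score (an assumption, not something the leaderboard states), the minimal Python sketch below checks which entries' ranges overlap. The scores are transcribed from the ranking above; the helper names `bounds`, `overlaps`, and `overlapping_with` are hypothetical and exist only for illustration.

```python
# Scores transcribed from the ranking above as (model, score, half-width).
# Assumption: "±h" denotes a symmetric interval [score - h, score + h].
SCORES = [
    ("o1-preview", 66.43, 5.47),
    ("Claude 3.7 Sonnet (February 2025)", 64.93, 5.51),
    ("GPT-4o (May 2024)", 64.58, 5.52),
    ("GPT-4.5 Preview (February 2025)", 63.76, 5.56),
    ("GPT-4 Turbo Preview", 60.76, 5.64),
    ("Gemini 1.5 Pro (August 27, 2024)", 60.28, 5.66),
    ("GPT-4o (August 2024)", 59.93, 5.67),
    ("Claude 3.5 Sonnet (June 2024)", 59.38, 5.67),
    ("Gemini 2.0 Flash", 55.60, 5.73),
    ("Claude 3 Sonnet", 54.17, 5.78),
    ("Claude 3 Opus", 52.78, 5.77),
    ("GPT-4o mini", 51.74, 5.77),
    ("GPT-4", 51.39, 5.77),
    ("Mistral Large 2", 50.35, 5.78),
    ("Llama 3.1 405B Instruct", 50.35, 5.78),
    ("Gemini 1.5 Pro (May 2024)", 40.42, 5.68),
    ("Llama 3.1 70B Instruct", 37.23, 5.60),
    ("Command R+", 30.21, 5.30),
    ("Llama 3.1 8B Instruct", 17.42, 4.39),
]


def bounds(entry):
    """Return the (low, high) ends of an entry's assumed ± interval."""
    _, score, half_width = entry
    return score - half_width, score + half_width


def overlaps(a, b):
    """True if the two entries' intervals share at least one point."""
    lo_a, hi_a = bounds(a)
    lo_b, hi_b = bounds(b)
    return lo_a <= hi_b and lo_b <= hi_a


def overlapping_with(name):
    """List the other models whose interval overlaps the named model's interval."""
    target = next(e for e in SCORES if e[0] == name)
    return [e[0] for e in SCORES if e[0] != name and overlaps(target, e)]


if __name__ == "__main__":
    print(overlapping_with("o1-preview"))
```

With these numbers, the range for o1-preview (60.96 to 71.90) overlaps the ranges of the models ranked 2 through 9, while Llama 3.1 8B Instruct overlaps no other entry. Overlap is only a coarse screen: it does not by itself show that two models are tied, and the method behind the ± values is not described on this page.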