Frontier Leaderboards
2025 Scale AI. All rights reserved.
Korean
Deprecated (as of March 2025)
Last updated: March 27, 2025
Performance Comparison
Rank  Model                               Score
1     o1-preview                          66.43±5.47
2     Claude 3.7 Sonnet (February 2025)   64.93±5.51
3     GPT-4o (May 2024)                   64.58±5.52
4     GPT-4.5 Preview (February 2025)     63.76±5.56
5     GPT-4 Turbo Preview                 60.76±5.64
6     Gemini 1.5 Pro (August 27, 2024)    60.28±5.66
7     GPT-4o (August 2024)                59.93±5.67
8     Claude 3.5 Sonnet (June 2024)       59.38±5.67
9     Gemini 2.0 Flash                    55.60±5.73
10    Claude 3 Sonnet                     54.17±5.78
11    Claude 3 Opus                       52.78±5.77
12    GPT-4o mini                         51.74±5.77
13    GPT-4                               51.39±5.77
14    Mistral Large 2                     50.35±5.78
15    Llama 3.1 405B Instruct             50.35±5.78
16    Gemini 1.5 Pro (May 2024)           40.42±5.68
17    Llama 3.1 70B Instruct              37.23±5.60
18    Command R+                          30.21±5.30
19    Llama 3.1 8B Instruct               17.42±4.39
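The ± figures show that many adjacent ranks are close. As a rough reading aid, the following Python sketch treats each ± value as the half-width of an interval around the score. This is an assumption, because the page does not say what kind of interval the ± represents. The sketch then lists the models whose intervals overlap the top-ranked model's interval. The helper names (`interval`, `overlaps`, `overlapping_with_leader`) are hypothetical. Overlap alone is not a formal significance test. It only flags rankings that these figures do not clearly separate.

```python
# Scores copied from the table as (model, score, plus_minus).
# Assumption: plus_minus is the half-width of an interval around the score.
SCORES = [
    ("o1-preview", 66.43, 5.47),
    ("Claude 3.7 Sonnet (February 2025)", 64.93, 5.51),
    ("GPT-4o (May 2024)", 64.58, 5.52),
    ("GPT-4.5 Preview (February 2025)", 63.76, 5.56),
    ("GPT-4 Turbo Preview", 60.76, 5.64),
    ("Gemini 1.5 Pro (August 27, 2024)", 60.28, 5.66),
    ("GPT-4o (August 2024)", 59.93, 5.67),
    ("Claude 3.5 Sonnet (June 2024)", 59.38, 5.67),
    ("Gemini 2.0 Flash", 55.60, 5.73),
    ("Claude 3 Sonnet", 54.17, 5.78),
    ("Claude 3 Opus", 52.78, 5.77),
    ("GPT-4o mini", 51.74, 5.77),
    ("GPT-4", 51.39, 5.77),
    ("Mistral Large 2", 50.35, 5.78),
    ("Llama 3.1 405B Instruct", 50.35, 5.78),
    ("Gemini 1.5 Pro (May 2024)", 40.42, 5.68),
    ("Llama 3.1 70B Instruct", 37.23, 5.60),
    ("Command R+", 30.21, 5.30),
    ("Llama 3.1 8B Instruct", 17.42, 4.39),
]


def interval(score, plus_minus):
    """Return (low, high) bounds for a score reported as score±plus_minus."""
    return (score - plus_minus, score + plus_minus)


def overlaps(a, b):
    """True if the intervals of two (model, score, plus_minus) rows intersect."""
    low_a, high_a = interval(a[1], a[2])
    low_b, high_b = interval(b[1], b[2])
    return low_a <= high_b and low_b <= high_a


def overlapping_with_leader(rows):
    """List the models (other than the leader) whose interval overlaps the leader's."""
    leader = max(rows, key=lambda row: row[1])
    return [row[0] for row in rows if row is not leader and overlaps(leader, row)]


if __name__ == "__main__":
    for name in overlapping_with_leader(SCORES):
        print(name)
```

With the figures as listed, the intervals of the models ranked 2 through 9 overlap o1-preview's interval. The intervals of Claude 3 Sonnet (rank 10) and every model below it do not.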