Frontier Leaderboards
© 2025 Scale AI. All rights reserved.
Adversarial Robustness
Deprecated (as of January 2025)
Last updated: April 3, 2025
Performance Comparison
Rank  Model                          Score
1     Gemini 1.5 Pro (May 2024)      8.00 ±8.00
2     Llama 3.1 405B Instruct        10.00 ±8.00
3     Claude 3 Opus                  13.00 ±9.00
4     Gemini 1.5 Flash               14.00 ±9.00
5     Claude 3.5 Sonnet (June 2024)  16.00 ±10.00
6     GPT-4 Turbo Preview            20.00 ±11.00
7     Mistral Large                  37.00 ±14.00
8     GPT-4o (May 2024)              67.00 ±17.00