Adversarial Robustness
Rank  Model                          Number of Violations  95% Confidence Interval
1     Gemini 1.5 Pro (May 2024)      8                     +8 / -4
2     Llama 3.1 405B Instruct        10                    +8 / -5
3     Claude 3 Opus                  13                    +9 / -5
4     Gemini 1.5 Flash               14                    +9 / -6
5     Claude 3.5 Sonnet (June 2024)  16                    +10 / -6
6     GPT-4 Turbo Preview            20                    +11 / -7
7     Mistral Large                  37                    +14 / -10
8     GPT-4o (May 2024)              67                    +17 / -14