EnigmaEval
| Rank | Model | Accuracy (%) | Std. Deviation |
|------|-------|--------------|----------------|
| 1 | o1 (December 2024) | 5.65 | +0.46 / -0.46 |
| 2 | Gemini 2.0 Flash Thinking (January 2025) | 1.10 | +0.15 / -0.15 |
| 3 | Claude 3.5 Sonnet (October 2024) | 0.91 | +0.14 / -0.14 |
| 4 | Pixtral Large (November 2024) | 0.84 | +0.17 / -0.17 |
| 5 | Claude 3 Opus | 0.82 | +0.04 / -0.04 |
| 6 | GPT-4o (November 2024) | 0.80 | +0.11 / -0.11 |
| 7 | Gemini 2.0 Pro Experimental (February 2025) | 0.69 | +0.37 / -0.37 |
| 8 | Gemini 2.0 Flash (February 2025) | 0.63 | +0.21 / -0.21 |
| 9 | Llama 3.2 90B Vision Instruct | 0.38 | +0.05 / -0.05 |