Scale SEAL

EnigmaEval

Puzzle Solving

Last updated: April 3, 2025

Performance Comparison

Rank  Model                                     Score
1                                               6.14 ±1.02
1     o1 (December 2024)                        5.65 ±0.52
3                                               4.23 ±0.45
3                                               4.14 ±0.25
5                                               3.18 ±0.28
6                                               2.26 ±0.63
6                                               2.17 ±0.48
8     Gemini 2.0 Flash Thinking (January 2025)  1.10 ±0.17
8     Claude 3.5 Sonnet (October 2024)          0.91 ±0.16
8     Pixtral Large (November 2024)             0.84 ±0.19
8                                               0.69 ±0.42
9     Claude 3 Opus                             0.82 ±0.05
9     GPT-4o (November 2024)                    0.80 ±0.12
9     Gemini 2.0 Flash (February 2025)          0.63 ±0.24
10                                              0.58 ±0.12
13    Llama 3.2 90B Vision Instruct             0.38 ±0.06