Scale SEAL

MASK

Last updated: April 10, 2025

Performance Comparison

Rank  Model                                         Score
1                                                   82.60±2.80
1     Claude 3.7 Sonnet (Thinking) (February 2025)  82.13±1.25
2     Claude 3 Opus                                 79.00±1.31
4                                                   72.90±2.30
4     Claude 3.5 Sonnet (October 2024)              72.33±2.45
4     Claude 3.7 Sonnet (February 2025)             72.27±3.31
7     o1-Pro                                        61.60±0.86
7     Llama 3.1 405B Instruct                       61.40±1.99
7                                                   61.40±1.80
7     gpt 4o (November 2024)                        60.07±2.07
7     GPT 4.5 Preview                               56.93±4.02
8     o1 (December 2024)                            59.27±1.25
8     Deepseek R1                                   57.32±2.58
9     Gemini 2.5 Pro Experimental (March 2025)      55.93±3.49
12    Llama 3.2 90B Vision Instruct                 54.07±2.24
12    Llama 3.3 70B Instruct                        51.93±4.98
13    o3 mini (Low)                                 49.73±3.23
15                                                  51.13±1.03
15                                                  50.00±2.20
16    Llama 4 Maverick                              49.73±1.60
16    Gemini 2.0 Flash Thinking (January 2025)      49.53±0.76
16    Gemini 2.0 Flash                              49.07±2.01
16    o3 mini (Medium)                              48.93±1.25
16    Gemini 2.0 Pro Experimental (February 2025)   48.67±2.29
17    Mistral Large 2411                            47.53±1.74
17    o3 mini (High)                                46.80±2.58
25    Deepseek V3 (March 2025)                      44.53±1.74
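The ranks are not a plain sort by score: GPT 4.5 Preview (56.93) is ranked 7, while o1 (December 2024), with the higher score of 59.27, is ranked 8. The page does not state its ranking rule. One rule that reproduces every published rank in the first fourteen rows is: rank = 1 + the number of models whose lower bound (score minus the ± value) is strictly greater than this model's upper bound (score plus the ± value). The Python sketch below checks that assumption against those rows. Treating ± as a symmetric interval half-width and the helper name `rank_upper_bound` are assumptions for illustration, not the leaderboard's documented method.

```python
from typing import List, Optional, Tuple

# (published rank, model name or None where the page shows no name, score, ± value)
Row = Tuple[int, Optional[str], float, float]

# First fourteen rows of the table, in published order.
ROWS: List[Row] = [
    (1, None, 82.60, 2.80),
    (1, "Claude 3.7 Sonnet (Thinking) (February 2025)", 82.13, 1.25),
    (2, "Claude 3 Opus", 79.00, 1.31),
    (4, None, 72.90, 2.30),
    (4, "Claude 3.5 Sonnet (October 2024)", 72.33, 2.45),
    (4, "Claude 3.7 Sonnet (February 2025)", 72.27, 3.31),
    (7, "o1-Pro", 61.60, 0.86),
    (7, "Llama 3.1 405B Instruct", 61.40, 1.99),
    (7, None, 61.40, 1.80),
    (7, "gpt 4o (November 2024)", 60.07, 2.07),
    (7, "GPT 4.5 Preview", 56.93, 4.02),
    (8, "o1 (December 2024)", 59.27, 1.25),
    (8, "Deepseek R1", 57.32, 2.58),
    (9, "Gemini 2.5 Pro Experimental (March 2025)", 55.93, 3.49),
]


def rank_upper_bound(rows: List[Row]) -> List[int]:
    """ASSUMED rule (hypothetical): 1 + count of models whose interval lies
    entirely above this model's interval."""
    ranks = []
    for _, _, score, half in rows:
        upper = score + half
        # A model counts as clearly better only if its lower bound
        # exceeds this model's upper bound (non-overlapping intervals).
        better = sum(1 for _, _, s, h in rows if s - h > upper)
        ranks.append(1 + better)
    return ranks


if __name__ == "__main__":
    computed = rank_upper_bound(ROWS)
    for (published, name, score, half), rank in zip(ROWS, computed):
        label = name if name is not None else "(name not shown)"
        print(f"published={published:>2} computed={rank:>2} {score:.2f}±{half:.2f} {label}")
```

Under this rule, ties arise whenever no model's interval sits strictly above one model's interval and not the other's, which is why several rows with different scores share rank 7. Checking only a prefix of the table is enough here: any model that clearly beats a row must itself rank strictly better, so it already appears earlier in the list.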