Scale AI SEAL

MASK

Last updated: April 10, 2025

Performance Comparison

Rank  Score       Model
1     84.50±2.30
1     82.60±2.80
1     82.13±1.25  Claude 3.7 Sonnet (Thinking) (February 2025)
3     79.00±1.31  Claude 3 Opus
3     78.60±2.30
6     72.90±2.30
6     72.33±2.45  Claude 3.5 Sonnet (October 2024)
6     72.27±3.31  Claude 3.7 Sonnet (February 2025)
9     61.60±0.86  o1-Pro
9     61.40±1.99  Llama 3.1 405B Instruct
9     61.40±1.80
9     60.07±2.07  gpt 4o (November 2024)
9     56.93±4.02  GPT 4.5 Preview
9     56.40±4.98
10    59.27±1.25  o1 (December 2024)
10    57.32±2.58  Deepseek R1
11    55.93±3.49  Gemini 2.5 Pro Experimental (March 2025)
14    54.07±2.24  Llama 3.2 90B Vision Instruct
14    53.10±4.50
14    51.93±4.98  Llama 3.3 70B Instruct
15    49.73±3.23  o3 mini (Low)
17    51.13±1.03
17    50.00±2.20
19    49.73±1.60  Llama 4 Maverick
19    49.53±0.76  Gemini 2.0 Flash Thinking (January 2025)
19    49.07±2.01  Gemini 2.0 Flash
19    48.93±1.25  o3 mini (Medium)
19    48.67±2.29  Gemini 2.0 Pro Experimental (February 2025)
20    47.53±1.74  Mistral Large 2411
20    46.80±2.58  o3 mini (High)
29    44.53±1.74  Deepseek V3 (March 2025)