Instruction Following

Deprecated (as of January 2025)

Last updated: March 20, 2025

Performance Comparison

Rank  Model                                          Score
1     o1 (December 2024)                             91.96±1.60
2     DeepSeek R1                                    87.75±1.91
3     o1-preview                                     86.58±1.58
4     Gemini 2.0 Flash Experimental (December 2024)  86.58±1.83
5     Claude 3.5 Sonnet (June 2024)                  85.96±1.39
6     GPT-4o (May 2024)                              85.29±1.42
7     Llama 3.1 405B Instruct                        84.85±1.40
8     Gemini 1.5 Pro (August 27, 2024)               84.17±1.65
9     GPT-4 Turbo Preview                            83.19±1.31
10    Mistral Large 2                                82.81±1.66
11    GPT-4o (November 2024)                         82.52±2.10
12    DeepSeek V3                                    82.34±2.08
13    Llama 3.2 90B Vision Instruct                  82.07±1.74
14    Llama 3 70B Instruct                           81.17±1.77
15    GPT-4o (August 2024)                           80.17±1.70
16    Claude 3 Opus                                  80.12±1.54
17    Mistral Large                                  79.89±1.67
18    GPT-4 (November 2024)                          79.50±1.92
19    Gemini 1.5 Pro (May 2024)                      79.37±1.70
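
Many adjacent scores in the ranking above differ by less than their ± values. The sketch below is one rough way to check that, assuming each ± value is the half-width of a symmetric interval around the score (the page does not say how the intervals were computed or at what confidence level). The helper names `parse_score`, `intervals_overlap`, and `adjacent_overlaps` are hypothetical, and only the top five entries are copied in for brevity.

```python
# Top five entries copied from the ranking above, as (model, "score±half_width").
ENTRIES = [
    ("o1 (December 2024)", "91.96±1.60"),
    ("DeepSeek R1", "87.75±1.91"),
    ("o1-preview", "86.58±1.58"),
    ("Gemini 2.0 Flash Experimental (December 2024)", "86.58±1.83"),
    ("Claude 3.5 Sonnet (June 2024)", "85.96±1.39"),
]


def parse_score(text):
    """Split a 'score±half_width' string into two floats."""
    score, half_width = text.split("±")
    return float(score), float(half_width)


def intervals_overlap(a, b):
    """Return True if the ranges [score - hw, score + hw] of a and b intersect.

    Assumption: the ± value is a symmetric half-width. Two such ranges
    intersect exactly when the scores differ by at most the sum of the
    half-widths.
    """
    (score_a, hw_a), (score_b, hw_b) = parse_score(a), parse_score(b)
    return abs(score_a - score_b) <= hw_a + hw_b


def adjacent_overlaps(entries):
    """Compare each entry with the one ranked directly below it."""
    return [
        (upper[0], lower[0], intervals_overlap(upper[1], lower[1]))
        for upper, lower in zip(entries, entries[1:])
    ]


if __name__ == "__main__":
    for upper, lower, overlap in adjacent_overlaps(ENTRIES):
        verdict = "overlap" if overlap else "do not overlap"
        print(f"{upper} vs {lower}: intervals {verdict}")
```

Under that assumption, only the gap between ranks 1 and 2 exceeds the combined half-widths among these five entries. Overlapping intervals are a quick heuristic rather than a formal significance test, so they suggest caution about reading much into small rank differences without proving the models are equivalent.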