Instruction Following
Rank  Model                             Score  95% Confidence
1     Gemini Pro Flash 2                87.09  +1.91 / -1.92
2     o1 Preview                        86.41  +1.66 / -1.66
3     Claude 3.5 Sonnet (June 2024)     86.13  +1.44 / -1.43
4     Gemini 1.5 Pro (August 27, 2024)  85.28  +1.65 / -1.66
5     GPT-4o (May 2024)                 85.28  +1.47 / -1.47
6     Llama 3.1 405B Instruct           85.16  +1.45 / -1.44
7     Mistral Large 2                   83.40  +1.73 / -1.72
8     GPT-4 Turbo Preview               83.30  +1.36 / -1.35
9     Llama 3.2 90B Vision Instruct     82.42  +1.75 / -1.76
10    GPT-4o (November 2024)            82.25  +2.21 / -2.21
11    Deepseek V3                       82.08  +2.23 / -2.24
12    Llama 3 70B Instruct              81.40  +1.80 / -1.80
13    GPT-4o (August 2024)              80.17  +1.70 / -1.70
14    Claude 3 Opus                     80.12  +1.54 / -1.55
15    GPT-4 (November 2024)             80     +2.08 / -2.08
16    Mistral Large                     79.89  +1.67 / -1.66
17    Gemini 1.5 Pro (May 2024)         79.50  +1.77 / -1.77
18    Gemini 1.5 Pro (April 2024)       78.52  +2.33 / -2.32
19    Claude 3 Sonnet                   78.24  +2.19 / -2.19
20    Gemini 1.5 Flash                  77.25  +1.96 / -1.97
21    Gemini 1.0 Pro                    67.97  +2.61 / -2.62
22    CodeLlama 34B Instruct            57.69  +2.58 / -2.57
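
Many adjacent entries in the listing above are separated by less than their confidence bounds, so a rough way to read it is to check whether two models' 95% intervals overlap. The sketch below is only an illustration. It assumes that "+x / -y" means the interval runs from score - y to score + x, which the page does not state explicitly. The `Entry` type and `intervals_overlap` helper are hypothetical names, not part of the leaderboard. Overlap is a coarse heuristic, not a formal significance test.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row; field names are illustrative, not from the source."""
    model: str
    score: float
    plus: float   # assumed: distance from score to the upper bound
    minus: float  # assumed: distance from score to the lower bound

    @property
    def low(self) -> float:
        return self.score - self.minus

    @property
    def high(self) -> float:
        return self.score + self.plus


def intervals_overlap(a: Entry, b: Entry) -> bool:
    """Return True when the two closed intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Values copied from the listing above.
first = Entry("Gemini Pro Flash 2", 87.09, 1.91, 1.92)
third = Entry("Claude 3.5 Sonnet (June 2024)", 86.13, 1.44, 1.43)
last = Entry("CodeLlama 34B Instruct", 57.69, 2.58, 2.57)

print(intervals_overlap(first, third))  # ranks 1 and 3
print(intervals_overlap(first, last))   # ranks 1 and 22
```

Under that assumption, the intervals for ranks 1 and 3 overlap, so the ordering among the top entries should be read with caution, while ranks 1 and 22 are clearly separated.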