Agentic Tool Use (Chat)
| Rank | Model | Score | 95% Confidence Interval |
|------|-------|-------|-------------------------|
| 1 | GPT-4o (August 2024) | 56.85 | +6.92 / -6.92 |
| 2 | Claude 3.5 Sonnet (June 2024) | 56.06 | +6.91 / -6.91 |
| 3 | o1-preview | 55.10 | +6.96 / -6.96 |
| 4 | GPT-4 Turbo Preview | 53.03 | +6.95 / -6.95 |
| 5 | Gemini 1.5 Pro (August 27, 2024) | 51.27 | +6.98 / -6.98 |
| 6 | GPT-4o (May 2024) | 49.50 | +6.96 / -6.96 |
| 7 | Claude 3 Opus | 48.49 | +6.96 / -6.96 |
| 8 | Claude 3 Sonnet | 40.40 | +6.84 / -6.84 |
| 9 | Mistral Large 2 | 40.40 | +6.84 / -6.84 |
| 10 | Llama 3.1 405B Instruct | 40.10 | +6.84 / -6.84 |
| 11 | GPT-4 | 37.88 | +6.78 / -6.78 |
| 12 | Gemini 1.5 Pro (May 2024) | 35.50 | +6.57 / -6.68 |
| 13 | Llama 3.1 70B Instruct | 33.50 | +6.59 / -6.59 |
| 14 | GPT-4o mini | 32.83 | +6.54 / -6.54 |
| 15 | Command R+ | 20.20 | +5.59 / -5.59 |
| 16 | Llama 3.1 8B Instruct | 6.09 | +3.34 / -3.34 |