Agentic Tool Use (Enterprise)
Rank | Model | Score | 95% Confidence
1 | o1-preview | 66.43 | +5.47 / -5.47
2 | GPT-4o (May 2024) | 64.58 | +5.52 / -5.52
3 | GPT-4 Turbo Preview | 60.76 | +5.64 / -5.64
4 | Gemini 1.5 Pro (August 27, 2024) | 60.28 | +5.66 / -5.66
5 | GPT-4o (August 2024) | 59.93 | +5.67 / -5.67
6 | Claude 3.5 Sonnet (June 2024) | 59.38 | +5.67 / -5.67
7 | Claude 3 Sonnet | 54.17 | +5.78 / -5.78
8 | Claude 3 Opus | 52.78 | +5.77 / -5.78
9 | GPT-4o mini | 51.74 | +5.77 / -5.77
10 | GPT-4 | 51.39 | +5.77 / -5.77
11 | Mistral Large 2 | 50.35 | +5.78 / -5.78
12 | Llama 3.1 405B Instruct | 50.35 | +5.78 / -5.78
13 | Gemini 1.5 Pro (May 2024) | 40.42 | +5.68 / -5.68
14 | Llama 3.1 70B Instruct | 37.23 | +5.60 / -5.60
15 | Command R+ | 30.21 | +5.30 / -5.30
16 | Llama 3.1 8B Instruct | 17.42 | +4.39 / -4.39
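Many adjacent entries have confidence intervals of roughly ±5.5 points, so small score gaps between them may not be meaningful. The sketch below is one illustrative way to compare entries. It is an assumption-laden sketch, not the leaderboard's own method: it treats each interval as [score - lower, score + upper] using the values listed above and checks whether two intervals overlap. The `Entry` type and the `overlaps` helper are hypothetical names used only for this example. Non-overlapping 95% intervals are a conservative sign of a real difference. Overlapping intervals do not, by themselves, rule one out.

```python
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical container for this sketch)."""
    model: str
    score: float
    plus: float   # upper half-width, e.g. the 5.47 in "+5.47"
    minus: float  # lower half-width, e.g. the 5.47 in "-5.47"

    @property
    def interval(self) -> Tuple[float, float]:
        # Assumed interpretation: [score - minus, score + plus]
        return (self.score - self.minus, self.score + self.plus)


def overlaps(a: Entry, b: Entry) -> bool:
    """Return True if the two 95% confidence intervals share any point."""
    a_lo, a_hi = a.interval
    b_lo, b_hi = b.interval
    return a_lo <= b_hi and b_lo <= a_hi


# A few rows copied from the table; the rest can be added the same way.
entries = [
    Entry("o1-preview", 66.43, 5.47, 5.47),
    Entry("Claude 3.5 Sonnet (June 2024)", 59.38, 5.67, 5.67),
    Entry("Gemini 1.5 Pro (May 2024)", 40.42, 5.68, 5.68),
]

top = entries[0]
for other in entries[1:]:
    print(f"{top.model} vs {other.model}: overlap={overlaps(top, other)}")
```

With these rows, the interval for o1-preview (about 60.96 to 71.90) overlaps the interval for Claude 3.5 Sonnet (June 2024) (about 53.71 to 65.05). It does not overlap the interval for Gemini 1.5 Pro (May 2024) (about 34.74 to 46.10).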