| Rank | Model | Score | 95% CI |
|------|-------|-------|--------|
| 1 | Gemini 2.0 Pro (December 2024) | 1176 | +38 / -35 |
| 2 | o1 (December 2024) | 1134 | +36 / -35 |
| 3 | Gemini Pro Flash 2 | 1119 | +32 / -30 |
| 4 | o1-preview | 1111 | +27 / -27 |
| 5 | Gemini 2.0 Flash Thinking (January 2025) | 1108 | +34 / -32 |
| 6 | Gemini 1.5 Pro (November 2024) | 1105 | +30 / -32 |
| 7 | GPT-4o (May 2024) | 1084 | +24 / -21 |
| 8 | o3-mini | 1079 | +33 / -31 |
| 9 | Gemini 1.5 Pro (May 2024) | 1069 | +26 / -25 |
| 10 | Gemini 1.5 Pro (August 27, 2024) | 1067 | +23 / -24 |
| 11 | GPT-4o (August 2024) | 1067 | +26 / -27 |
| 12 | GPT-4 (November 2024) | 1034 | +31 / -32 |
| 13 | Mistral Large 2 | 1032 | +24 / -25 |
| 14 | GPT-4 Turbo Preview | 1020 | +22 / -20 |
| 15 | Gemini 1.5 Pro (April 2024) | 1005 | +33 / -32 |
| 16 | Claude 3.5 Sonnet (June 2024) | 992 | +25 / -24 |
| 17 | Aya Expanse 32B | 983 | +30 / -32 |
| 18 | Gemini 1.5 Flash | 980 | +28 / -29 |
| 19 | Gemma 2 27B | 951 | +26 / -27 |
| 20 | Llama 3.2 90B Vision Instruct | 944 | +24 / -27 |
| 21 | Claude 3 Opus | 919 | +22 / -22 |
| 22 | Llama 3.1 405B Instruct | 915 | +23 / -27 |
| 23 | Llama 3.3 70B Instruct | 910 | +30 / -32 |
| 24 | Llama 3 70B Instruct | 882 | +31 / -30 |
| 25 | Gemini 1.0 Pro | 846 | +31 / -29 |
| 26 | Mistral Large | 846 | +28 / -29 |
| 27 | Claude 3 Sonnet | 845 | +30 / -30 |
| 28 | Aya 23 35B* | 794 | +29 / -30 |