| Rank | Model                                    | Score | 95% CI    |
|------|------------------------------------------|-------|-----------|
| 1    | Gemini 1.5 Pro (August 27, 2024)         | 1147  | +33 / -32 |
| 2    | Gemini 2.0 Pro (December 2024)           | 1138  | +29 / -26 |
| 3    | Gemini 2.0 Flash Thinking (January 2025) | 1120  | +29 / -28 |
| 4    | o1 (December 2024)                       | 1120  | +27 / -26 |
| 5    | Gemini 1.5 Pro (November 2024)           | 1116  | +28 / -27 |
| 6    | o3-mini                                  | 1093  | +30 / -29 |
| 7    | Gemini Pro Flash 2                       | 1090  | +28 / -26 |
| 8    | o1-preview                               | 1087  | +36 / -33 |
| 9    | GPT-4o (August 2024)                     | 1066  | +47 / -43 |
| 10   | Aya Expanse 32B                          | 1025  | +24 / -26 |
| 11   | GPT-4 (November 2024)                    | 1011  | +26 / -25 |
| 12   | Claude 3.5 Sonnet (June 2024)            | 995   | +44 / -46 |
| 13   | Mistral Large 2                          | 970   | +54 / -47 |
| 14   | Gemini 1.5 Flash                         | 967   | +39 / -44 |
| 15   | Aya 23 35B*                              | 932   | +25 / -24 |
| 16   | Llama 3.1 405B Instruct                  | 875   | +55 / -51 |
| 17   | Llama 3.3 70B Instruct                   | 808   | +36 / -37 |
| 18   | Jais Adapted 70B                         | 787   | +25 / -27 |
| 19   | Gemma 2 27B                              | 661   | +29 / -34 |