| Rank | Model | Score | 95% CI |
|------|-------|-------|--------|
| 1 | o1 (December 2024) | 1124 | +32 / -31 |
| 2 | Gemini 2.0 Pro (December 2024) | 1105 | +34 / -30 |
| 3 | GPT-4o (August 2024) | 1098 | +27 / -24 |
| 4 | Gemini Pro Flash 2 | 1098 | +29 / -30 |
| 5 | o1-preview | 1098 | +25 / -24 |
| 6 | Claude 3.5 Sonnet (June 2024) | 1077 | +25 / -26 |
| 7 | Gemini 1.5 Pro (November 2024) | 1056 | +29 / -27 |
| 8 | Gemini 1.5 Pro (August 27, 2024) | 1054 | +25 / -24 |
| 9 | Gemini 2.0 Flash Thinking (January 2025) | 1050 | +31 / -33 |
| 10 | o3-mini | 1047 | +31 / -31 |
| 11 | Gemini 1.5 Flash | 1045 | +28 / -27 |
| 12 | Mistral Large 2 | 999 | +26 / -27 |
| 13 | GPT-4 (November 2024) | 982 | +28 / -26 |
| 14 | Aya Expanse 32B | 965 | +28 / -30 |
| 15 | Llama 3.1 405B Instruct | 888 | +30 / -31 |
| 16 | Aya 23 35B* | 776 | +34 / -37 |
| 17 | Gemma 2 27B | 775 | +29 / -30 |
| 18 | Llama 3.3 70B Instruct | 764 | +37 / -37 |