| Rank | Model | Score | 95% CI |
|------|-------|-------|--------|
| 1 | o1-preview | 1118 | +43 / -47 |
| 2 | Gemini 2.0 Pro (December 2024) | 1117 | +36 / -33 |
| 3 | o3-mini | 1104 | +31 / -32 |
| 4 | Gemini 1.5 Pro (August 27, 2024) | 1093 | +44 / -51 |
| 5 | o1 (December 2024) | 1086 | +30 / -32 |
| 6 | Gemini 1.5 Pro (November 2024) | 1084 | +30 / -29 |
| 7 | GPT-4o (August 2024) | 1077 | +42 / -40 |
| 8 | Claude 3.5 Sonnet (June 2024) | 1069 | +36 / -39 |
| 9 | Gemini Pro Flash 2 | 1064 | +33 / -30 |
| 10 | GPT-4 (November 2024) | 978 | +27 / -29 |
| 11 | Mistral Large 2 | 967 | +41 / -42 |
| 12 | Gemini 1.5 Flash | 966 | +56 / -52 |
| 13 | Aya Expanse 32B | 960 | +32 / -31 |
| 14 | Llama 3.1 405B Instruct | 895 | +83 / -88 |
| 15 | Gemma 2 27B | 892 | +42 / -43 |
| 16 | Llama 3.3 70B Instruct | 778 | +35 / -36 |
| 17 | Aya 23 35B* | 760 | +48 / -46 |