Coding
Deprecated (as of March 2025)
Performance Comparison
Model                                          Score
[name missing]                                 1237.00 ±31.00
o3-mini                                        1137.00 ±34.00
GPT-4o (November 2024)                         1132.00 ±31.00
[name missing]                                 1123.00 ±35.00
Gemini 2.0 Flash Experimental (December 2024)  1111.00 ±26.00
Gemini 2.0 Pro (December 2024)                 1109.00 ±33.00
Gemini 2.0 Flash Thinking (January 2025)       1108.00 ±37.00
DeepSeek R1                                    1100.00 ±31.00
o1 (December 2024)                             1083.00 ±30.00
[name missing]                                 1079.00 ±22.00
[name missing]                                 1045.00 ±25.00
GPT-4o (May 2024)                              1036.00 ±24.00
GPT-4 Turbo Preview                            1034.00 ±22.00
Claude 3 Opus                                   959.00 ±26.00
Mistral Large 2                                1029.00 ±23.00
Llama 3.1 405B Instruct                        1022.00 ±24.00
[name missing]                                 1007.00 ±26.00
Gemini 1.5 Pro (May 2024)                       994.00 ±25.00
GPT-4 (November 2024)                           992.00 ±28.00
DeepSeek V3                                     985.00 ±25.00
Llama 3.2 90B Vision Instruct                   984.00 ±30.00
Gemini 1.5 Flash                                943.00 ±26.00
Gemini 1.5 Pro (April 2024)                     891.00 ±32.00
Claude 3 Sonnet                                 879.00 ±31.00
Llama 3 70B Instruct                            871.00 ±26.00
Mistral Large                                   811.00 ±25.00
Gemini 1.0 Pro                                  685.00 ±34.00
CodeLlama 34B Instruct                          598.00 ±38.00