Coding
Deprecated (as of March 2025)
Performance Comparison
1237.00±31.00
o3-mini
1137.00±34.00
GPT-4o (November 2024)
1132.00±31.00
1123.00±35.00
Gemini 2.0 Flash Experimental (December 2024)
1111.00±26.00
Gemini 2.0 Pro (December 2024)
1109.00±33.00
Gemini 2.0 Flash Thinking (January 2025)
1108.00±37.00
DeepSeek R1
1100.00±31.00
o1 (December 2024)
1083.00±30.00
1079.00±22.00
1045.00±25.00
GPT-4o (May 2024)
1036.00±24.00
GPT-4 Turbo Preview
1034.00±22.00
Claude 3 Opus
959.00±26.00
Mistral Large 2
1029.00±23.00
Llama 3.1 405B Instruct
1022.00±24.00
1007.00±26.00
Gemini 1.5 Pro (May 2024)
994.00±25.00
GPT-4 (November 2024)
992.00±28.00
Deepseek V3
985.00±25.00
Llama 3.2 90B Vision Instruct
984.00±30.00
Gemini 1.5 Flash
943.00±26.00
Gemini 1.5 Pro (April 2024)
891.00±32.00
Claude 3 Sonnet
879.00±31.00
Llama 3 70B Instruct
871.00±26.00
Mistral Large
811.00±25.00
Gemini 1.0 Pro
685.00±34.00
CodeLlama 34B Instruct
598.00±38.00