Korean

Deprecated (as of March 2025)

Last updated: March 27, 2025

Performance Comparison

Rank  Model                              Score
1     o1-preview                         66.43 ±5.47
2     Claude 3.7 Sonnet (February 2025)  64.93 ±5.51
3     GPT-4o (May 2024)                  64.58 ±5.52
4     GPT-4.5 Preview (February 2025)    63.76 ±5.56
5     GPT-4 Turbo Preview                60.76 ±5.64
6     Gemini 1.5 Pro (August 27, 2024)   60.28 ±5.66
7     GPT-4o (August 2024)               59.93 ±5.67
8     Claude 3.5 Sonnet (June 2024)      59.38 ±5.67
9     Gemini 2.0 Flash                   55.6 ±5.73
10    Claude 3 Sonnet                    54.17 ±5.78
11    Claude 3 Opus                      52.78 ±5.77
12    GPT-4o mini                        51.74 ±5.77
13    GPT-4                              51.39 ±5.77
14    Mistral Large 2                    50.35 ±5.78
15    Llama 3.1 405B Instruct            50.35 ±5.78
16    Gemini 1.5 Pro (May 2024)          40.42 ±5.68
17    Llama 3.1 70B Instruct             37.23 ±5.60
18    Command R+                         30.21 ±5.30
19    Llama 3.1 8B Instruct              17.42 ±4.39