
Korean

Deprecated (as of March 2025)

Last updated: March 27, 2025

Performance Comparison

Rank  Model                               Score
1     o1-preview                          66.43±5.47
2     Claude 3.7 Sonnet (February 2025)   64.93±5.51
3     GPT-4o (May 2024)                   64.58±5.52
4     GPT-4.5 Preview (February 2025)     63.76±5.56
5     GPT-4 Turbo Preview                 60.76±5.64
6     Gemini 1.5 Pro (August 27, 2024)    60.28±5.66
7     GPT-4o (August 2024)                59.93±5.67
8     Claude 3.5 Sonnet (June 2024)       59.38±5.67
9     Gemini 2.0 Flash                    55.60±5.73
10    Claude 3 Sonnet                     54.17±5.78
11    Claude 3 Opus                       52.78±5.77
12    GPT-4o mini                         51.74±5.77
13    GPT-4                               51.39±5.77
14    Mistral Large 2                     50.35±5.78
15    Llama 3.1 405B Instruct             50.35±5.78
16    Gemini 1.5 Pro (May 2024)           40.42±5.68
17    Llama 3.1 70B Instruct              37.23±5.60
18    Command R+                          30.21±5.30
19    Llama 3.1 8B Instruct               17.42±4.39