Showdown Leaderboard - LLMs

Real people. Real conversations. Real rankings.

Showdown ranks AI models based on how they perform in real-world use, not synthetic tests or lab settings. Votes are blind, optional, and organic, so rankings reflect authentic preferences.

Methodology & Technical Report
Prompts: Real conversation prompts compared across models through pairwise votes.
Users: From 80+ countries and 70+ languages, spanning all backgrounds and professions.

SEAL Leaderboard - LLMs

| Rank | Model | Votes | Score | Confidence interval (-/+) |
|------|-------|-------|-------|---------------------------|
| 1 | gpt-5-chat | 7844 | 1105.33 | -4.76 / +4.59 |
| 1 | claude-opus-4-1-20250805 | 10250 | 1104.54 | -4.66 / +5.11 |
| 3 | claude-sonnet-4-20250514 | 11960 | 1083.00 | -5.11 / +4.13 |
| 3 | claude-opus-4-20250514 | 10559 | 1078.38 | -4.69 / +5.09 |
| 5 | claude-opus-4-1-20250805 (Thinking) | 9041 | 1068.10 | -7.23 / +4.88 |
| 5 | gpt-4.1-2025-04-14 | 12260 | 1065.02 | -3.73 / +3.37 |
| 7 | gemini-2.5-pro-preview-06-05 | 10512 | 1047.16 | -4.31 / +6.15 |
| 7 | claude-opus-4-20250514 (Thinking) | 10355 | 1045.95 | -4.59 / +5.68 |
| 7 | claude-sonnet-4-20250514 (Thinking) | 12043 | 1043.05 | -4.26 / +3.25 |
| 10 | o3-2025-04-16-medium* | 14104 | 1020.46 | -3.6 / +4.22 |
| 10 | gemini-2.5-flash-preview-05-20 | 12734 | 1018.08 | -3.44 / +5.75 |
| 12 | llama4-maverick-instruct-basic | 12970 | 1000.00 | -5.18 / +4.31 |
| 13 | o4-mini-2025-04-16-medium* | 13288 | 989.11 | -4.79 / +3.89 |

* This model’s API does not consistently return Markdown-formatted responses. Since raw outputs are used in head-to-head comparisons, this may affect its ranking.
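
The tied ranks in the leaderboard (1, 1, 3, 3, 5, 5, ...) suggest that rank depends on the score intervals rather than on score alone. The page does not state the rule, so the following is only a sketch under an assumption: a model's rank is 1 plus the number of models whose lower bound lies strictly above that model's upper bound. The function name `interval_ranks` and the tuple layout are hypothetical; the numbers are copied from the leaderboard rows above. Under that assumed rule, the sketch reproduces the ranks listed on the page.

```python
# (model, score, minus, plus) copied from the leaderboard rows above.
ROWS = [
    ("gpt-5-chat", 1105.33, 4.76, 4.59),
    ("claude-opus-4-1-20250805", 1104.54, 4.66, 5.11),
    ("claude-sonnet-4-20250514", 1083.00, 5.11, 4.13),
    ("claude-opus-4-20250514", 1078.38, 4.69, 5.09),
    ("claude-opus-4-1-20250805 (Thinking)", 1068.10, 7.23, 4.88),
    ("gpt-4.1-2025-04-14", 1065.02, 3.73, 3.37),
    ("gemini-2.5-pro-preview-06-05", 1047.16, 4.31, 6.15),
    ("claude-opus-4-20250514 (Thinking)", 1045.95, 4.59, 5.68),
    ("claude-sonnet-4-20250514 (Thinking)", 1043.05, 4.26, 3.25),
    ("o3-2025-04-16-medium", 1020.46, 3.60, 4.22),
    ("gemini-2.5-flash-preview-05-20", 1018.08, 3.44, 5.75),
    ("llama4-maverick-instruct-basic", 1000.00, 5.18, 4.31),
    ("o4-mini-2025-04-16-medium", 989.11, 4.79, 3.89),
]


def interval_ranks(rows):
    """Assumed rule (not confirmed by the page): a model's rank is
    1 + the number of models whose lower bound is strictly above
    this model's upper bound."""
    # Turn each (score, minus, plus) into an absolute (lower, upper) interval.
    bounds = [(score - minus, score + plus) for _, score, minus, plus in rows]
    ranks = []
    for _, upper in bounds:
        clearly_better = sum(1 for lower, _ in bounds if lower > upper)
        ranks.append(1 + clearly_better)
    return ranks


if __name__ == "__main__":
    for (name, *_), rank in zip(ROWS, interval_ranks(ROWS)):
        print(f"{rank:>2}  {name}")
```

Read this way, two models share a rank when neither interval sits entirely above the other, which is why gpt-5-chat and claude-opus-4-1-20250805 both appear at rank 1 despite slightly different scores.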

Performance Comparison Across Language Models

[Figure: Win Rate vs. Each Model]

[Figure: Battle Count vs. Each Model]

[Figure: Confidence Intervals]

[Figure: Average Win Rate]

[Figure: Prompt Distribution]