SEAL Showdown
Real people. Real conversations. Real rankings.

Showdown ranks AI models based on how they perform in real-world use, not in synthetic tests or lab settings. Votes are blind, optional, and organic, so rankings reflect authentic preferences.

Methodology & Technical Report
Prompts: Real conversation prompts compared across models through pairwise votes.
Users: From 80+ countries and 70+ languages, spanning all backgrounds and professions.
| RANK | MODEL | VOTES | SCORE |
|---|---|---|---|
| 1 | gpt-5-chat | 3725 | 1111.13 (-9.24 / +7.23) |
| 1 | claude-opus-4-1-20250805 | 5018 | 1096.83 (-7 / +5.82) |
| 3 | claude-sonnet-4-20250514 | 7057 | 1076.90 (-5.78 / +5.65) |
| 3 | claude-opus-4-20250514 | 6169 | 1068.58 (-6.1 / +6.03) |
| 3 | gpt-4.1-2025-04-14 | 7595 | 1067.91 (-5.14 / +4.38) |
| 3 | claude-opus-4-1-20250805 (Thinking) | 4456 | 1067.49 (-8.32 / +6.67) |
| 4 | gemini-2.5-pro-preview-06-05 | 5741 | 1058.21 (-5.85 / +5.42) |
| 8 | claude-opus-4-20250514 (Thinking) | 5962 | 1040.89 (-6.95 / +5.49) |
| 8 | claude-sonnet-4-20250514 (Thinking) | 7067 | 1033.72 (-5.19 / +5.12) |
| 9 | gemini-2.5-flash-preview-05-20 | 8330 | 1023.57 (-5.09 / +6.77) |
| 10 | o3-2025-04-16-medium* | 9067 | 1021.78 (-6.38 / +5.11) |
| 12 | llama4-maverick-instruct-basic | 8788 | 1000.00 (-5.16 / +6.53) |
| 12 | o4-mini-2025-04-16-medium* | 8735 | 990.85 (-4.45 / +5.77) |
* This model’s API does not consistently return Markdown-formatted responses. Since raw outputs are used in head-to-head comparisons, this may affect its ranking.
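
The repeated and skipped values in the RANK column (1, 1, 3, 3, ...) can be reproduced from the scores and the two signed offsets listed with each score. The sketch below assumes the offsets are the lower and upper bounds of an interval around the score, and that a model's rank is one plus the number of models whose lower bound lies strictly above its upper bound. This rule matches every rank on the leaderboard, but it is inferred from the numbers on this page, not taken from the methodology report. The names `ROWS` and `interval_ranks` are illustrative.

```python
# Rows copied from the leaderboard: (model, score, lower offset, upper offset).
ROWS = [
    ("gpt-5-chat", 1111.13, -9.24, 7.23),
    ("claude-opus-4-1-20250805", 1096.83, -7.0, 5.82),
    ("claude-sonnet-4-20250514", 1076.90, -5.78, 5.65),
    ("claude-opus-4-20250514", 1068.58, -6.1, 6.03),
    ("gpt-4.1-2025-04-14", 1067.91, -5.14, 4.38),
    ("claude-opus-4-1-20250805 (Thinking)", 1067.49, -8.32, 6.67),
    ("gemini-2.5-pro-preview-06-05", 1058.21, -5.85, 5.42),
    ("claude-opus-4-20250514 (Thinking)", 1040.89, -6.95, 5.49),
    ("claude-sonnet-4-20250514 (Thinking)", 1033.72, -5.19, 5.12),
    ("gemini-2.5-flash-preview-05-20", 1023.57, -5.09, 6.77),
    ("o3-2025-04-16-medium", 1021.78, -6.38, 5.11),
    ("llama4-maverick-instruct-basic", 1000.00, -5.16, 6.53),
    ("o4-mini-2025-04-16-medium", 990.85, -4.45, 5.77),
]


def interval_ranks(rows):
    """Assumed rule: rank = 1 + count of models whose lower bound
    is strictly above this model's upper bound."""
    bounds = [(score + lo, score + hi) for _, score, lo, hi in rows]
    ranks = []
    for _, upper in bounds:
        clearly_better = sum(1 for other_lower, _ in bounds if other_lower > upper)
        ranks.append(1 + clearly_better)
    return ranks


ranks = interval_ranks(ROWS)
for row, rank in zip(ROWS, ranks):
    print(rank, row[0])
```

Under this reading, two models share a rank when their intervals overlap: gpt-5-chat's lower bound (1101.89) is below claude-opus-4-1-20250805's upper bound (1102.65), so both are ranked 1.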

Win Rate vs. Each Model

Battle Count vs. Each Model

Confidence

Average Win Rate

Prompt Distribution