Real people. Real conversations. Real rankings.

Showdown ranks AI models based on how they perform in real-world use— not synthetic tests or lab settings. Votes are blind, optional, and organic, so rankings reflect authentic preferences.Methodology & Technical Report→

0 promptsReal conversation prompts compared across models through pairwise votes.

0 usersFrom 80+ countries and 70+ languages, spanning all backgrounds and professions.

Style Control

RANK ↑

MODEL ↑↓

VOTES ↑↓

SCORE ↑↓

gpt-5.2-chat-latest

8608

1145.33

-4.53 +5.40

gemini-3-flash

8579

1138.66

-4.20 +4.85

claude-opus-4-5-20251101 (Thinking)

7837

1128.64

-5.91 +4.88

gemini-2.5-pro

14311

1128.46

-4.36 +3.86

claude-opus-4-5-20251101

9710

1126.76

-4.36 +3.73

gemini-3-pro-preview

10799

1124.43

-4.22 +3.84

claude-sonnet-4-5-20250929

14643

1118.01

-3.39 +3.61

claude-sonnet-4-5-20250929 (Thinking)

14688

1109.30

-3.37 +4.00

gpt-5-chat

11529

1107.36

-4.73 +3.98

qwen3-235b-a22b-2507-v1

12602

1106.00

-3.91 +4.71

gpt-5.1-2025-11-13-medium

10059

1098.36

-4.17 +4.85

gpt-5.2-2025-12-11-medium

8526

1091.58

-3.93 +5.04

kimi-k2-thinking

11716

1088.30

-4.38 +3.02

claude-opus-4-1-20250805

15656

1085.82

-3.73 +3.96

deepseek-v3p2

9961

1081.07

-5.35 +4.08

claude-opus-4-1-20250805 (Thinking)

14001

1078.14

-3.82 +4.78

claude-haiku-4-5-20251001

7871

1071.29

-5.33 +3.96

gemini-2.5-flash

13692

1070.72

-3.58 +2.88

claude-sonnet-4-20250514

23664

1069.94

-2.48 +2.43

claude-haiku-4-5-20251001 (Thinking)

7704

1064.40

-4.31 +4.74

claude-sonnet-4-20250514 (Thinking)

13836

1062.49

-3.61 +3.75

deepseek-r1-0528

11430

1048.33

-3.27 +4.83

o3-2025-04-16-medium*

23181

1045.01

-2.64 +3.16

gpt-5-2025-08-07-medium*

16606

1012.72

-2.88 +4.15

llama4-maverick-instruct-basic

12587

1000.00

-3.20 +4.05

o4-mini-2025-04-16-medium*

23980

999.57

-3.60 +3.06

* This model’s API does not consistently return Markdown-formatted responses. Since raw outputs are used in head-to-head comparisons, this may affect its ranking.

Win Rate vs. Each Model

Battle Count vs. Each Model

Confidence Intervals

Average Win Rate

Prompt Distribution

Win Rate vs. Each Model

Battle Count vs. Each Model

Confidence

Average Win Rate

Prompt Distribution

Style Control

RANK ↑

MODEL ↑↓

VOTES ↑↓

SCORE ↑↓

gpt-5.2-chat-latest

8608

1145.33

-4.53 +5.40

gemini-3-flash

8579

1138.66

-4.20 +4.85

claude-opus-4-5-20251101 (Thinking)

7837

1128.64

-5.91 +4.88

gemini-2.5-pro

14311

1128.46

-4.36 +3.86

claude-opus-4-5-20251101

9710

1126.76

-4.36 +3.73

gemini-3-pro-preview

10799

1124.43

-4.22 +3.84

claude-sonnet-4-5-20250929

14643

1118.01

-3.39 +3.61

claude-sonnet-4-5-20250929 (Thinking)

14688

1109.30

-3.37 +4.00

gpt-5-chat

11529

1107.36

-4.73 +3.98

qwen3-235b-a22b-2507-v1

12602

1106.00

-3.91 +4.71

gpt-5.1-2025-11-13-medium

10059

1098.36

-4.17 +4.85

gpt-5.2-2025-12-11-medium

8526

1091.58

-3.93 +5.04

kimi-k2-thinking

11716

1088.30

-4.38 +3.02

claude-opus-4-1-20250805

15656

1085.82

-3.73 +3.96

deepseek-v3p2

9961

1081.07

-5.35 +4.08

claude-opus-4-1-20250805 (Thinking)

14001

1078.14

-3.82 +4.78

claude-haiku-4-5-20251001

7871

1071.29

-5.33 +3.96

gemini-2.5-flash

13692

1070.72

-3.58 +2.88

claude-sonnet-4-20250514

23664

1069.94

-2.48 +2.43

claude-haiku-4-5-20251001 (Thinking)

7704

1064.40

-4.31 +4.74

claude-sonnet-4-20250514 (Thinking)

13836

1062.49

-3.61 +3.75

deepseek-r1-0528

11430

1048.33

-3.27 +4.83

o3-2025-04-16-medium*

23181

1045.01

-2.64 +3.16

gpt-5-2025-08-07-medium*

16606

1012.72

-2.88 +4.15

llama4-maverick-instruct-basic

12587

1000.00

-3.20 +4.05

o4-mini-2025-04-16-medium*

23980

999.57

-3.60 +3.06

* This model’s API does not consistently return Markdown-formatted responses. Since raw outputs are used in head-to-head comparisons, this may affect its ranking.

Showdown Leaderboard - LLMs

Real people. Real conversations. Real rankings.

SEAL Leaderboard - LLMs

gpt-5.2-chat-latest

gemini-3-flash

claude-opus-4-5-20251101 (Thinking)

gemini-2.5-pro

claude-opus-4-5-20251101

gemini-3-pro-preview

claude-sonnet-4-5-20250929

claude-sonnet-4-5-20250929 (Thinking)

gpt-5-chat

qwen3-235b-a22b-2507-v1

gpt-5.1-2025-11-13-medium

gpt-5.2-2025-12-11-medium

kimi-k2-thinking

claude-opus-4-1-20250805

deepseek-v3p2

claude-opus-4-1-20250805 (Thinking)

claude-haiku-4-5-20251001

gemini-2.5-flash

claude-sonnet-4-20250514

claude-haiku-4-5-20251001 (Thinking)

claude-sonnet-4-20250514 (Thinking)

deepseek-r1-0528

o3-2025-04-16-medium

gpt-5-2025-08-07-medium

llama4-maverick-instruct-basic

o4-mini-2025-04-16-medium

Performance Comparison Across Language Models

Win Rate vs. Each Model

Win Rate vs Each Model

Battle Count vs. Each Model

Battle Count vs. Each Model

Confidence

Confidence Intervals

Average Win Rate

Average Win Rate

Prompt Distribution

Prompt Distribution

SEAL Leaderboard - LLMs

gpt-5.2-chat-latest

gemini-3-flash

claude-opus-4-5-20251101 (Thinking)

gemini-2.5-pro

claude-opus-4-5-20251101

gemini-3-pro-preview

claude-sonnet-4-5-20250929

claude-sonnet-4-5-20250929 (Thinking)

gpt-5-chat

qwen3-235b-a22b-2507-v1

gpt-5.1-2025-11-13-medium

gpt-5.2-2025-12-11-medium

kimi-k2-thinking

claude-opus-4-1-20250805

deepseek-v3p2

claude-opus-4-1-20250805 (Thinking)

claude-haiku-4-5-20251001

gemini-2.5-flash

claude-sonnet-4-20250514

claude-haiku-4-5-20251001 (Thinking)

claude-sonnet-4-20250514 (Thinking)

deepseek-r1-0528

o3-2025-04-16-medium

gpt-5-2025-08-07-medium

llama4-maverick-instruct-basic

o4-mini-2025-04-16-medium

Performance Comparison Across Language Models

Win Rate vs. Each Model

Win Rate vs Each Model

Battle Count vs. Each Model

Battle Count vs. Each Model

Confidence

Confidence Intervals

Average Win Rate

Average Win Rate

Prompt Distribution

Prompt Distribution