Ranked field report

Leaderboard

Final Deep Research Arena Elo ratings across 1303 matches.

Models: 15
Matches: 1303
Top Elo: 1205
Spread: 524

#	Model	Vendor	Elo	Matches	Wins	Losses / Ties	Win %
01	Claude Opus 4.6	Anthropic	1205.3	144	88	52 / 4	61.1%
02	Gemini 3.1 Pro	Google	1192.2	113	75	38 / 0	66.4%
03	GPT 5.4	OpenAI	1169.7	116	68	46 / 2	58.6%
04	o3	OpenAI	1160.1	178	101	76 / 1	56.7%
05	GPT-5.1	OpenAI	1134.7	205	105	99 / 1	51.2%
06	Gemini 2.5 Pro	Google	1102.7	178	86	90 / 2	48.3%
07	Grok 4	xAI	1040.8	180	94	86 / 0	52.2%
08	Claude Opus 4.1	Anthropic	1005.4	241	125	112 / 4	51.9%
09	Kimi K2	Moonshot AI	971.3	267	133	128 / 6	49.8%
10	Sonar Pro	Perplexity	952.5	239	105	132 / 2	43.9%
11	DeepSeek V3.2	DeepSeek	944.6	180	95	72 / 13	52.8%
12	GLM-4.7	Zhipu AI	912.1	121	58	53 / 10	47.9%
13	Qwen3-235B	Alibaba	804.8	179	67	107 / 5	37.4%
14	Seed 1.6	ByteDance	722.7	149	42	101 / 6	28.2%
15	Sonar Reasoning Pro	Perplexity	681.3	116	31	81 / 4	26.7%
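
The summary figures can be cross-checked against the table rows. The sketch below is not the arena's own code; it assumes that Win % is wins divided by matches played (ties count as non-wins), that every match appears once in each participant's row, and that the spread is the top Elo minus the bottom Elo. The names `ROWS`, `win_pct`, `total_matches`, and `elo_spread` are illustrative.

```python
# Rows copied from the leaderboard: (model, elo, matches, wins, losses, ties).
ROWS = [
    ("Claude Opus 4.6", 1205.3, 144, 88, 52, 4),
    ("Gemini 3.1 Pro", 1192.2, 113, 75, 38, 0),
    ("GPT 5.4", 1169.7, 116, 68, 46, 2),
    ("o3", 1160.1, 178, 101, 76, 1),
    ("GPT-5.1", 1134.7, 205, 105, 99, 1),
    ("Gemini 2.5 Pro", 1102.7, 178, 86, 90, 2),
    ("Grok 4", 1040.8, 180, 94, 86, 0),
    ("Claude Opus 4.1", 1005.4, 241, 125, 112, 4),
    ("Kimi K2", 971.3, 267, 133, 128, 6),
    ("Sonar Pro", 952.5, 239, 105, 132, 2),
    ("DeepSeek V3.2", 944.6, 180, 95, 72, 13),
    ("GLM-4.7", 912.1, 121, 58, 53, 10),
    ("Qwen3-235B", 804.8, 179, 67, 107, 5),
    ("Seed 1.6", 722.7, 149, 42, 101, 6),
    ("Sonar Reasoning Pro", 681.3, 116, 31, 81, 4),
]


def win_pct(wins: int, matches: int) -> float:
    """Win percentage as wins / matches, rounded to one decimal place."""
    return round(100 * wins / matches, 1)


# Wins + losses + ties should add up to the match count in every row.
rows_consistent = all(w + l + t == m for _, _, m, w, l, t in ROWS)

# Assumption: each match is counted once per participant, so the per-model
# match counts sum to twice the number of matches actually played.
total_matches = sum(m for _, _, m, _, _, _ in ROWS) // 2

# Spread between the highest and lowest rating in the table.
elos = [elo for _, elo, *_ in ROWS]
elo_spread = round(max(elos) - min(elos), 1)

print(rows_consistent, total_matches, elo_spread)
```

Under these assumptions the rows reproduce the reported totals of 1303 matches and a 524-point spread, and `win_pct(88, 144)` gives the 61.1% shown for Claude Opus 4.6.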

Tournament Overview

#1 Elo Rating: Claude Opus 4.6 (1205.3)
Models: 15
Matches: 1303
Elo Spread (1st – Last): 524

Elo Distribution

Model	Elo
Claude Opus 4.6	1205
Gemini 3.1 Pro	1192
GPT 5.4	1170
o3	1160
GPT-5.1	1135
Gemini 2.5 Pro	1103
Grok 4	1041
Claude Opus 4.1	1005
Kimi K2	971
Sonar Pro	952
DeepSeek V3.2	945
GLM-4.7	912
Qwen3-235B	805
Seed 1.6	723
Sonar Reasoning Pro	681

Head-to-Head Win Rates

Observed win rate (%) from the row model's perspective. A dash marks the diagonal and pairings with no reported result.

vs	Claude Opus 4.6	Gemini 3.1 Pro	GPT 5.4	o3	GPT-5.1	Gemini 2.5 Pro	Grok 4	Claude Opus 4.1	Kimi K2	Sonar Pro	DeepSeek V3.2	GLM-4.7	Qwen3-235B	Seed 1.6	Sonar Reasoning Pro
Claude Opus 4.6	—	64	—	31	66	69	—	—	76	—	—	—	—	—	—
Gemini 3.1 Pro	36	—	—	—	67	—	—	—	76	86	—	—	—	—	—
GPT 5.4	—	—	—	38	59	69	—	69	—	—	—	—	—	—	—
o3	66	—	62	—	40	40	63	70	—	—	—	—	—	—	—
GPT-5.1	34	33	38	60	—	60	57	73	—	—	—	—	—	—	—
Gemini 2.5 Pro	24	—	31	60	40	—	70	—	—	63	—	—	—	—	—
Grok 4	—	—	—	37	43	30	—	—	67	50	—	—	87	—	—
Claude Opus 4.1	—	—	28	30	27	—	—	—	53	50	77	0	77	77	—
Kimi K2	21	24	—	—	—	—	33	47	—	63	47	57	—	77	79
Sonar Pro	—	14	—	—	—	37	50	47	33	—	50	60	60	—	—
DeepSeek V3.2	—	—	—	—	—	—	—	23	47	50	—	40	73	83	—
GLM-4.7	—	—	—	—	—	—	—	0	37	40	40	—	—	—	79
Qwen3-235B	—	—	—	—	—	—	13	23	—	40	20	—	—	63	66
Seed 1.6	—	—	—	—	—	—	—	23	23	—	7	—	33	—	55
Sonar Reasoning Pro	—	—	—	—	—	—	—	—	21	—	—	21	28	38	—
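
For context, an observed head-to-head rate can be set against the win probability that a rating gap implies. The page does not describe how the arena computes its ratings, so the sketch below assumes the conventional logistic Elo expectation with a 400-point scale; `expected_score` and its `scale` parameter are illustrative names, not part of the arena.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Conventional Elo expectation for A against B.

    Assumption: the standard logistic curve with a 400-point scale; the
    arena's actual rating procedure is not documented on this page.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings taken from the leaderboard table.
opus_46 = 1205.3
gemini_31 = 1192.2

# Probability implied by the 13.1-point rating gap.
predicted = expected_score(opus_46, gemini_31)

# Observed rate for the same pairing, from the head-to-head matrix (64%).
observed = 0.64

print(f"predicted={predicted:.3f} observed={observed:.2f}")
```

For this pairing the rating gap implies a win probability of roughly 0.52, well below the observed 64%. A difference of that size is unsurprising when each pairing covers only a small share of a model's matches, since final ratings reflect results against the whole field.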