Last updated 11 Apr 2026, 3:22 pm SGT
Deep Research Arena
Ranked field report

Leaderboard

Final Deep Research Arena Elo ratings across 1303 matches.

Models: 15
Matches: 1303
Top Elo: 1205
Spread: 524
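#   Model                Vendor        Elo      Win %
01  Claude Opus 4.6      Anthropic     1205.3   61.1%
02  Gemini 3.1 Pro       Google        1192.2   66.4%
03  GPT 5.4              OpenAI        1169.7   58.6%
04  o3                   OpenAI        1160.1   56.7%
05  GPT-5.1              OpenAI        1134.7   51.2%
06  Gemini 2.5 Pro       Google        1102.7   48.3%
07  Grok 4               xAI           1040.8   52.2%
08  Claude Opus 4.1      Anthropic     1005.4   51.9%
09  Kimi K2              Moonshot AI    971.3   49.8%
10  Sonar Pro            Perplexity     952.5   43.9%
11  DeepSeek V3.2        DeepSeek       944.6   52.8%
12  GLM-4.7              Zhipu AI       912.1   47.9%
13  Qwen3-235B           Alibaba        804.8   37.4%
14  Seed 1.6             ByteDance      722.7   28.2%
15  Sonar Reasoning Pro  Perplexity     681.3   26.7%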

Tournament Overview

#1 Elo Rating: Claude Opus 4.6 (1205.3)
Models: 15
Matches: 1303
Elo Spread (1st – Last): 524

Elo Distribution

Claude Opus 4.6      1205
Gemini 3.1 Pro       1192
GPT 5.4              1170
o3                   1160
GPT-5.1              1135
Gemini 2.5 Pro       1103
Grok 4               1041
Claude Opus 4.1      1005
Kimi K2               971
Sonar Pro             952
DeepSeek V3.2         945
GLM-4.7               912
Qwen3-235B            805
Seed 1.6              723
Sonar Reasoning Pro   681
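
To give a feel for what these rating gaps mean, here is a minimal sketch that recomputes the 524-point spread and converts a rating difference into an expected score. It assumes the conventional logistic Elo formula with a 400-point scale; the page does not state which formula or scale Deep Research Arena uses, so the probabilities are illustrative only. The function name `expected_score` and its `scale` parameter are hypothetical, chosen for this example.

```python
# Elo ratings copied from the leaderboard (top two and last place).
ratings = {
    "Claude Opus 4.6": 1205.3,
    "Gemini 3.1 Pro": 1192.2,
    "Sonar Reasoning Pro": 681.3,
}


def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional Elo formula.

    Assumption: base-10 logistic curve with a 400-point scale. The arena's
    actual rating model is not documented on this page.
    """
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


# Spread between the highest and lowest rating (first minus last).
spread = max(ratings.values()) - min(ratings.values())
print(f"Elo spread: {spread:.0f}")

# Expected score for the #1 model against the #2 model.
p_top_two = expected_score(ratings["Claude Opus 4.6"], ratings["Gemini 3.1 Pro"])
print(f"Expected score, #1 vs #2: {p_top_two:.3f}")

# Expected score for the #1 model against the last-place model.
p_top_last = expected_score(ratings["Claude Opus 4.6"], ratings["Sonar Reasoning Pro"])
print(f"Expected score, #1 vs #15: {p_top_last:.3f}")
```

Under that assumption, the 13-point gap between the top two models is close to a coin flip, while the full spread implies a heavily lopsided matchup. Observed head-to-head rates can differ from these figures, especially when a pair has played few matches.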

Head-to-Head Win Rates

Observed win rate from the row model's perspective.

vs | Claude Opus 4.6 | Gemini 3.1 Pro | GPT 5.4 | o3 | GPT-5.1 | Gemini 2.5 Pro | Grok 4 | Claude Opus 4.1 | Kimi K2 | Sonar Pro | DeepSeek V3.2 | GLM-4.7 | Qwen3-235B | Seed 1.6 | Sonar Reasoning Pro
Claude Opus 4.6
6431666976
Gemini 3.1 Pro
36677686
GPT 5.4
38596969
o3
666240406370
GPT-5.1
34333860605773
Gemini 2.5 Pro
243160407063
Grok 4
374330675087
Claude Opus 4.1
28302753507707777
Kimi K2
212433476347577779
Sonar Pro
1437504733506060
DeepSeek V3.2
234750407383
GLM-4.7
037404079
Qwen3-235B
132340206366
Seed 1.6
232373355
Sonar Reasoning Pro
21212838