Every contender, one click away.
Browse the full Deep Research Arena field by rank, provider, answer-failure profile, and recent form. Each card opens a model deep-dive built from the same final Elo, head-to-head, and judged answer data used on the leaderboard.
Claude Opus 4.6
Anthropic
88W / 52L / 4T across 144 matches
Gemini 3.1 Pro
Google
75W / 38L / 0T across 113 matches
GPT-5.4
OpenAI
68W / 46L / 2T across 116 matches
o3
OpenAI
101W / 76L / 1T across 178 matches
GPT-5.1
OpenAI
105W / 99L / 1T across 205 matches
Gemini 2.5 Pro
Google
86W / 90L / 2T across 178 matches
Grok 4
xAI
94W / 86L / 0T across 180 matches
Claude Opus 4.1
Anthropic
125W / 112L / 4T across 241 matches
Kimi K2
Moonshot AI
133W / 128L / 6T across 267 matches
Sonar Pro
Perplexity
105W / 132L / 2T across 239 matches
DeepSeek V3.2
DeepSeek
95W / 72L / 13T across 180 matches
GLM-4.7
Zhipu AI
58W / 53L / 10T across 121 matches
Qwen3-235B
Alibaba
67W / 107L / 5T across 179 matches
Seed 1.6
ByteDance
42W / 101L / 6T across 149 matches
Sonar Reasoning Pro
Perplexity
31W / 81L / 4T across 116 matches
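To compare records across models with different match counts, the record lines on each card can be turned into a single rate. The sketch below parses a line in the format shown above and computes a score rate that counts a tie as half a win. The half-credit convention and the function names `parse_record` and `score_rate` are assumptions for illustration; this is not how the Arena computes its Elo ratings.

```python
import re

# Matches the card format, e.g. "88W / 52L / 4T across 144 matches".
RECORD = re.compile(r"(\d+)W / (\d+)L / (\d+)T across (\d+) matches")


def parse_record(line):
    """Return (wins, losses, ties, matches) parsed from one record line."""
    m = RECORD.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized record line: {line!r}")
    wins, losses, ties, matches = map(int, m.groups())
    # Sanity check: the three outcomes should add up to the match count.
    if wins + losses + ties != matches:
        raise ValueError(f"outcomes do not sum to match count: {line!r}")
    return wins, losses, ties, matches


def score_rate(line):
    """Score rate with ties counted as half a win (an assumed convention)."""
    wins, _losses, ties, matches = parse_record(line)
    return (wins + 0.5 * ties) / matches


if __name__ == "__main__":
    print(score_rate("88W / 52L / 4T across 144 matches"))
```

A raw rate like this ignores opponent strength, so it can rank models differently from the Elo-based leaderboard.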