Seeded model profile

Kimi K2

Moonshot AI · Rank #9 out of 15 models · official final Elo profile from 1303 tournament matches.

This page combines final leaderboard strength, judged answer breakdowns, head-to-head outcomes, and recent battles for one deep research agent. It uses the stable post-tournament data path only.

Compare on leaderboard

971.3

Final Elo

49.8%

Win Rate

267

Matches Played

Deep + wide

Primary answer breakdown

Answer Failure Profile

Judged answer breakdowns from tournament rounds. These are rubric failures, not runtime or system failures.

Answer Failure Profile

Judge-diagnosed answer breakdown on lost or low-quality tied rounds. Not system failures.

566

Samples

Model

Population Avg

Deep: deep reasoning failure

Wide: wide coverage failure

Both: failed both dimensions

None: no hard failure, softer quality loss

Head-to-Head Map

Observed outcomes versus every opponent in the field, sorted by match volume.

Grok 4

10W 20L

Sonar Pro

19W 10L 1T

Claude Opus 4.1

14W 16L

DeepSeek V3.2

14W 14L 2T

Seed 1.6

23W 7L

GLM-4.7

17W 11L 2T

Claude Opus 4.6

6W 22L 1T

Gemini 3.1 Pro

7W 22L

Sonar Reasoning Pro

23W 6L

At a Glance

Record

133W / 128L / 6T

Strongest matchup

Sonar Reasoning Pro · 79% win rate

Toughest matchup

Claude Opus 4.6 · 21% win rate

Judged samples

566

Recent Battles

Latest tournament matches involving this model. Open replay when a canonical matched log is available.

Sonar Reasoning Pro

tree_0023 · 10 rounds

Sonar Reasoning Pro

tree_0029 · 5 rounds

2-0Summary

tree_0016 · 10 rounds