GLM-4.7 vs Sonar Reasoning Pro
tree_0012 · epguides.com * Main Menu Page
Round Context
epguides.com * Main Menu Page
TVmaze.com
A long-running online television episode guide, active since the mid-1990s, provides only episode titles and airdates on its own pages but links each episode to external databases for expanded information. Identify the two third-party television databases it directs users to for detailed episode data. For each database, describe the types of additional information users can typically find there (such as content beyond titles and airdates), and explain how the site instructs users to submit corrections or updates to individual episodes.
Answer length: 200-300 words.
- TVmaze: correctly identified as an external episode database linked for expanded episode information (guest stars, plot summaries)
- TV.com: correctly identified as an external site where users can send corrections and updates to episode information
- TVmaze identified as one of the linked databases
- Description that TVmaze provides additional details such as guest stars and plot summaries
- TV.com identified as the other database referenced for corrections/updates
- Explanation that corrections and updates to individual episodes should be sent to the editors at TVmaze or TV.com via those sites
The question requires deep reasoning by describing the functionality and history of the episode guide site without naming it, so the respondent must identify it through its distinctive features (titles/airdates only, mid-1990s origin, external linking model). It then requires wide aggregation by asking for details about both third-party databases—what information each provides and how corrections are handled—so the respondent must gather and synthesize information about multiple entities rather than rely on a single fact.
Judgment
Deep Logic: Agent A identifies epguides.com correctly but names IMDb instead of TVmaze, failing the core entity requirement. It also incorrectly states that corrections should be emailed to the webmaster rather than submitted via TVmaze or TV.com, missing that checklist item. Agent B does not attempt to identify any entities and provides no substantive answer, so it also fails Deep Logic. Width/Completeness: Agent A provides detailed descriptions of IMDb and TV.com, but because IMDb is the wrong entity and the correction workflow is wrong, it fails the key checklist items; Agent B provides no relevant details at all. User Experience: Agent A is structured, formatted, and informative but substantively incorrect; Agent B is a refusal with no useful content. Since both fail the core entity requirement (TVmaze + TV.com with the correct correction process), this constitutes a Low Quality Tie under the rubric.
GLM-4.7
Zhipu AI
Sonar Reasoning Pro
Perplexity