Gemini 3.1 Pro vs Claude Opus 4.6
tree_0012 · epguides.com * Main Menu Page
Round Context
epguides.com * Main Menu Page
TVmaze.com
Identify the long-running online television episode catalog that has operated since the mid-1990s and indexes over 12,000 TV series by listing episode titles and original airdates, typically linking each episode to an external database for expanded details. Based on publicly available information about this platform, name the third-party episode database it primarily links to for guest stars and plot summaries, and indicate where users are directed to submit episode corrections. Additionally, enumerate all the countries for which it provides dedicated schedule grids, describe the special "printable" feature available for individual series, and list any non-U.S. or non-standard television-related data offerings (such as city-specific exports or radio listings) that the platform makes accessible.
Answer length: 200-300 words.
Checklist
- Correctly identifies epguides.com and gives supporting evidence: founded in 1995, catalogs over 12,400 TV shows with episode titles and airdates
- Explains that the site links episode titles to TVmaze for additional details, matching the described functionality
- Identifies TVmaze as the primary external database for expanded episode details
- Mentions that corrections can be sent to editors at TVmaze or TV.com
- Lists all schedule grid countries: United States, United Kingdom, Canada, Australia, New Zealand, Japan, and South Korea
- Describes the 'Printable Guide' feature that compiles summaries for all episodes of a show
- Notes availability of streaming video source links on individual show pages
- Mentions Chicago TV listings available as a .csv export
- Mentions UK Radio Shows listings
The question uses identifying traits (mid-1990s origin, 12,000+ indexed series, title/airdate listings, external episode database links) to point to the correct platform indirectly, without naming it (Deep logic). It then requires aggregating several categories of information (external databases, correction destinations, international schedule grids, special features, and unique data exports), so the respondent must synthesize details from across the platform’s publicly available materials (Wide scope).
Judgment
Both agents correctly identify epguides.com as the entity (passes DEEP logic). However, Agent B incorrectly states that IMDb/TV.com are the primary external databases for expanded details, contradicting the ground truth that TVmaze is the primary link target. That is a major checklist miss.
Both agents fail WIDTH completeness: neither lists all required schedule grid countries (both omit New Zealand, Japan, and South Korea), and neither mentions the streaming video source links, the Chicago .csv export, or the UK Radio Shows listings. Agent A correctly identifies TVmaze as the primary database and explains the correction workflow more accurately, though its answer is still incomplete.
In presentation, both answers are well-structured, but Agent A is slightly clearer and more precise about the core linkage behavior. Because Agent A has fewer factual errors while Agent B contains a key incorrect claim, Agent A is BETTER (not MUCH_BETTER, since A still misses several required checklist items).
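The schedule-grid completeness check in the judgment amounts to a set difference between the checklist's required countries and the countries an answer lists. A minimal sketch follows. The function name `missing_countries` and the sample answer `hypothetical_listed` are illustrative assumptions, not the agents' actual output or the judge's actual tooling.

```python
# Required schedule-grid countries, copied from the checklist.
REQUIRED_COUNTRIES = {
    "United States", "United Kingdom", "Canada", "Australia",
    "New Zealand", "Japan", "South Korea",
}


def missing_countries(listed):
    """Return the required countries absent from an answer, sorted by name."""
    return sorted(REQUIRED_COUNTRIES - set(listed))


# Hypothetical answer (an assumption): the four required countries that the
# judgment does not report as missing.
hypothetical_listed = ["United States", "United Kingdom", "Canada", "Australia"]

print(missing_countries(hypothetical_listed))
# prints ['Japan', 'New Zealand', 'South Korea']
```

An empty result would mean the answer satisfies the schedule-grid checklist item; any non-empty result is a WIDTH miss.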