GPT-5.1 vs Claude Opus 4.6
tree_0012 · epguides.com * Main Menu Page
Round Context
epguides.com * Main Menu Page
TVmaze.com
A long-running online television episode guide, active since the mid-1990s, provides title-and-airdate listings for thousands of shows and links each episode title to external databases for expanded details. It also features a "New Episodes Today" section with separate country-specific listings. Identify the two external television databases it links to for expanded episode information and corrections, describing what kind of additional details users can find there and where corrections should be directed. Then, list all the country-specific versions of the "New Episodes Today" feature, expanding each abbreviation to its full country name and explaining the context in which these country listings are presented.
Answer length: 200-300 words.
- TVmaze: identified as the linked episode-detail site providing guest stars and plot summaries for episodes on a long-running TV episode guide active since 1995
- TV.com: identified as the linked site where users can submit corrections and updates to episode information
- US, UK, CA, AU, NZ, JP, KR: correctly identified as the full set of country-specific 'New Episodes Today' listings associated with the same television episode guide platform
- TVmaze identified as an external database with additional details such as guest stars and plot summaries
- TV.com identified as an external database where corrections and updates can be sent
- Explanation that corrections to individual episodes can be sent to editors at TVmaze or TV.com
- United States (US) listed as one of the New Episodes Today country listings
- United Kingdom (UK) listed as one of the New Episodes Today country listings
- Canada (CA) listed as one of the New Episodes Today country listings
- Australia (AU) listed as one of the New Episodes Today country listings
- New Zealand (NZ) listed as one of the New Episodes Today country listings
- Japan (JP) listed as one of the New Episodes Today country listings
- South Korea (KR) listed as one of the New Episodes Today country listings
- Explanation that these countries appear as separate links under a 'New Episodes Today' schedule feature
The question uses deep logic by describing a long-running TV episode guide from the mid-1990s that links episode titles to external databases, without naming the site. It then requires wide aggregation by asking for both external databases and all country-specific schedule listings, including expanded country names and a contextual explanation. This forces the responder to identify the correct platform and comprehensively gather details scattered across multiple places.
Judgment
First, Deep Logic: Agent A incorrectly identifies the external databases as IMDb and TheTVDB, failing the core entity requirement (the expected answer is TVmaze and TV.com). Agent B correctly identifies TVmaze but substitutes TheTVDB for TV.com, so B partially fails Deep Logic, though it is closer to the ground truth than A.

Next, Width/Completeness: Both agents fail to list the full set of required country-specific listings (US, UK, CA, AU, NZ, JP, KR); each lists only four countries (US, UK, CA, AU). Agent A also adds incorrect European countries (DE, FR, IT, ES), introducing further hallucinations. Neither agent mentions TV.com as the corrections site (B names TheTVDB instead), and neither fully explains that corrections should be sent to the editors at TVmaze or TV.com, as the rubric requires.

Finally, User Experience & Presentation: Both answers are clearly structured and readable. However, Agent B is cleaner and more focused, and it avoids introducing extra incorrect country listings. While still flawed, B is factually closer to the ground truth and aligned with the expected entity (TVmaze). Therefore, Agent B is better, but not by a wide margin, given the shared factual gaps.
GPT-5.1
OpenAI
Claude Opus 4.6
Anthropic