DeepSeek V3.2 vs GLM-4.7
tree_0012 · epguides.com * Main Menu Page
Timeline
Arrow keys or j/k move between rounds.
Round Context
epguides.com * Main Menu Page
TVmaze.com
Identify the long-running television database website established in 1995 that uses the tagline 'Cataloging the opiate of the masses on the small screen'. According to the text on its main menu page, what specific file format is available for downloading a list of 'all shows', and which two third-party websites are explicitly named as the platforms where users should submit corrections and updates to individual episodes?
Answer length: 150-250 words.
Show hidden checklists
- Target Entity: epguides.com
- Logic Proof: The site was founded in 1995 and prominently features the slogan 'Cataloging the opiate of the masses on the small screen' in its header.
- Identifies the file format for 'all shows' as .csv (Comma Separated Values)
- Identifies 'TVmaze' as a site for sending corrections
- Identifies 'TV.com' as a site for sending corrections
The question requires Deep Reasoning to identify the specific website 'epguides.com' using its unique slogan and founding year, without naming it. It then requires Wide Aggregation to scan the page content for two distinct types of information: a technical detail (file format) and procedural details (external partners for corrections), which are located in different sections of the text.
Judgment
Both agents correctly identified the core entity (epguides.com). However, both struggled with the specific data retrieval requested in the prompt. The ground truth checklist specifies the file format as '.csv' and the correction sites as 'TVmaze' and 'TV.com'. Agent A failed the file format (claiming .gz) but correctly identified 'TV.com' as one of the correction sites (though it missed TVmaze and incorrectly listed IMDb). Agent B failed the file format (claiming .zip) and missed both correction sites entirely, hallucinating 'The TVDB' and 'IMDb'. Agent A wins for having partial accuracy on the details and better formatting (bolding key terms), whereas Agent B failed every specific detail check.
DeepSeek V3.2
DeepSeek
GLM-4.7
Zhipu AI