Gemini 3.1 Pro vs GPT-5.1
tree_0012 · epguides.com * Main Menu Page
Round Context
epguides.com * Main Menu Page
TVmaze.com
A long-running online television episode guide directs users to two separate third-party platforms for expanded episode information such as guest stars and plot summaries, and it advises users to send episode corrections to the editors of those same platforms. Identify these two external entertainment databases and, for each one, provide its founding year, headquarters location, and a brief description of its primary purpose in the television industry.
Answer length: 200-300 words.
Checklist
- TVmaze – Correctly identified as one of the external platforms linked for episode details and corrections
- TV.com – Correctly identified as the other external platform linked for episode details and corrections
- TVmaze – Founding year
- TVmaze – Headquarters location
- TVmaze – Primary purpose as a TV database/platform
- TV.com – Founding year
- TV.com – Headquarters location
- TV.com – Primary purpose as an entertainment/TV database platform
The question requires deep reasoning because it describes the platforms by their functional role (external sites for episode details and corrections) rather than naming them, so the agent must infer the two specific databases. It also requires wide aggregation: the agent must supply several factual attributes (founding year, headquarters, and purpose) for both entities, so the answer cannot be derived from a single brief reference and must instead be synthesized from multiple reliable sources.
Judgment
First, Deep Logic: The correct two platforms are TVmaze and TV.com. Agent A identified IMDb and TVmaze (missing TV.com). Agent B identified IMDb and TV.com (missing TVmaze). Therefore, both agents failed the core entity requirement.

Second, Width/Completeness: Agent A provided correct details for TVmaze but irrelevant details for IMDb. Agent B provided correct details for TV.com but irrelevant details for IMDb. Each answered only half of the required checklist correctly.

Since both failed to identify the correct pair of external databases (the core logical requirement), this results in a LOW QUALITY TIE. Neither response satisfies the foundational constraint despite otherwise solid formatting and detail quality.
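The two-step judgment above can be sketched in Python. This is a minimal illustration under stated assumptions, not the evaluator's actual implementation: the names `judge` and `coverage`, the verdict strings, and the representation of each answer as a set of entity names are all hypothetical.

```python
# Hypothetical sketch of the two-step judgment; all names are assumptions.

# The pair of platforms the checklist expects.
REQUIRED = {"TVmaze", "TV.com"}


def coverage(entities):
    """Fraction of the required platforms that an answer identified."""
    return len(REQUIRED & set(entities)) / len(REQUIRED)


def judge(entities_a, entities_b):
    """Apply the core entity check first, then fall back to completeness."""
    a_ok = REQUIRED <= set(entities_a)  # Agent A named both platforms
    b_ok = REQUIRED <= set(entities_b)  # Agent B named both platforms
    if a_ok and not b_ok:
        return "A wins"
    if b_ok and not a_ok:
        return "B wins"
    if not a_ok and not b_ok:
        # Neither agent met the core requirement.
        return "low quality tie"
    # Both passed the core check; attribute completeness would decide.
    return "compare on completeness"


agent_a = {"IMDb", "TVmaze"}
agent_b = {"IMDb", "TV.com"}
print(judge(agent_a, agent_b), coverage(agent_a), coverage(agent_b))
```

With the entities reported in the judgment, each agent covers half of the required pair, and the core check fails for both, which matches the tie verdict.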