Grok 4 vs GPT-5.1
tree_0003 · The 17 best photography websites
Round Context
The 17 best photography websites
The best camera phones – tried and tested by a photographer
Identify the digital design and art publication that released a feature article titled 'The 17 best photography websites'. After identifying the publication, search its archives to find a specific buyer's guide dedicated to mobile photography hardware. Provide the full title of this mobile guide, specifically including the subtitle that asserts the testing was conducted by a professional.
Answer length: 50-100 words.
Hidden checklist
- Publication: Creative Bloq (Identified via 'The 17 best photography websites')
- Article Title: The best camera phones
- Subtitle/Qualifier: tried and tested by a photographer
The question uses Deep Logic by obscuring the publication name 'Creative Bloq,' requiring the agent to identify it via a specific article title ('The 17 best photography websites'). It then applies Wide Logic by requiring the retrieval of a specific, separate article title and its unique qualifier ('tried and tested by a photographer') found within that identified domain.
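The three checklist items above can be read as a small rubric. The following is a minimal sketch, assuming a naive verbatim-phrase check; the names `ChecklistItem`, `normalize`, and `check_answer` are hypothetical and are not taken from the evaluation tool, and the sample answer is invented for illustration rather than quoted from either agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    label: str     # rubric label, e.g. "Publication"
    expected: str  # phrase the answer is expected to contain


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(text.lower().split())


def check_answer(answer: str, checklist: list[ChecklistItem]) -> dict[str, bool]:
    """Report, per checklist label, whether the expected phrase appears verbatim."""
    haystack = normalize(answer)
    return {item.label: normalize(item.expected) in haystack for item in checklist}


# Checklist items copied from the round context.
CHECKLIST = [
    ChecklistItem("Publication", "Creative Bloq"),
    ChecklistItem("Article Title", "The best camera phones"),
    ChecklistItem("Subtitle/Qualifier", "tried and tested by a photographer"),
]

# Hypothetical answer text, written for illustration only.
sample = "Creative Bloq published 'The best camera phones', tested and ranked by experts."
result = check_answer(sample, CHECKLIST)
```

A verbatim check like this marks a paraphrased subtitle as a miss, so it is stricter than a judge who accepts an answer that points to the correct article despite different wording.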
Judgment
Both agents correctly identified the publication as Creative Bloq. Agent A correctly identified the specific buyer's guide ('Best camera phone') that aligns with the provided Ground Truth. Although Agent A paraphrased the subtitle ('tested and ranked by experts') rather than quoting the exact Ground Truth phrasing ('tried and tested by a photographer'), it pointed to the correct article. Agent B failed the Wide check by selecting a guide for 'accessories' rather than the primary mobile hardware (phones) guide, and likely hallucinated the specific title to fit the constraints.
Grok 4
xAI
GPT-5.1
OpenAI