Grok 4 vs GPT-5.1
tree_0025 · Cosmetology
Round Context
Cosmetology
Identify the academic library Subject Guide for 'Careers' that is navigationally situated between Computer Science and Criminal Justice, and specifically lists the print resources *Guide to Good Hairdressing* by Julie Handley and *Successful Salon Management* by Edward J. Tezak in its Cosmetology section. Who is the librarian listed as the contact for this guide, and what full list of other academic subjects is she responsible for according to her profile?
Answer length: 200-300 words.
Hidden checklist
- Librarian Name: Marianne Le
- Institution: Everett Community College (EvCC) (implied by the specific collection and address found in the target source)
- Business
- Education
- Human Development
- Nursing & Health Sciences
- Nutrition
- Philosophy
- Political Science
- Psychology
- Religion
- Sociology
The question uses 'Deep' logic by fingerprinting a specific library guide using a unique combination of book titles and navigational context (Previous/Next categories) found in the source text. It requires 'Wide' aggregation by asking for the full list of other subjects managed by the identified librarian, which are listed in the sidebar/contact section of the target page.
Judgment
Both agents failed the Deep Logic check established by the Ground Truth. The prompt specifies a guide that lists two print resources (*Guide to Good Hairdressing*, *Successful Salon Management*) and occupies a specific navigational position. The Ground Truth identifies the librarian as **Marianne Le** (associated with Everett Community College). Agent A and Agent B named incorrect institutions (St. Johns River State College and Daytona State College) and incorrect librarians (Genelle Austin and Ann Geier). Both agents also hallucinated the navigational logic, claiming that the "Careers" guide sits alphabetically between "Computer Science" and "Criminal Justice." Alphabetically, "Careers" (Ca) precedes "Computer Science" (Co), so that placement is impossible in a standard A-Z list. Both agents likely hallucinated the presence of the specific books to fit the user's constraints. Although Agent B's formatting was better (bullet points), neither answer was accurate, so the result is a Low Quality Tie.
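The alphabetical argument in the judgment can be checked mechanically. The sketch below is illustrative only: the subject list is an assumption assembled from names that appear in the question and judgment, not the library's actual navigation menu, and `neighbors` is a hypothetical helper.

```python
# Subject names are taken from the question and judgment; the list itself is
# an illustrative assumption, not the library's real A-Z menu.
subjects = ["Criminal Justice", "Careers", "Cosmetology", "Computer Science"]

# Case-insensitive alphabetical order, as a standard A-Z list would use.
ordered = sorted(subjects, key=str.casefold)


def neighbors(name, names):
    """Return the (previous, next) entries around `name` in an ordered list."""
    i = names.index(name)
    prev_item = names[i - 1] if i > 0 else None
    next_item = names[i + 1] if i + 1 < len(names) else None
    return prev_item, next_item


print(ordered)
print(neighbors("Careers", ordered))
print(neighbors("Cosmetology", ordered))
```

Under these assumptions, "Careers" sorts before "Computer Science", which matches the judgment's point, while "Cosmetology" is the entry that falls between "Computer Science" and "Criminal Justice".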
Grok 4
xAI
GPT-5.1
OpenAI