GPT-5.1 vs Grok 4
tree_0016 · Software Developers, Quality Assurance Analysts, and Testers: Occupational Outlook Handbook, U.S. Bureau of Labor Statistics
Round Context
Software Developers, Quality Assurance Analysts, and Testers / Occupational Outlook Handbook / U.S. Bureau of Labor Statistics
Computer and Information Systems Managers / Occupational Outlook Handbook / U.S. Bureau of Labor Statistics
According to the U.S. Bureau of Labor Statistics Occupational Outlook Handbook (2024–2034 projections), identify the two specific computer-related occupations that are projected to grow by exactly 15%. For each of these two identified occupations, report the 2024 median annual pay, the projected numerical change in employment over the decade, and the specific requirement regarding work experience in a related occupation.
Answer length: 200-300 words.
Checklist
- Identifies Entity 1: Software Developers, Quality Assurance Analysts, and Testers (based on 15% growth match)
- Identifies Entity 2: Computer and Information Systems Managers (based on 15% growth match)
- Software Developers/QA Analysts: 2024 Median Pay of $131,450
- Software Developers/QA Analysts: Projected Employment Change of 287,900
- Software Developers/QA Analysts: Work Experience requirement is 'None'
- Computer and Information Systems Managers: 2024 Median Pay of $171,200
- Computer and Information Systems Managers: Projected Employment Change of 101,600
- Computer and Information Systems Managers: Work Experience requirement is '5 years or more'
The question masks the entities by filtering for a shared statistical attribute (15% growth rate) found in two separate source documents. This requires 'Deep' reasoning to identify the correct roles among many computer occupations. The 'Wide' component is enforced by requesting specific data points (pay, employment change, experience) that differ between the two entities and are located in their respective detailed profiles, preventing the use of a single summary table.
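The checklist amounts to a small table of expected values for each entity. As a rough illustration only (the page does not describe how answers were actually graded), it can be encoded as data and checked with naive case-insensitive substring matching. The names `CHECKLIST`, `normalize`, and `grade` are hypothetical.

```python
import re

# Hypothetical encoding of the ground-truth checklist: each occupation maps
# to the values a correct answer is expected to state verbatim.
CHECKLIST = {
    "Software Developers, Quality Assurance Analysts, and Testers": {
        "median_pay_2024": "$131,450",
        "employment_change": "287,900",
        "work_experience": "None",
    },
    "Computer and Information Systems Managers": {
        "median_pay_2024": "$171,200",
        "employment_change": "101,600",
        "work_experience": "5 years or more",
    },
}


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so matching ignores layout."""
    return re.sub(r"\s+", " ", text).lower()


def grade(answer: str) -> dict[str, dict[str, bool]]:
    """Check an answer against every checklist item by exact substring match.

    For each occupation, reports whether its name and each expected value
    appear anywhere in the answer (case-insensitive). It does NOT check that
    a value is attributed to the right occupation.
    """
    text = normalize(answer)
    results: dict[str, dict[str, bool]] = {}
    for occupation, fields in CHECKLIST.items():
        row = {"identified": normalize(occupation) in text}
        for field, expected in fields.items():
            row[field] = normalize(expected) in text
        results[occupation] = row
    return results
```

For example, an answer that names Computer and Information Systems Managers but reports $169,510 is marked `identified: True` and `median_pay_2024: False` for that occupation. Exact matching is intentionally strict. A human or model judge would also need to confirm that each value is tied to the correct occupation, and that a short value such as "None" is not matched by unrelated wording.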
Judgment
Agent A is the winner because it correctly identified one of the two target entities in the Ground Truth ('Computer and Information Systems Managers'), whereas Agent B identified neither of the correct occupations. Agent A also used clearer formatting, with bullet points that make the information much easier to scan. However, Agent A does not earn a 'Much Better' score because it relied on outdated data (likely the 2023–2033 projections) rather than the requested 2024–2034 data. This produced factual errors in the median pay and the projected numerical change (e.g., reporting $169,510 instead of the correct $171,200). Despite these data errors, Agent A is more useful than Agent B, which hallucinated the entities entirely.
GPT-5.1 (OpenAI)
Grok 4 (xAI)