o3 vs GPT 5.4
tree_0016 · Software Developers, Quality Assurance Analysts, and Testers : Occupational Outlook Handbook: : U.S. Bureau of Labor Statistics
Round Context
Software Developers, Quality Assurance Analysts, and Testers / Occupational Outlook Handbook: / U.S. Bureau of Labor Statistics
Field of degree / Occupational Outlook Handbook: / U.S. Bureau of Labor Statistics
An occupation is described as designing computer applications or programs, typically requiring a bachelor’s degree, with employment projected to grow much faster than average from 2024 to 2034 and a 2024 median annual pay of $131,450. Using these characteristics to identify the occupation, determine: (1) the broader occupational group in which it is categorized and provide that group’s projected average annual openings, overall median annual wage (May 2024), and general growth outlook for 2024–2034; and (2) the related academic major category that aligns with this occupation on the federal career outlook site, including a brief description of what those major-category pages provide and the page’s last modified date. Answer length: 200-300 words.
Checklists
- Broader occupational group correctly identified as Computer and Information Technology Occupations, logically derived from the described software development occupation
- Related academic major category correctly identified as Computer and Information Technology under Field of Degree, logically connected to the bachelor’s-level preparation for the occupation
- Broader occupational group projected average annual openings: about 317,700 per year
- Broader occupational group median annual wage (May 2024): $105,990
- Broader occupational group growth outlook: much faster than average (2024–2034)
- Related academic major category description: pages highlight data from BLS and Census, show occupations, outlook, and worker information for that major
- Last modified date of the major-category page: August 28, 2025
The question uses salary, duties, education, and growth data to implicitly identify the specific software development occupation (Deep logic) without naming it. The respondent must then determine both its broader occupational group and its aligned academic major category. To answer fully, they must aggregate multiple data points—employment projections, wages, outlook, descriptive page purpose, and modification date—from two distinct but related sections (Wide scope).
Judgment
First, Deep Logic: Both agents correctly identified the occupation as Software Developers and placed it in the broader group Computer and Information Technology Occupations. However, Agent A failed the second Deep Logic requirement by misidentifying the related academic major category as “Computer Science” instead of the broader “Computer and Information Technology” field-of-degree category. Agent B correctly identified the appropriate major category.
Second, Width/Completeness: Both agents provided incorrect projected average annual openings for the broader occupational group and both gave incorrect last modified dates for the major-category page. However, Agent A also gave an incorrect median annual wage for the broader group and an incorrect major category, resulting in more checklist failures. Agent B correctly reported the group’s median wage ($105,990) and growth outlook.
Finally, User Experience & Presentation: Both responses are clearly structured, but Agent B offers slightly clearer contextual framing and stronger integration of how the occupation connects to the broader group and major category.
Although both contain factual errors, Agent B is more accurate overall and better aligned with the required entities. Therefore, Agent B wins on relative accuracy and completeness, though not at the “MUCH_BETTER” level due to its remaining factual errors.
o3
OpenAI
GPT 5.4
OpenAI