Sonar Pro vs Qwen3-235B
tree_0016 · Software Developers, Quality Assurance Analysts, and Testers: Occupational Outlook Handbook: U.S. Bureau of Labor Statistics
Round Context
Software Developers, Quality Assurance Analysts, and Testers: Occupational Outlook Handbook: U.S. Bureau of Labor Statistics
Computer and Information Systems Managers: Occupational Outlook Handbook: U.S. Bureau of Labor Statistics
According to the U.S. Bureau of Labor Statistics Occupational Outlook Handbook (OOH) projections for the 2024–2034 period, identify the specific computer-related occupation (Occupation A) that reports a 2024 median pay of exactly $131,450 and requires no prior work experience in a related occupation. Then, identify the management-level occupation (Occupation B) within the same sector that shares the same 15% job outlook growth rate but explicitly requires '5 years or more' of related work experience. For both identified occupations, provide the full official title, the 2024 median pay, and the projected numeric employment change for the 2024–2034 decade.
Answer length: 200-300 words.
Checklists
- Identified Occupation A: Software Developers, Quality Assurance Analysts, and Testers (or Software Developers)
- Identified Occupation B: Computer and Information Systems Managers
- Logic Validation: Confirmed Occupation B is the management role with 15% growth and a requirement of 5 years or more of related work experience, versus no related work experience for Occupation A.
- Occupation A 2024 Median Pay: $131,450
- Occupation A Projected Employment Change: 287,900
- Occupation B 2024 Median Pay: $171,200
- Occupation B Projected Employment Change: 101,600
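As a rough illustration of how the checklist above could be applied to an answer, here is a minimal Python sketch. The `ChecklistItem` class, the `CHECKLIST` list, and the `missing_items` function are hypothetical names introduced for this example; the expected values are copied from the checklist, and plain substring matching is an assumption, not the grading method actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    label: str     # human-readable name of the checklist entry
    expected: str  # text that should appear somewhere in the answer


# Expected values are taken from the checklist above.
# The data structure itself is a hypothetical illustration.
CHECKLIST = [
    ChecklistItem("Occupation A title", "Software Developers"),
    ChecklistItem("Occupation B title", "Computer and Information Systems Managers"),
    ChecklistItem("Occupation A 2024 median pay", "$131,450"),
    ChecklistItem("Occupation A projected employment change", "287,900"),
    ChecklistItem("Occupation B 2024 median pay", "$171,200"),
    ChecklistItem("Occupation B projected employment change", "101,600"),
]


def missing_items(answer: str) -> list[str]:
    """Return labels of checklist items whose expected text is absent.

    Matching is a case-insensitive substring test, which is an assumed
    simplification: it cannot tell whether a figure is attached to the
    right occupation.
    """
    lowered = answer.lower()
    return [item.label for item in CHECKLIST if item.expected.lower() not in lowered]
```

An answer that names Occupation A and its pay but reports the remaining figures as "not specified" would leave four items missing under this check, which is the kind of gap the checklist is meant to expose.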
The question uses a specific data point ($131,450 median pay) to anchor the first entity without naming it (Deep Logic). It then requires the agent to deduce the second entity based on a comparison of attributes (Management role, same growth rate, different experience requirement) found in a different section of the handbook (Deep Logic). Finally, it asks for a specific statistic (Employment Change) for both entities that is not mentioned in the query, forcing the agent to aggregate information from both target profiles (Wide Aggregation).
Judgment
Agent A correctly identified the core entity (Occupation A) as Software Developers, matching the specific pay constraint in the Ground Truth. Agent B failed this Deep Logic check, incorrectly identifying the role as Research Scientists and hallucinating the pay to match the prompt. Agent A is capped at 'Better' rather than 'Much Better' because it failed to retrieve the specific numeric data points (Projected Change for A, Pay/Change for B), resorting to 'not specified' or citing 'openings' instead. However, Agent A's grounded accuracy is superior to Agent B's hallucinations.
Sonar Pro
Perplexity
Qwen3-235B
Alibaba