Claude 4 (Opus + Sonnet): AI coding hits junior-dev level
In one sentence Anthropic launches Claude Opus 4 and Sonnet 4. Opus 4 reaches 72.5% on SWE-bench Verified (vs 49% for Sonnet 3.7), can work autonomously on coding tasks for hours. 'Extended thinking' built in.
Anthropic ships Claude 4 in two variants: Opus 4 (flagship, smart) and Sonnet 4 (default, balanced). Both can "think long" before answering (extended thinking), like OpenAI's o1 but with the reasoning trace visible.
The headline: Opus 4 reaches 72.5% on the SWE-bench Verified benchmark, which measures how well a model can solve real bugs from GitHub issues. For reference: a junior human developer scores around 50-60%.
In Claude Code, Opus 4 can work autonomously on complex tasks for 5-7 hours: reads, edits, tests, commits on large codebases. The "AI as a colleague" pattern becomes concrete.
Companies
Anthropic
Tools
Claude Opus 4, Claude Sonnet 4
Tags
Sources