Landmark · AI Coding · 1 min read
Devin: 13.86% on SWE-bench, the first autonomous AI software engineer
In one sentence: Cognition unveils Devin, an AI agent that autonomously resolves 13.86% of real GitHub issues on the full SWE-bench benchmark, roughly eight times the 1.7% scored by GPT-4 without external scaffolding.
Devin is the first AI presented as a true "autonomous software engineer". It does not just suggest code: it plans the work, explores the codebase, writes and tests a solution, then opens a pull request with the complete fix. On SWE-bench, a benchmark built from real GitHub issues, Devin resolves nearly 14% of problems on its own, while GPT-4 without additional support resolved only 1.7%. Devin still fails on more than 85% of issues, but this is the moment AI stops being only an assistant and starts becoming a collaborator.
Companies
Cognition AI
Tools
Devin
Tags
Autonomous Agent · SWE-bench · Code Agent · Benchmark
Sources