AImpact
Landmark AI Coding · 1 min read

Devin: 13.86% on SWE-bench, the first autonomous AI software engineer

In one sentence: Cognition unveils Devin, the first AI agent to autonomously resolve 13.86% of real bugs on SWE-bench full, roughly eight times the 1.7% that GPT-4 reaches without external scaffolding.


Devin is the first AI presented as a true "autonomous software engineer". It does not just suggest code: it plans the work, explores the codebase, writes and tests solutions, then opens a pull request with the complete fix. On the SWE-bench benchmark, which uses real GitHub issues, Devin resolves nearly 14% of problems on its own. Previous tools like GPT-4 without additional support stopped at 1.7%. It is not yet perfect, but this is the moment AI stops being an assistant and starts becoming a collaborator.
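The gap between the two resolve rates quoted above can be checked with a quick calculation. The snippet below is only an illustrative sketch: the helper name `resolve_rate_ratio` is hypothetical, and the inputs are the rounded percentages given in this article, not raw benchmark data.

```python
def resolve_rate_ratio(rate_a: float, rate_b: float) -> float:
    """Return how many times larger rate_a is than rate_b."""
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    return rate_a / rate_b


# Percentages as quoted in the article (the GPT-4 figure is rounded).
devin_rate = 13.86
gpt4_rate = 1.7

ratio = resolve_rate_ratio(devin_rate, gpt4_rate)
print(f"Devin resolves about {ratio:.1f}x as many issues as unassisted GPT-4")
```

With these inputs the ratio comes out at a little over eight, so "roughly eight times" is a fair summary of the reported figures.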

Companies

Cognition AI

Tools

Devin

Tags

Autonomous Agent, SWE-bench, Code Agent, Benchmark
