AImpact
Inference · Intermediate · Also known as: Software Engineering Bench

SWE-bench

/swee-bench/

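A benchmark of 2,294 real GitHub issues drawn from 12 popular open-source Python repositories. For each issue, the model must produce a patch that makes the project's tests pass: the tests that fail because of the issue must pass once the patch is applied, and the tests that already passed must keep passing.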


In practice

It measures real software-engineering ability (reading a codebase, debugging, making cross-file edits) rather than the ability to write isolated functions. It has become the standard reference benchmark for coding agents such as Devin, Claude Code, and OpenAI Codex.
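
To make the pass/fail criterion from the definition concrete, here is a minimal sketch of how a single task might be scored. It assumes each task lists two groups of tests, mirroring the `FAIL_TO_PASS` and `PASS_TO_PASS` fields of the published dataset. The names `TaskInstance`, `is_resolved`, and `resolved_rate` are hypothetical and are not part of the official evaluation harness, which also handles building the repository and running the tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInstance:
    """One benchmark task (hypothetical structure, for illustration only)."""
    instance_id: str
    fail_to_pass: frozenset  # tests that fail before the fix and must pass after it
    pass_to_pass: frozenset  # tests that already pass and must keep passing


def is_resolved(task: TaskInstance, passed_after_patch: set) -> bool:
    """A task counts as resolved only if every required test passes after the patch."""
    required = task.fail_to_pass | task.pass_to_pass
    return required <= set(passed_after_patch)


def resolved_rate(tasks: list, outcomes: dict) -> float:
    """Fraction of tasks resolved; a task with no recorded outcome counts as unresolved."""
    if not tasks:
        return 0.0
    resolved = sum(
        is_resolved(task, outcomes.get(task.instance_id, set())) for task in tasks
    )
    return resolved / len(tasks)
```

Under this sketch, a patch that fixes the reported bug but breaks a previously passing test is scored as unresolved, which is why the benchmark rewards careful edits over quick fixes.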
