SWE-bench

📣 New benchmark: CodeClash (website, GitHub) evaluates SWE agents on goals, not tasks.
📣 New: Meet mini, the 100-line AI agent that still gets 65% on SWE-bench Verified!

Software engineering agents, benchmarks, and models.

Built and maintained by researchers from Stanford University and Princeton University.

This organization contains the source code for several projects in the SWE-* open source ecosystem, including:

SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
SWE-agent, a system that automatically solves GitHub issues using an LM agent.
SWE-smith, a toolkit for generating SWE training data at scale.
mini, an AI agent written in just 100 lines of code that scores >70% on SWE-bench Verified.

Also check out the supporting infrastructure for working with SWE-* projects:

SWE-ReX, infrastructure supporting sandboxed code execution for AI agents.
sb-cli, a command-line interface for running evaluations in the cloud.
Mirror clones for the SWE-bench and SWE-smith repositories are available here and here.
