SWE-bench

Organization for maintaining SWE-bench and related projects

📣 New benchmark: CodeClash (website, GitHub) evaluates SWE agents on goals, not tasks
📣 New: Meet mini, the 100-line AI agent that still gets 65% on SWE-bench Verified!

SWE-bench   SWE-agent   CodeClash   SWE-smith   mini-SWE-agent   SWE-ReX   sb-cli

Software engineering agents, benchmarks, and models.
Built and maintained by researchers from Stanford University and Princeton University.

This organization contains the source code for several projects in the SWE-* open-source ecosystem, including:

  • SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
  • SWE-agent, a system that automatically solves GitHub issues using an LM agent.
  • SWE-smith, a toolkit for generating SWE training data at scale.
  • mini, an AI agent written in just 100 lines of code that scores >70% on SWE-bench Verified.

Also check out the supporting infrastructure for working with SWE-* projects:

  • SWE-ReX, infrastructure supporting sandboxed code execution for AI agents.
  • sb-cli, a command line interface for running evaluations on the cloud.
  • Mirror clones for the SWE-bench and SWE-smith repositories are available here and here.

Pinned

  1. SWE-bench Public

    SWE-bench: Can Language Models Resolve Real-world Github Issues?

    Python · 4.3k stars · 761 forks

  2. SWE-smith Public

    [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents

    Python · 573 stars · 107 forks

  3. experiments Public

    Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.

    Shell · 247 stars · 293 forks

  4. sb-cli Public

    Run SWE-bench evaluations remotely

    Python · 58 stars · 7 forks

Repositories

  • swe-bench.github.io Public

    Landing page + leaderboard for the SWE-bench benchmark

    JavaScript · 11 stars · 15 forks · 5 open issues · 0 open pull requests · Updated Feb 20, 2026
  • experiments Public

    Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.

    Shell · 247 stars · 293 forks · 11 open issues · 26 open pull requests · Updated Feb 20, 2026
  • SWE-bench Public

    SWE-bench: Can Language Models Resolve Real-world Github Issues?

    Python · 4,332 stars · MIT license · 761 forks · 58 open issues · 27 open pull requests · Updated Feb 19, 2026
  • SWE-smith Public

    [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents

    Python · 573 stars · MIT license · 107 forks · 12 open issues (1 issue needs help) · 4 open pull requests · Updated Feb 17, 2026
  • SWE-smith-envs Public

    Artifacts for building environments (Docker images) for repositories represented in SWE-smith

    Dockerfile · 5 stars · 2 forks · 0 open issues · 0 open pull requests · Updated Feb 14, 2026
  • reading-list Public

    Academic papers and works related to SWE-bench and SWE-agents

    9 stars · 4 forks · 0 open issues · 0 open pull requests · Updated Dec 8, 2025
  • .github Public
    0 stars · MIT license · 0 forks · 0 open issues · 0 open pull requests · Updated Nov 14, 2025
  • sb-cli Public

    Run SWE-bench evaluations remotely

    Python · 58 stars · MIT license · 7 forks · 10 open issues · 0 open pull requests · Updated Aug 14, 2025
  • humanevalfix-results Public archive

    Evaluation data + results for SWE-agent inference on HumanEvalFix task

    Jupyter Notebook · 1 star · 0 forks · 0 open issues · 0 open pull requests · Updated Jul 11, 2024