Autonomous Engineering Team Runtime Built on AgentField
Pronounced: "swee-AF" (one word)
One API call → full engineering team → shipped code.
Quick Start • Why SWE-AF • In Action • Factory Control • Benchmark • Modes • API • Architecture
One API call spins up a full autonomous engineering team — product managers, architects, coders, reviewers, testers — that scopes, builds, adapts, and ships complex software end to end. SWE-AF is a first step toward autonomous software engineering factories, scaling from simple goals to hard multi-issue programs with hundreds to thousands of agent invocations.
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Refactor and harden auth + billing flows",
"repo_url": "https://github.com/user/my-project",
"config": {
"runtime": "claude_code",
"models": {
"default": "sonnet",
"coder": "opus",
"qa": "opus"
},
"enable_learning": true
}
}
}
JSON

Swap `models.default` and any role key (`coder`, `qa`, `architect`, etc.) to any model your runtime supports.
SWE-AF works in two modes: point it at a single repository, or orchestrate coordinated changes across multiple repos in one build.
This is the default mode. Pass `repo_url` (remote) or `repo_path` (local) and SWE-AF handles everything:
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d '{
"input": {
"goal": "Add JWT auth",
"repo_url": "https://github.com/user/my-project"
}
}'

When your work spans multiple codebases — a primary app plus shared libraries, monorepo sub-projects, or dependent microservices — pass `config.repos` as an array with roles:
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d '{
"input": {
"goal": "Add JWT auth across API and shared-lib",
"config": {
"repos": [
{
"repo_url": "https://github.com/org/main-app",
"role": "primary"
},
{
"repo_url": "https://github.com/org/shared-lib",
"role": "dependency"
}
],
"runtime": "claude_code",
"models": {
"default": "sonnet"
}
}
}
}'

Roles:
- `primary` — the main application. Changes here drive the build; failures block progress.
- `dependency` — libraries or services modified to support the primary repo. Failures are captured but don't block.
Use cases:
- Primary app + shared SDK or utilities library
- Monorepo sub-projects that live in separate repos
- Feature spanning multiple microservices (e.g., API + worker queue)
Rust-based Python compiler benchmark (built autonomously):
| Metric | CPython (subprocess) | RustPython (SWE-AF) | Improvement |
|---|---|---|---|
| Steady-state execution | Baseline (~19ms) | Optimized in-process runtime | 88.3x-602.3x faster |
| Geometric mean | 1.0x baseline | 253.8x | 253.8x |
| Peak throughput | ~52 ops/s | 31,807 ops/s | ~612x |
Measurement methodology
Throughput comparison measures different execution models: CPython subprocess spawn (~19ms per call → ~52 ops/s) vs RustPython pre-warmed interpreter pool (in-process). This is the real-world tradeoff the system was built to optimize — replacing repeated subprocess invocations with a persistent pool for short-snippet execution.
Artifact trail includes 175 tracked autonomous agents across planning, coding, review, merge, and verification.
Details: examples/llm-rust-python-compiler-sonnet/README.md
Most agent frameworks wrap a single coder loop. SWE-AF is a coordinated engineering factory — planning, execution, and governance agents run as a control stack that adapts in real time.
- Hardness-aware execution — easy issues pass through quickly, while hard issues trigger deeper adaptation and DAG-level replanning instead of blind retries.
- Factory architecture — not a single-agent wrapper. Planning, execution, and governance agents run as a coordinated control stack — the architecture encodes the engineering strategy, not the prompts (see The Atomic Unit of Intelligence).
- Multi-model, multi-provider — assign different models per role (`coder: opus`, `qa: haiku`). Works with Claude, OpenRouter, OpenAI, and Google.
- Continual learning — with `enable_learning=true`, conventions and failure patterns discovered early are injected into downstream issues.
- Agent-scale parallelism — dependency-level scheduling + isolated git worktrees allow large fan-out without branch collisions.
- Fleet-scale orchestration — many SWE-AF nodes can run continuously in parallel via AgentField, driving thousands of agent invocations across concurrent builds.
- Explicit compromise tracking — when scope is relaxed, debt is typed, severity-rated, and propagated.
- Long-run reliability — checkpointed execution supports `resume_build` after crashes or interruptions.
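The worktree isolation behind that parallelism can be demonstrated with plain git. This is a standalone Python sketch of the mechanism only, not SWE-AF's actual workspace code:

```python
# Standalone illustration: git worktrees give each issue its own checkout
# on its own branch, so parallel coders never collide. This is a toy demo,
# not SWE-AF's workspace manager.
import pathlib
import subprocess
import tempfile

def git(*args, cwd):
    subprocess.run(["git", *args], cwd=cwd, check=True,
                   capture_output=True, text=True)

root = pathlib.Path(tempfile.mkdtemp())
repo = root / "repo"
repo.mkdir()
git("init", "-q", cwd=repo)
git("-c", "user.email=a@b", "-c", "user.name=a",
    "commit", "-q", "--allow-empty", "-m", "init", cwd=repo)

# Each issue gets an isolated worktree on a dedicated branch; the two
# "coders" below write files concurrently without touching each other.
for issue in ("issue-1", "issue-2"):
    wt = root / issue
    git("worktree", "add", "-q", str(wt), "-b", issue, cwd=repo)
    (wt / f"{issue}.txt").write_text("work\n")
```

After this runs, `git worktree list` in the repo shows the main checkout plus one worktree per issue branch.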
PR #179: Go SDK DID/VC Registration — built entirely by SWE-AF (Claude runtime with haiku-class models). One API call, zero human code.
| Metric | Value |
|---|---|
| Issues completed | 10/10 |
| Tests passing | 217 |
| Acceptance criteria | 34/34 |
| Agent invocations | 79 |
| Model | claude-haiku-4-5 |
| Total cost | $19.23 |
Cost breakdown by agent role
| Role | Cost | % |
|---|---|---|
| Coder | $5.88 | 30.6% |
| Code Reviewer | $3.48 | 18.1% |
| QA | $1.78 | 9.2% |
| GitHub PR | $1.66 | 8.6% |
| Integration Tester | $1.59 | 8.3% |
| Merger | $1.22 | 6.3% |
| Workspace Ops | $1.77 | 9.2% |
| Planning (PM + Arch + TL + Sprint) | $0.79 | 4.1% |
| Verifier + Finalize | $0.34 | 1.8% |
| Synthesizer | $0.05 | 0.2% |
79 invocations, 2,070 conversation turns. Planning agents scope and decompose; coders work in parallel isolated worktrees; reviewers and QA validate each issue; merger integrates branches; verifier checks acceptance criteria against the PRD.
Claude, open-source, and Codex models supported: Run builds with any runtime and tune models per role in one flat config map.
- `runtime: "claude_code"` maps to the Claude backend.
- `runtime: "open_code"` maps to the OpenCode backend (OpenRouter/OpenAI/Google/Anthropic model IDs).
- `runtime: "codex"` maps to the OpenAI Codex CLI backend.
SWE-AF uses three nested control loops to adapt to task difficulty in real time:
| Loop | Scope | Trigger | Action |
|---|---|---|---|
| Inner loop | Single issue | QA/review fails | Coder retries with feedback |
| Middle loop | Single issue | Inner loop exhausted | run_issue_advisor retries with a new approach, splits work, or accepts with debt |
| Outer loop | Remaining DAG | Escalated failures | run_replanner restructures remaining issues and dependencies |
This is the core factory-control behavior: control agents supervise worker agents and continuously reshape the plan as reality changes.
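The three loops can be sketched as nested retry budgets. This is an illustrative Python sketch, not SWE-AF's control code: `code_and_test`, `issue_advisor`, `run_issue_fn`, and `replanner` are hypothetical stand-ins, and the defaults mirror the `max_coding_iterations`, `max_advisor_invocations`, and `max_replans` config keys.

```python
# Illustrative sketch of the nested control loops. The callables are
# hypothetical stand-ins for the real coder/QA, advisor, and replanner agents.

def run_issue(issue, code_and_test, issue_advisor,
              max_coding_iterations=5, max_advisor_invocations=2):
    """Inner + middle loop for a single issue; returns a status dict."""
    feedback = None
    for attempt in range(max_advisor_invocations + 1):
        for _ in range(max_coding_iterations):        # inner loop: coder retries
            result = code_and_test(issue, feedback)   # coder + QA/review pass
            if result["passed"]:
                return {"status": "done"}
            feedback = result["feedback"]
        if attempt == max_advisor_invocations:
            break                                     # middle-loop budget spent
        decision = issue_advisor(issue, feedback)     # new approach / split / accept / escalate
        if decision == "accept_with_debt":
            return {"status": "done_with_debt"}
        if decision == "escalate":
            return {"status": "escalated"}
        # otherwise the advisor reshaped the approach; re-enter the inner loop
    return {"status": "escalated"}

def run_build(dag, run_issue_fn, replanner, max_replans=2):
    """Outer loop over the remaining issue DAG."""
    for _ in range(max_replans + 1):
        escalated = [i for i in dag if run_issue_fn(i)["status"] == "escalated"]
        if not escalated:
            return "build_complete"
        dag = replanner(dag, escalated)               # restructure remaining issues
    return "budget_exhausted"
```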
One click deploys SWE-AF + AgentField control plane + PostgreSQL. Set two environment variables in Railway:
- `CLAUDE_CODE_OAUTH_TOKEN` — run `claude setup-token` in the Claude Code CLI (uses Pro/Max subscription credits)
- `GH_TOKEN` — GitHub personal access token with `repo` scope for PR creation
Once deployed, trigger a build:
curl -X POST https://<control-plane>.up.railway.app/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-H "X-API-Key: this-is-a-secret" \
-d '{"input": {"goal": "Add JWT auth", "repo_url": "https://github.com/user/my-repo"}}'

- Python 3.12+
- AgentField control plane (`af`)
- AI provider API key (Anthropic, OpenRouter, OpenAI, or Google)
python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"

af # starts AgentField control plane on :8080
python -m swe_af # registers node id "swe-planner"

# Default (uses Claude)
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Add JWT auth to all API endpoints",
"repo_url": "https://github.com/user/my-project"
}
}
JSON
# With open-source runtime + flat role map
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Add JWT auth",
"repo_url": "https://github.com/user/my-project",
"config": {
"runtime": "open_code",
"models": {
"default": "openrouter/minimax/minimax-m2.5"
}
}
}
}
JSON
# With Codex CLI runtime
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Add JWT auth",
"repo_url": "https://github.com/user/my-project",
"config": {
"runtime": "codex",
"models": {
"default": "gpt-5.3-codex"
}
}
}
}
JSON
# Fast mode with Codex CLI runtime
curl -X POST http://localhost:8080/api/v1/execute/async/swe-fast.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Add a focused bug fix",
"repo_url": "https://github.com/user/my-project",
"config": {
"runtime": "codex",
"models": {
"default": "gpt-5.3-codex"
}
}
}
}
JSON
# Local workspace mode (repo_path) + targeted role override
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Refactor and harden auth + billing flows",
"repo_path": "/path/to/repo",
"config": {
"runtime": "claude_code",
"models": {
"default": "sonnet",
"coder": "opus",
"qa": "opus"
},
"enable_learning": true
}
}
}
JSON

For OpenRouter with `open_code`, use model IDs in `openrouter/<provider>/<model>` format (for example `openrouter/minimax/minimax-m2.5`).
For Codex with ChatGPT subscription auth, install the Codex CLI on the host, run codex login, leave OPENAI_API_KEY unset for this process, and set SWE_CODEX_AUTH_MODE=chatgpt or auto. For OpenAI API-platform billing, set SWE_CODEX_AUTH_MODE=api_key and OPENAI_API_KEY.
Codex deployments using the Docker image must set `SWE_DEFAULT_MODEL=gpt-5.3-codex` in the environment (or pass `models: {"default": "gpt-5.3-codex"}` in every build's `config`). The image bakes `HARNESS_MODEL=openrouter/moonshotai/kimi-k2.6` as an OpenCode fallback, and SWE-AF's model-resolution env cascade reads `HARNESS_MODEL` — so without `SWE_DEFAULT_MODEL` set, the Codex CLI receives an OpenRouter model ID it can't handle and the Product Manager reasoner fails in ~13s. Setting `SWE_DEFAULT_MODEL` makes the cascade pin every role to the Codex model.
Codex CLI's `workspace-write` sandbox uses bubblewrap (`bwrap`) and needs Linux user namespaces enabled on the host. Most production Linux hosts and managed container runtimes (Railway, etc.) allow this by default, but local Docker on WSL2 or hardened environments may refuse with `bwrap: No permissions to create a new namespace`. If the verifier reports that error, the coder ran but couldn't write files — enable user namespaces on the host before relying on the codex runtime there.
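A rough Linux-only preflight for this can be scripted. Note the sysctl paths checked below are distro-dependent assumptions, and the absence of both commonly means namespaces are unrestricted:

```python
# Rough Linux-only preflight for the codex sandbox: checks whether
# unprivileged user namespaces look available. Both sysctl paths are
# distro-dependent assumptions; if neither exists, namespaces are
# commonly unrestricted, so we default to True.
import pathlib

def userns_likely_available():
    for p in ("/proc/sys/kernel/unprivileged_userns_clone",   # Debian-style toggle
              "/proc/sys/user/max_user_namespaces"):          # upstream limit
        f = pathlib.Path(p)
        if f.exists():
            return int(f.read_text().strip()) > 0
    return True  # no restricting sysctl found
```

This is only a heuristic; the definitive test is whether `bwrap` itself can create a namespace on the host.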
Coding and review agents can look up external documentation, library APIs, error messages, and version/deprecation status during a build. This is opt-in via two env vars on the deployment:
OPENCODE_ENABLE_EXA=1
EXA_API_KEY=...
When set, opencode's built-in websearch and webfetch tools become available to every reasoner running through the open runtime — the model decides when to use them based on the task. Get a key at exa.ai.
The coder reasoner additionally gets a brief restraint guideline appended to its system prompt, so a long coding loop doesn't rabbit-hole on searches it could answer by reading the codebase. No setup is required beyond the env vars; the wiring inherits the parent environment through AgentField's CLI harness.
This works on the open runtime (opencode). The Claude runtime uses Anthropic's first-party WebSearch/WebFetch and is currently not wired here — file an issue if you want it.
- Architecture is generated and reviewed before coding starts
- Issues are dependency-sorted and run in parallel across isolated worktrees
- Each issue gets dedicated coder, tester, and reviewer passes
- Failed issues trigger advisor-driven adaptation (split, re-scope, or escalate)
- Escalations trigger replanning of the remaining DAG
- End result is merged, integration-tested, and verified against acceptance criteria
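The dependency-sorted parallel execution in the flow above can be sketched as simple wave scheduling. This is a toy illustration; SWE-AF's real scheduler also manages worktrees, checkpoints, and adaptation:

```python
# Toy sketch of dependency-level scheduling: issues whose prerequisites are
# all complete run together as one parallel wave.

def schedule_waves(deps):
    """deps: {issue: set of prerequisite issues} -> list of parallel waves."""
    remaining, done, waves = dict(deps), set(), []
    while remaining:
        # Every issue whose dependencies are satisfied joins the next wave.
        wave = sorted(i for i, d in remaining.items() if d <= done)
        if not wave:
            raise ValueError("dependency cycle")
        waves.append(wave)
        done.update(wave)
        for i in wave:
            del remaining[i]
    return waves
```

For example, `schedule_waves({"a": set(), "b": set(), "c": {"a", "b"}})` runs `a` and `b` in parallel first, then `c`.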
Typical runs spin up 400-500+ agent instances across planning, execution, QA, and verification. For larger DAGs and repeated adaptation/replanning cycles, SWE-AF can scale into the high hundreds to thousands of agent invocations in a single build.
95/100 with haiku and MiniMax: SWE-AF scored 95/100 with both Claude haiku-class routing ($20) and MiniMax M2.5 via open runtime ($6), outperforming Claude Code sonnet (73), Codex o3 (62), and Claude Code haiku (59) on the same prompt.
| Dimension | SWE-AF (haiku) | SWE-AF (MiniMax) | CC Sonnet | Codex (o3) | CC Haiku |
|---|---|---|---|---|---|
| Functional (30) | 30 | 30 | 30 | 30 | 30 |
| Structure (20) | 20 | 20 | 10 | 10 | 10 |
| Hygiene (20) | 20 | 20 | 16 | 10 | 7 |
| Git (15) | 15 | 15 | 2 | 2 | 2 |
| Quality (15) | 10 | 10 | 15 | 10 | 10 |
| Total | 95 | 95 | 73 | 62 | 59 |
| Cost | ~$20 | ~$6 | ? | ? | ? |
| Time | ~30-40 min | 43 min | ? | ? | ? |
Full benchmark details and reproduction
Same prompt tested across multiple agents. SWE-AF with Claude runtime (haiku-class model mapping) used 400+ agent instances; SWE-AF with MiniMax M2.5 via open runtime achieved identical quality at 70% cost savings.
Prompt used for all agents:
Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work.
| Dimension | Points | What it measures |
|---|---|---|
| Functional | 30 | CLI behavior and passing tests |
| Structure | 20 | Modular source layout and test organization |
| Hygiene | 20 | .gitignore, clean status, no junk artifacts |
| Git | 15 | Commit discipline and message quality |
| Quality | 15 | Error handling, package metadata, README quality |
# SWE-AF (Claude runtime, haiku-class mapping) - $20, 30-40 min
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work.",
"repo_path": "/tmp/swe-af-output",
"config": {
"runtime": "claude_code",
"models": {
"default": "haiku"
}
}
}
}
JSON
# SWE-AF (MiniMax M2.5 via OpenRouter runtime) - $6, 43 min
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work.",
"repo_path": "/workspaces/todo-app-benchmark",
"config": {
"runtime": "open_code",
"models": {
"default": "openrouter/minimax/minimax-m2.5"
}
}
}
}
JSON
# Claude Code (haiku)
claude -p "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." --model haiku --dangerously-skip-permissions
# Claude Code (sonnet)
claude -p "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." --model sonnet --dangerously-skip-permissions
# Codex (gpt-5.3-codex)
codex exec "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." --full-auto

MiniMax M2.5 Measured Metrics (Feb 2026):
- 99.22% code coverage (only agent with measured coverage)
- 4 custom error types (TodoError, ValidationError, NotFoundError, StorageError)
- 999 LOC, 4 modules, 74 tests, 9 commits
Production Quality Analysis: Objective comparison of measurable metrics across all agents.
Benchmark assets, logs, evaluator, and generated projects live in examples/agent-comparison/.
Ship code, then audit it: SEC-AF runs the same multi-agent architecture against your codebase — 250 agents, 94% noise reduction, every finding proven.
cp .env.example .env
# Add your API key: ANTHROPIC_API_KEY, OPENROUTER_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY
# Optionally add GH_TOKEN for PR workflow
docker compose up -d

Submit a build:
# Default (Claude)
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Add JWT auth",
"repo_url": "https://github.com/user/my-repo"
}
}
JSON
# With open-source runtime (set OPENROUTER_API_KEY in .env)
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Add JWT auth",
"repo_url": "https://github.com/user/my-repo",
"config": {
"runtime": "open_code",
"models": {
"default": "openrouter/minimax/minimax-m2.5"
}
}
}
}
JSON
# Local workspace mode (repo_path)
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"goal": "Add JWT auth",
"repo_path": "/workspaces/my-repo"
}
}
JSON

Scale workers:
docker compose up --scale swe-agent=3 -d

Use a host control plane instead of the Docker control-plane service:
docker compose -f docker-compose.local.yml up -d

Pass `repo_url` instead of `repo_path` to let SWE-AF clone the repo and open a PR after execution.
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
-H "Content-Type: application/json" \
-d @- <<'JSON'
{
"input": {
"repo_url": "https://github.com/user/my-project",
"goal": "Add comprehensive test coverage",
"config": {
"runtime": "claude_code",
"models": {
"default": "sonnet",
"coder": "opus",
"qa": "opus"
}
}
}
}
JSON

Requirements:
- `GH_TOKEN` in `.env` with `repo` scope
- Repo access for that token
After SWE-AF pushes the integration branch and opens a PR (ready for review,
not draft), it watches GitHub Actions on that PR until checks are
conclusive. If they fail, a bounded fix-and-repush loop runs an agent that
is explicitly forbidden from silencing tests (no pytest.skip, no xfail,
no commenting tests out, no loosening assertions) — it must produce a
legitimate fix in the production code and push a new commit. When CI is
green, the gate returns success; when CI fails after fix attempts, the PR
stays open with visible failing checks so a human reviewer can step in.
Configuration on BuildConfig:
| Field | Default | Purpose |
|---|---|---|
| `check_ci` | `true` | Run the post-PR CI gate. Set `false` to return immediately after the PR is created. |
| `max_ci_fix_cycles` | `2` | Cap on watch → fix → repush iterations after the initial push. |
| `ci_wait_seconds` | `1500` | Wall-clock cap per `gh pr checks` watch (25 min). |
| `ci_poll_seconds` | `30` | Poll interval for `gh pr checks`. |
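Under these knobs, the gate's control flow can be sketched as a bounded loop. This is illustrative only; `watch_checks` and `run_fix_agent` are hypothetical stand-ins for the `gh pr checks`-based watcher and the fix agent:

```python
# Sketch of the bounded post-PR CI gate. watch_checks and run_fix_agent are
# hypothetical stand-ins for the real watcher and repair agent.

def ci_gate(watch_checks, run_fix_agent, max_ci_fix_cycles=2):
    """Watch CI after the initial push; run bounded fix-and-repush cycles."""
    for attempt in range(max_ci_fix_cycles + 1):
        if watch_checks():            # blocks up to ci_wait_seconds in the real gate
            return "ci_green"
        if attempt == max_ci_fix_cycles:
            break                     # fix budget exhausted; PR stays open, checks red
        run_fix_agent()               # legitimate fix in production code, then repush
    return "ci_red_pr_left_open"
```

With the default budget of 2, that is one watch after the initial push plus at most two fix-and-repush cycles, each followed by another watch.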
Agent endpoints
Core async endpoints (each returns an `execution_id` immediately):
# Full build: plan -> execute -> verify
POST /api/v1/execute/async/swe-planner.build
# Plan only
POST /api/v1/execute/async/swe-planner.plan
# Execute a prebuilt plan
POST /api/v1/execute/async/swe-planner.execute
# Resume after interruption
POST /api/v1/execute/async/swe-planner.resume_build

Monitoring:
curl http://localhost:8080/api/v1/executions/<execution_id>

Every specialist is also callable directly:
POST /api/v1/execute/async/swe-planner.<agent>
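A minimal client-side polling helper for the monitoring endpoint might look like this. The response's `status` field and its terminal values are assumptions about the API's schema:

```python
# Minimal polling sketch for the async execution API. The response schema
# (a "status" field with the terminal values below) is an assumption.
import json
import time
import urllib.request

def wait_for_execution(base_url, execution_id, fetch=None, poll_seconds=10,
                       terminal=("completed", "failed", "cancelled")):
    """Poll /api/v1/executions/<id> until the status is terminal."""
    url = f"{base_url}/api/v1/executions/{execution_id}"
    if fetch is None:                         # injectable for testing
        def fetch(u):
            with urllib.request.urlopen(u) as resp:
                return json.load(resp)
    while True:
        body = fetch(url)
        if body.get("status") in terminal:
            return body
        time.sleep(poll_seconds)
```

Usage: `wait_for_execution("http://localhost:8080", execution_id)` blocks until the build finishes, then returns the final execution record.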
Agent execution flow
| Agent | In -> Out |
|---|---|
| `run_product_manager` | goal -> PRD |
| `run_architect` | PRD -> architecture |
| `run_tech_lead` | architecture -> review |
| `run_sprint_planner` | architecture -> issue DAG |
| `run_issue_writer` | issue spec -> detailed issue |
| `run_coder` | issue + worktree -> code + tests + commit |
| `run_qa` | worktree -> test results |
| `run_code_reviewer` | worktree -> quality/security review |
| `run_qa_synthesizer` | QA + review -> FIX / APPROVE / BLOCK |
| `run_issue_advisor` | failure context -> adapt / split / accept / escalate |
| `run_replanner` | build state + failures -> restructured plan |
| `run_merger` | branches -> merged output |
| `run_integration_tester` | merged repo -> integration results |
| `run_verifier` | repo + PRD -> acceptance pass/fail |
| `generate_fix_issues` | failed criteria -> targeted fix issues |
| `run_github_pr` | branch -> push + PR |
Configuration
Pass config to build or execute. Full schema: swe_af/execution/schemas.py
| Key | Default | Description |
|---|---|---|
| `runtime` | `"claude_code"` | Model runtime: `"claude_code"`, `"open_code"`, or `"codex"`. The default also honors the `SWE_DEFAULT_RUNTIME` env var when no runtime is passed in config — set it on the deployment so callers don't need to plumb a config through. |
| `models` | `null` | Flat role-model map (`default` + role keys below). Without a caller-supplied value, the `SWE_DEFAULT_MODEL` env var is used as the default for all roles — set it on the deployment to pin a model without code changes. Caller `models.default` or per-role keys still win. |
| `max_coding_iterations` | `5` | Inner-loop retry budget |
| `max_advisor_invocations` | `2` | Middle-loop advisor budget |
| `max_replans` | `2` | Build-level replanning budget |
| `enable_issue_advisor` | `true` | Enable issue adaptation |
| `enable_replanning` | `true` | Enable global replanning |
| `enable_learning` | `false` | Enable cross-issue shared memory (continual learning) |
| `agent_timeout_seconds` | `2700` | Per-agent timeout |
| `agent_max_turns` | `150` | Tool-use turn budget |
Model Role Keys
`models` supports:

- `default`
- `pm`, `architect`, `tech_lead`, `sprint_planner`
- `coder`, `qa`, `code_reviewer`, `qa_synthesizer`
- `replan`, `retry_advisor`, `issue_writer`, `issue_advisor`
- `verifier`, `git`, `merger`, `integration_tester`
Resolution order
runtime defaults < models.default < models.<role>
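That precedence can be expressed as a tiny resolver. This is illustrative only: the runtime-default value below is made up, and the real cascade (per the configuration table) also consults the `SWE_DEFAULT_MODEL` env var:

```python
# Illustrative resolver for the documented precedence:
#   runtime defaults < models.default < models.<role>
# The RUNTIME_DEFAULTS value is a made-up placeholder, not the real table.

RUNTIME_DEFAULTS = {"claude_code": "sonnet"}

def resolve_model(role, runtime, models=None):
    """Per-role key wins, then models.default, then the runtime default."""
    models = models or {}
    return models.get(role) or models.get("default") or RUNTIME_DEFAULTS[runtime]
```

For example, with `{"default": "sonnet", "coder": "opus"}`, the coder resolves to `opus` while every other role resolves to `sonnet`.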
Config examples
Minimal:
{
"runtime": "claude_code"
}

Codex:
{
"runtime": "codex",
"models": {
"default": "gpt-5.3-codex"
}
}

Fully customized:
{
"runtime": "open_code",
"models": {
"default": "openrouter/minimax/minimax-m2.5",
"pm": "openrouter/qwen/qwen-2.5-72b-instruct",
"architect": "openrouter/qwen/qwen-2.5-72b-instruct",
"coder": "openrouter/deepseek/deepseek-chat",
"qa": "openrouter/deepseek/deepseek-chat",
"verifier": "openrouter/qwen/qwen-2.5-72b-instruct"
},
"max_coding_iterations": 6,
"enable_learning": true
}

Artifacts
.artifacts/
├── plan/ # PRD, architecture, issue specs
├── execution/ # checkpoints, per-issue logs, agent outputs
└── verification/ # acceptance criteria results
Development
make test
make check
make clean
make clean-examples

Security and Community
- Contribution guide: `docs/CONTRIBUTING.md`
- Code of conduct: `CODE_OF_CONDUCT.md`
- Security policy: `SECURITY.md`
- Changelog: `CHANGELOG.md`
- License: Apache-2.0
SEC-AF — AI-native security auditor. 250 agents per audit, 94% noise reduction, every finding proven exploitable.
Contract-AF — Legal contract risk analyzer. Agents spawn agents at runtime. Adversarial review catches what solo LLMs miss.
SWE-AF is built on AgentField as a first step from single-agent harnesses to autonomous software engineering factories. See what else we're building →

