NilCore (RNT56/NilCore): Minimal agentic loop.

The tiny, trustworthy coding agent.

The harness is small; the model is the engine. NilCore borrows intelligence instead of re‑encoding it, so the whole agent is ~75,000 lines of Go: a ~8k‑line single‑task core you can read in an afternoon, with everything else as opt‑in layers over it. Those layers are a multi‑agent supervisor that builds whole projects, a verified swarm that fans hundreds of agents at a problem, and a recursive decompose that splits a goal and merges the verified pieces back. All of them collapse onto one orchestration kernel, so you don't pick a machine: you just talk, and nilcore routes the goal to the cheapest one that fits. It treats code as one verifiable artifact among many (reports, comparison matrices, audits, benchmarks, research dossiers), each carrying claims a verifier re‑checks in the sandbox. It can see the running app through a sandboxed browser, even driving a flow (log in, submit a form) before it observes. It searches code semantically, reads 19 languages (Go · Python · TS/JS · Rust · Java · C/C++ · C# · Ruby · Kotlin · Swift · …), and starts work from a webhook or a schedule. It also closes the loop on its own evidence, learning from its verified‑or‑failed trace which backend to trust, what to recheck, and (opt‑in, fenced, never on main) what it may auto‑approve. It is hardened by three disciplines and seven invariants it never breaks.

TL;DR: Point NilCore at a repo and a goal. It works in a throwaway git worktree, runs every command the model emits inside a sandbox, and isn't done until your checks pass, whatever the model says. Drive it from your terminal or your phone. It never holds your keys, never lets the model run an arbitrary program on the host, and never decides "done" on its own word.

```sh
nilcore                       # just start talking; it picks the machine and works while you type
nilcore -goal "make the failing test in math_test.go pass"   # or drive one task headless
```

Why another coding agent?

Because most of them ask you to trust a black box. NilCore is built on the opposite bet: trust comes from verification, sandboxing, and a trace you can read — not from a bigger model. Here's the pain, and how NilCore kills it:

The pain you've felt	How NilCore solves it
"It said it was done. It wasn't."	The verifier is the only authority on done. After any backend runs, your project's own build/test/lint re‑runs and that verdict ships the work — a self‑report never does.
"Tests pass, but does the app actually work?"	NilCore can see the running app. A sandboxed headless browser (`browser_view` plus a pure‑Go `nilcore-browser` driver baked into the image) navigates your app. Given an `actions` script, it first drives a flow (click / type / key / wait, e.g. log in or submit a form) over a pure‑Go CDP client, then hands the model a screenshot as a multimodal image. When you opt in via `NILCORE_BROWSER_VERIFY`, a composite verifier folds that behavioral check into the verdict, so the verifier stays the sole authority on done. (The live browser run is CI‑only, because hermetic unit tests have no Chromium, and the driver fails closed without a browser.)
"It ran a destructive command / touched my host."	Every command the model emits runs in a sandbox: a rootless container (`cap-drop=ALL`, read‑only rootfs) or, on capable Linux hosts, the host‑native namespace backend. Destructive commands are denylisted before execution. The model can't run an arbitrary program on your machine, and its file edits are confined to a throwaway worktree.
"It leaked my API key."	Secrets come from the environment only, are injected per‑run into the container, and are never written to disk, put in a prompt, or logged. The audit log is hash‑chained and redacted.
"A fetched file/web page hijacked it."	Untrusted input is data, never instructions. Tool output, files, and web content are fenced behind a boundary the model is told not to obey.
"It edited blindly without understanding my codebase."	A real code‑intelligence stack (AST → call graph → PageRank repo‑map → semantic + LSP retrieval) hands the loop a minimal, structurally coherent context bundle before it touches a file. It reads 19 languages across 34 file extensions: Go · Python · TS/JS · Rust · Java · C · C++ · C# · Ruby · PHP · Kotlin · Swift · Scala · Dart · Zig · Bash · Lua · Elixir · SQL. They share a pure‑Go parser seam. Go is parsed precisely via `go/parser`, and the rest are handled by broad, structural heuristic line scanners (no tree‑sitter). The `NILCORE_LSP_COMMAND` seam stays the precise lens wherever a language server exists. Semantic search runs on a content‑hash‑cached, pure‑Go HNSW vector index, opt‑in via `NILCORE_EMBED_KEY`, with a lexical fallback whose behavior is byte‑identical when it's off.
"It can only fix one task, not build the thing."	`nilcore build` is a supervisor that spawns role‑specialized subagents (research · understand · plan · implement · review), lets them talk back and forth, integrates their parallel worktrees into one verifier‑green tree, and re‑plans to convergence, greenfield included. The supervisor writes code itself, too.
"I have to babysit it / can't course‑correct mid‑run."	Just talk to it. `nilcore chat` is one conversation. A model classifier (not a word‑count heuristic) sizes your message by the work involved and picks the cheapest honest route: a single native loop, the supervised fan‑out, or a whole project. While it works, its reasoning streams live, token by token. You have three ways to intervene:
- Queue a follow‑up. It folds in at the next step.
- Steer with `!…`. This interrupts mid‑thought but keeps what it has reasoned so far, folds your feedback in, and then resumes or changes course.
- Use `/cancel`. This aborts the run outright while you stay in the conversation.
"It went rogue while I was away."	Bounded autonomy: reversible work runs unattended; irreversible actions (merge, push, deploy, pay) hit a human gate — which becomes a Yes/No tap in Telegram or Slack.
"I want it to react, not just sit there waiting for me."	Event‑driven and scheduled autonomy. `nilcore serve --webhook` turns an HMAC‑verified SCM/CI webhook into a trigger, and `nilcore schedule` self‑starts on a cron/interval. Both route through the same reversible‑auto‑start / human‑gate machinery. When running headless, irreversible work is denied by default.
"Opening the PR is the part I don't trust it with."	Gated PR. `nilcore watch --open-pr` / `schedule --open-pr` open a draft PR (via `internal/forge`) only after the human gate. The push runs inside the approved prepare step. The token comes from the SecretStore and is scrubbed from logs. The agent never merges. With `--open-pr`, the verified branch is preserved; without it, the default disposable cleanup is unchanged (byte‑identical).
"I can't give it project‑specific marching orders."	Operator steering. Drop a `NILCORE.md` / `AGENTS.md` and it loads as trusted instructions. This is the one deliberate, scoped exception to "untrusted input is data," and it sits below the safety core: it can shape behavior but can't widen capability or bypass the gate or verifier. It is wired into chat and run/build.
"I'm locked into one model vendor."	One `Provider` seam, three adapters: Anthropic, OpenAI, OpenRouter. Model selection is `role → provider:model`, and the cheap executor escalates to a strong advisor on demand. There is also one `CodingBackend` seam with three backends: the native loop, Codex, and Claude Code. You don't have to pick. `-backend auto` chooses the best available backend (those whose CLI and key are actually present on the host). It is seeded by your stated preference (`-prefer-backend` / `preferred_backend`) and re‑ordered as the verifier‑judged Trust Ledger learns which one wins on your codebase. Alternatively, `-backends auto` races all available backends on a hard task and lets the verifier pick the winner (`nilcore trust` shows the scoreboard). Native is no longer hard‑defaulted as if it were best.
"It forgets everything between tasks."	Cross‑project memory (SQLite). Conventions and decisions are retrieved into context at task start and written back afterward, deduped and never stored as instructions.
"The framework is too big to trust."	The entire agent is ~75,000 lines of Go with two core dependencies: pure‑Go SQLite, and `golang.org/x/sys` (Go's own extended stdlib) for the Linux namespace sandbox. It is built up from a ~8k‑line single‑task core you can read in an afternoon. The multi‑agent layer, swarm, browser/desktop, code‑intel, closed‑loop autonomy, and the conversational front door are opt‑in layers over that core, and all of them collapse onto one orchestration kernel. The dependency count is still exactly two: the browser driver (including its pure‑Go CDP/WebSocket client), the multi‑language parser backends, the embedder, and forge are all pure stdlib, so no module was added. If you can't read it end to end, it's too big. (The optional full‑screen TUI, built with `make tui`, links the Charm stack under a build tag. The default binary doesn't link it, and `internal/` never imports it.)

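The webhook trigger in the table only starts work after an HMAC check. Here is a minimal Go sketch of such a check. It assumes a GitHub‑style `sha256=<hex>` signature computed over the raw request body. The header format and the helper names `sign` and `verifySignature` are illustrative assumptions, not NilCore's actual code. The properties worth copying are the constant‑time comparison (`hmac.Equal`) and failing closed on an empty secret or a malformed header.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign returns "sha256=<hex HMAC-SHA256 of body>"; the prefix format is an assumption.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature reports whether header is a valid signature of body under
// secret. It fails closed: an empty secret or a malformed header is rejected.
func verifySignature(secret, body []byte, header string) bool {
	hexSig, ok := strings.CutPrefix(header, "sha256=")
	if len(secret) == 0 || !ok {
		return false
	}
	got, err := hex.DecodeString(hexSig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares without leaking timing information about the MAC.
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("value-of-NILCORE_WEBHOOK_SECRET")
	body := []byte(`{"action":"completed"}`)
	sig := sign(secret, body)
	fmt.Println(verifySignature(secret, body, sig))                   // true
	fmt.Println(verifySignature(secret, []byte(`{"forged":1}`), sig)) // false
}
```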
The core loop

Everything orbits one loop. The verifier — your checks — is the source of truth.

```mermaid
flowchart LR
    G([Goal]) --> CTX[gather context<br/>+ memory + code-intel]
    CTX --> M[model picks a tool]
    M --> S[execute in sandbox]
    S --> O[observe]
    O --> V{VERIFY<br/>build · test · lint}
    V -- red --> M
    V -- green --> GATE{irreversible?}
    GATE -- reversible --> SHIP([shipped])
    GATE -- merge / push / deploy --> HUMAN[human gate<br/>console / chat]
    HUMAN --> SHIP
```

Whatever writes the diff — NilCore's own loop, Codex, or Claude Code — your checks decide whether it ships. That single rule is what makes delegating to black‑box agents safe.

What you get

▸ Hybrid backends, one contract: the native loop, plus delegation to Codex / Claude Code. Add a backend without touching the core.

▸ Hardened sandbox: rootless containers, dropped caps, read‑only rootfs, default‑deny egress with an allowlist proxy.

▸ Secrets that never leak: keychain / encrypted‑file vault / env / external hook. The model never sees a key.

▸ Drive it from your phone: run nilcore serve on a VPS and talk to it over Telegram & Slack. Gates become inline Yes/No replies.

▸ One conversational front door (nilcore chat): just talk, and it infers quick fix vs. feature vs. whole project and acts. Watch its reasoning stream live, and intervene in one of three ways. You can queue a follow‑up. You can steer (!…), which interrupts mid‑thought, keeps the partial reasoning, and then resumes or changes course. Or you can /cancel to abort. Works in the terminal and over Telegram/Slack.

▸ One engine, so you don't pick a machine (nilcore do): run / build / swarm / decompose collapse onto one recursive orchestration kernel. A goal→preset router picks the cheapest one that fits and dispatches. This is not five products but one agent that chooses how hard to work. decompose splits a goal, runs each piece, and merges the verified branches into one re‑verified tip.

▸ Verifier‑backed artifacts, not just code: code is one artifact type among many (reports, comparison matrices, audits, benchmarks, research dossiers). Each is a typed Artifact whose every Claim carries Evidence{value, source_url, verifier, status}. A worker's self‑written pass is overwritten by a real check run in the sandbox. A claim with an unregistered verifier is unverifiable, never green. Granular requeue re‑runs exactly the failed claims, not the world. nilcore report replays the log into the trust story and refuses to report green over a broken chain.

▸ Code intelligence (19 languages: Go · Python · TS/JS · Rust · Java · C/C++ · C# · Ruby · Kotlin · Swift · …; heuristic scanners, with LSP as the precise lens): AST · call graph · PageRank repo‑map · LSP · pure‑Go HNSW semantic search · Impact Set + SBFL · live worktree‑aware updates.

▸ It can see the running app: a sandboxed headless browser (browser_view) can first drive a flow (click / type / key / wait, e.g. log in or submit a form) over a pure‑Go CDP client. It then hands the model a screenshot as a multimodal image. This is opt‑in; when enabled, a composite verifier folds the behavioral check into the verdict. (The live run is CI‑only, and the driver fails closed without a browser.)

▸ Multi‑agent supervisor (nilcore build): a supervisor spawns role‑specialized subagents (research · understand · plan · implement · review) that communicate back and forth. It integrates their parallel worktrees into one verifier‑green tree and re‑plans to convergence. Greenfield‑capable; the supervisor codes, too.

▸ Tamper‑evident audit: append‑only, hash‑chained, secret‑redacted event log. Replay any run.

▸ Runs unattended, and reacts: provider retry/failover, cost ceilings, durable resume on restart, resource GC, and health checks. It also has event/scheduled triggers: serve --webhook for HMAC‑verified SCM/CI events, and schedule for cron/interval runs. A gated draft PR (--open-pr) opens only after the human gate. The agent never merges.

▸ Closed‑loop autonomy (it learns from its own evidence): NilCore consumes its verifier‑judged trace to get better in four ways:
- A Trust Ledger routes to the backend that actually wins on your code.
- Distilled lessons plus a content‑hash verify‑cache stop it from repeating its scars.
- A human‑gated flywheel proposes its own improvements.
- Graduated auto‑approval earns wider unattended scope, fenced by a four‑axis blast budget and never on main/prod.

All of it is opt‑in, and nilcore experience / trust / lessons / auto-approvals show the receipts.

▸ Verified swarm mode (nilcore swarm): fan N units of work into a bounded in‑process pool on one host (--agents 300 --concurrency 40). Every unit produces a typed artifact judged by a verify‑pack, and only verifier‑green shards ship. Failed shards requeue until clean, or until a budget/pass limit is hit. It comes with:
- five presets (research · code · audit · benchmark · ui);
- a tiered provider pool (strong planner/verifier + cheap worker tier + fallback + per‑provider caps);
- a live scoreboard (checked/passed/failed/retry‑pass/remaining + cost/time/token + source‑claim trace).

Massive fan‑out, verifier‑owned quality: it refuses to ship anything it can't verify.

▸ Operator steering: a NILCORE.md / AGENTS.md steering file loads as trusted project instructions. It is scoped below the safety core, so it can shape behavior but never widen capability or bypass the gate/verifier.
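The artifact bullet describes claims whose evidence is re‑checked rather than trusted. Here is a Go sketch of that re‑check step, assuming evidence looks roughly like the four named fields. The Go type names, the status strings, and the choice to requeue only `Fail` claims (an `Unverifiable` claim can never pass, so retrying it is pointless) are illustrative, not NilCore's schema.

```go
package main

import "fmt"

type Status string

const (
	Pass         Status = "pass"
	Fail         Status = "fail"
	Unverifiable Status = "unverifiable"
)

// Evidence mirrors the four fields named in the README; Go names are assumed.
type Evidence struct {
	Value     string
	SourceURL string
	Verifier  string
	Status    Status
}

type Claim struct {
	ID       string
	Evidence Evidence
}

// VerifierFunc re-checks one piece of evidence (in NilCore, inside the sandbox).
type VerifierFunc func(Evidence) bool

// recheck overwrites every worker-written status with a real verdict and
// returns the IDs of failed claims to requeue. An unregistered verifier
// yields Unverifiable, which is never treated as green.
func recheck(claims []Claim, registry map[string]VerifierFunc) (requeue []string) {
	for i := range claims {
		ev := &claims[i].Evidence
		v, ok := registry[ev.Verifier]
		switch {
		case !ok:
			ev.Status = Unverifiable
		case v(*ev):
			ev.Status = Pass
		default:
			ev.Status = Fail
		}
		if ev.Status == Fail {
			requeue = append(requeue, claims[i].ID)
		}
	}
	return requeue
}

func main() {
	registry := map[string]VerifierFunc{
		"nonempty": func(e Evidence) bool { return e.Value != "" },
	}
	claims := []Claim{ // every worker claims Pass; the verifier decides
		{ID: "revenue", Evidence: Evidence{Value: "42", Verifier: "nonempty", Status: Pass}},
		{ID: "margin", Evidence: Evidence{Value: "", Verifier: "nonempty", Status: Pass}},
		{ID: "ceo", Evidence: Evidence{Value: "x", Verifier: "unknown", Status: Pass}},
	}
	fmt.Println(recheck(claims, registry)) // [margin]
	fmt.Println(claims[2].Evidence.Status) // unverifiable
}
```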

Quickstart

Requires Go 1.25+. On Linux with a Landlock‑capable kernel (5.13+) and unprivileged user namespaces, NilCore auto‑detects the host‑native namespace backend and sandboxes the loop with no container runtime at all. Otherwise (or with -sandbox container), it uses a container runtime: rootless podman is preferred, with docker as the alternative.
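The selection rule above can be written as a pure function. This sketch rests on stated assumptions. The `Host` fields are hypothetical. A real implementation would also probe the kernel (for example, the Landlock ABI) rather than trust the version number alone, because Landlock can be compiled out or disabled on a 5.13+ kernel. Note that the sketch fails closed: with no usable runtime it returns an error instead of running unsandboxed.

```go
package main

import (
	"errors"
	"fmt"
)

// Host holds the facts the choice depends on. Field names are hypothetical.
type Host struct {
	OS           string // e.g. runtime.GOOS
	KernelMajor  int
	KernelMinor  int
	UnprivUserNS bool // unprivileged user namespaces enabled
	HasPodman    bool
	HasDocker    bool
}

// chooseSandbox prefers host-native namespaces on Landlock-capable Linux
// (5.13+) unless the operator forces "container"; otherwise it picks a
// container runtime, podman first.
func chooseSandbox(h Host, forced string) (string, error) {
	landlock := h.KernelMajor > 5 || (h.KernelMajor == 5 && h.KernelMinor >= 13)
	if forced != "container" && h.OS == "linux" && landlock && h.UnprivUserNS {
		return "namespace", nil
	}
	switch {
	case h.HasPodman:
		return "podman", nil
	case h.HasDocker:
		return "docker", nil
	}
	return "", errors.New("no sandbox available: refusing to run unsandboxed")
}

func main() {
	h := Host{OS: "linux", KernelMajor: 6, KernelMinor: 8, UnprivUserNS: true, HasDocker: true}
	fmt.Println(chooseSandbox(h, ""))          // namespace <nil>
	fmt.Println(chooseSandbox(h, "container")) // docker <nil>
}
```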

```sh
# Install (or grab a binary from Releases)
curl -fsSL https://raw.githubusercontent.com/RNT56/NilCore/main/scripts/install.sh | sh

# 1) Guided setup in one pass: providers + keys (→ SecretStore), runtime, backend,
#    chat channel + serve allowlist. Re-check readiness anytime with `nilcore doctor`.
nilcore init

# 2) Just talk to it through the conversational front door. It infers whether your
#    message is a quick fix, a feature, or a whole project and pulls the strings
#    itself. It works while you type, so you can QUEUE a follow-up or STEER (!...)
#    to interrupt its current step. This is the usual way to drive NilCore.
nilcore                                   # same as: nilcore chat -dir .

# One-shot, but let the agent pick HOW to work: `do` routes the goal to the cheapest
#   preset that fits (run for a task, build for a project, swarm for breadth, or
#   decompose to split + merge), then dispatches to that proven machine.
#   -dry-run previews the route.
nilcore do -goal "add a login form and wire the logout button"   # try -dry-run first

# Or drive a specific mode directly (also what the conversation / `do` routes to):

# Run one task to completion (the native loop, in a disposable worktree).
#   Add -auto-supervise to let the model classifier scale a complex goal UP to the
#   supervised project loop (same caps as `nilcore build`); without it, the run
#   stays single-task.
nilcore -dir ./repo \
        -goal "fix the failing test in math_test.go" \
        -verify "go build ./... && go test ./..."

# Build a WHOLE project from one prompt: a supervisor spawns role-specialized
#   subagents that talk to each other, integrates their parallel work into one
#   verifier-green tree, and re-plans to convergence. Greenfield (-new) or -dir.
nilcore build -goal "Go HTTP service: /health 200 + /orders POST persists to SQLite" -new ./svc

# Delegate a single task to Claude Code or Codex, verified the same way.
#   Model / effort / extra args / env are configurable (via `nilcore init`, or
#   NILCORE_CLAUDE_MODEL/_EFFORT · NILCORE_CODEX_MODEL/_EFFORT); unset => CLI default.
#   Or let the system pick: -backend auto chooses the best AVAILABLE backend
#   (seeded by -prefer-backend, learned by the Trust Ledger); -backends auto races them all.
nilcore -dir ./repo -goal "..." -backend claude-code

# Drive it from your phone: serve gives Telegram/Slack the same conversation
#   (queue + steer + auto-routing); gates become inline Yes/No replies.
nilcore serve -channel telegram          # needs a channel + allowlist (from `nilcore init`)

# React to events instead of waiting: turn an HMAC-verified SCM/CI webhook into a
#   trigger, or self-start on a cron/interval. Both route through the same
#   reversible-auto-start / human-gate machinery (headless => irreversible work is denied by default).
nilcore serve --webhook :8080            # needs NILCORE_WEBHOOK_SECRET (HMAC); NILCORE_WEBHOOK_LABEL optional
nilcore schedule --every 1h --goal "..." # or a cron expr; add --open-pr to open a GATED draft PR

# Let it see the running app: an opt-in composite verifier folds a sandboxed
#   headless-browser behavioral check into the verdict (CI-only live run; fails closed).
NILCORE_BROWSER_VERIFY=1 nilcore -dir ./svc -goal "..."

# Fan out a VERIFIED swarm: N shards in a bounded in-process pool, each producing a
#   TYPED artifact judged by a verify-pack. Only verifier-green shards ship; failed
#   shards requeue until clean (or the budget/pass limit). 300 agents are fine because
#   every unit is checkable: no majority vote, no "the model says it looks right".
nilcore swarm -goal "research 100 EV companies" -preset research \
  -agents 300 -concurrency 40 -artifact report+matrix -verify-pack finance \
  -passes until-clean -budget 500
#   Presets: research | code | audit | benchmark | ui. The live scoreboard shows
#   checked/passed/failed/retry-pass/remaining + cost/time/token + the source-claim
#   trace; replay it anytime with `nilcore report -format matrix -dir ./repo`.
#   In-process / single-host / bounded; default-off (the binary is byte-identical when unused).

# Prefer env vars / CI? Skip the wizard and export keys directly:
#   export ANTHROPIC_API_KEY=sk-...   (or NILCORE_* for scripted: nilcore init -non-interactive)
#   NILCORE_EMBED_KEY enables pure-Go HNSW semantic search; a NILCORE.md / AGENTS.md
#   steering file (trusted, scoped below the safety core) gives the agent project marching orders.
```
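The swarm's "requeue until clean" behavior is essentially a bounded worker pool run in passes. This minimal Go sketch assumes each unit's `work` callback both produces the artifact and runs its verify‑pack. It models only the pass limit, not the dollar budget, and `runSwarm` and `Shard` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Shard is one unit of swarm work.
type Shard struct{ ID int }

// runSwarm runs shards with at most `concurrency` in flight, requeues the ones
// whose verification fails, and stops when all are green or after `maxPasses`.
// It returns the shipped IDs and the IDs still failing at the end.
func runSwarm(shards []Shard, concurrency, maxPasses int, work func(Shard, int) bool) (shipped, failed []int) {
	if concurrency < 1 {
		concurrency = 1 // an unbuffered semaphore would deadlock
	}
	pending := shards
	for pass := 1; pass <= maxPasses && len(pending) > 0; pass++ {
		var mu sync.Mutex
		var wg sync.WaitGroup
		sem := make(chan struct{}, concurrency) // bounds in-flight workers
		var retry []Shard
		for _, s := range pending {
			wg.Add(1)
			sem <- struct{}{}
			go func(s Shard) {
				defer wg.Done()
				defer func() { <-sem }()
				ok := work(s, pass) // produce artifact + run its verify-pack
				mu.Lock()
				defer mu.Unlock()
				if ok {
					shipped = append(shipped, s.ID)
				} else {
					retry = append(retry, s)
				}
			}(s)
		}
		wg.Wait()
		pending = retry // only the red shards go around again
	}
	for _, s := range pending {
		failed = append(failed, s.ID)
	}
	return shipped, failed
}

func main() {
	shards := make([]Shard, 300)
	for i := range shards {
		shards[i] = Shard{ID: i}
	}
	// Pretend every 7th shard fails verification on its first pass only.
	work := func(s Shard, pass int) bool { return s.ID%7 != 0 || pass > 1 }
	shipped, failed := runSwarm(shards, 40, 3, work)
	fmt.Println(len(shipped), len(failed)) // 300 0
}
```

The buffered channel plays the role of `-concurrency` and the outer loop plays `-passes`. A shard only counts as shipped when its own verification returns true.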

Other commands

nilcore help lists them all. Each is one focused verb over the same audited core:

Command	What it does
`nilcore do -goal …`	The agent picks how to work. Routes the goal to the cheapest preset that fits — `run` / `build` / `swarm` / `decompose` — and dispatches to that proven machine. `-dry-run` previews the route, `-as <preset>` forces one. The realization of "the conversation picks an envelope, not a machine."
`nilcore decompose -goal "<a> and <b>"`	The kernel's recursive decompose preset: split a goal into independent sub-goals, run each as a full verified task, then merge the verified branches into one re-verified tip — re-verifying after every merge and dropping any piece that conflicts or turns the tree red (the verifier owns "done", not the pieces). Opt-in.
`nilcore flows validate|run -flow f.json`	Consume a portable agentic-flows workflow. `validate` is a preflight gate: it checks whether NilCore supports the flow's cores + capabilities, without executing anything. `run` executes its `agent_task` nodes through the verified decompose preset. NilCore is the sandboxed-worker consumer of that shared contract; see `docs/AGENTIC-FLOWS.md`.
`nilcore doctor`	Host-readiness gate — keys resolve, runtime on PATH, serve allowlist sane. Exits non-zero when not ready, so it doubles as a CI health check.
`nilcore inspect [health]`	Replays the append-only event log into a summary (events by kind, tasks, chain verified); `health` probes it as a liveness gate.
`nilcore trace <task>` (alias `why`)	Reconstructs the causal "why did it do that" tree from the log — read-only, metadata-only; marks the trace untrusted over a broken hash chain.
`nilcore trust`	The Trust Ledger scoreboard — each backend's verifier-judged race pass-rate (plus per-model pass-rate/cost from a folded eval report). Strength is earned from evidence, never asserted. With `-backends`, it drives live routing: the strongest is tried first; a verify-fail races them all and the verifier picks the winner (never the ledger).
`nilcore experience` · `capability`	The closed-loop scoreboard — the experience projection derived over the log (what's been tried, what passed), and the exact "what may this drive do" capability descriptor. Read-only. `experience -warm` reads the warm store-backed projection (no full log replay); `-rebuild` re-derives it from the log.
`nilcore lessons`	The recurring verifier-failure patterns the agent distilled from its own trace (opt-in, auto-folded into memory) — so it stops repeating its scars.
`nilcore flywheel [--once]`	The self-improvement flywheel — eval → mine failures → propose a fix. Verified and human-gated; it never edits the verifier of record. Auto-merge is a separate double opt-in.
`nilcore objective` · `auto-approvals`	The operator-only standing-objectives backlog the autonomy daemon draws from · the account of past graduated auto-approvals + the per-class undo story (every auto-approval is fenced by a blast-budget and never fires on `main`/prod).
`nilcore watch`	Self-starts tasks from dropped signal files — reversible work auto-runs, anything irreversible routes to the human gate (`--open-pr` opens a gated draft PR once approved).
`nilcore schedule`	Same as `watch`, but self-starts on a cron/interval (same `--open-pr` gate).
`nilcore browse -goal …`	Drives a persistent, in-sandbox browser (observe → plan → act → verify); recorded findings are re-verified in-box before they ship.
`nilcore desktop -goal …`	Drives a contained virtual desktop via the Set-of-Marks ladder; `--mac-host` (doubly gated) drives a real Mac.
`nilcore registry list|install <manifest.json>`	Manages versioned local skills + MCP server specs (remote fetch stays gated as external infra).
`nilcore propose-edit -goal … -paths …`	The gated self-edit flow — the agent may change its own prompts/skills/tools, never the core or contracts (scope-checked, verified, human-gated).
`nilcore config show`	Prints the active, secret-free config.
`nilcore secret set <name>`	Stores or rotates one credential (into the SecretStore — never disk/log/prompt).
`nilcore version`	Reports the build.
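`nilcore trust` and `-backend auto` both order backends by verifier‑judged evidence, seeded by the operator's preference. One plausible way to express that ordering in Go follows. The Laplace smoothing, the `Record` type, and the tie‑break rule are assumptions of this sketch, not the Trust Ledger's actual formula.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a backend's verifier-judged history: verified passes vs. runs.
type Record struct{ Passed, Runs int }

// score is a Laplace-smoothed pass rate, so a backend with no evidence starts
// at 0.5 rather than 0 or 1. The smoothing is an illustrative choice.
func score(r Record) float64 {
	return float64(r.Passed+1) / float64(r.Runs+2)
}

// rank orders the available backends strongest-first without mutating the
// input. The preferred backend breaks ties, so it leads until evidence says otherwise.
func rank(available []string, ledger map[string]Record, preferred string) []string {
	out := append([]string(nil), available...)
	sort.SliceStable(out, func(i, j int) bool {
		si, sj := score(ledger[out[i]]), score(ledger[out[j]])
		if si != sj {
			return si > sj
		}
		return out[i] == preferred && out[j] != preferred
	})
	return out
}

func main() {
	avail := []string{"native", "codex", "claude-code"}
	fmt.Println(rank(avail, nil, "codex")) // [codex native claude-code]
	ledger := map[string]Record{"native": {Passed: 9, Runs: 10}, "codex": {Passed: 2, Runs: 10}}
	fmt.Println(rank(avail, ledger, "codex")) // [native claude-code codex]
}
```

Per the `trust` row, this ordering only decides who is tried first. On a verify‑fail, the backends race and the verifier, not this score, picks the winner.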

Capability plug-ins

All opt-in — the default binary stays dependency-light and the loop is byte-identical when they're absent:

Plug-in	Turn it on with	What you get
Skills	A `SKILL.md` (frontmatter + instructions) in `~/.config/nilcore/skills/` (or `$NILCORE_SKILLS_DIR`)	Surfaces to the loop as a `skill_<name>` tool; unused skills cost ~zero context.
MCP servers	`{name, command}` (stdio) or `{name, url, headers}` (remote HTTP/SSE) entries in `mcp.json`	`nilcore` generates typed wrappers under `mcp/servers/`; the executor discovers them on demand and invokes the host-dispatched `mcp` tool — so MCP works on every sandbox tier, including the macOS container default. Resources + prompts are opt-in (`NILCORE_MCP_RESOURCES=1`).
LSP retrieval	`NILCORE_LSP_COMMAND=gopls` (or any language server)	Compiler-grade "precise" retrieval.
Live index	`NILCORE_LIVE_INDEX=1`	A worktree-aware, incrementally-updated `live` code-intelligence tool.
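Each skill surfaces as a `skill_<name>` tool. Here is a sketch of deriving that tool name from a `SKILL.md`. It assumes the frontmatter is a `---`‑delimited block with a `name:` key; the README only says "frontmatter + instructions", so the field name is an assumption. The sketch parses just that one key instead of pulling in a YAML library, in keeping with the stdlib‑only core.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// skillToolName derives the tool name a SKILL.md would surface as.
// Assumption: frontmatter is delimited by "---" lines and carries a `name:` key.
func skillToolName(skillMD string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(skillMD))
	if !sc.Scan() || strings.TrimSpace(sc.Text()) != "---" {
		return "", errors.New("missing frontmatter")
	}
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "---" {
			break // end of frontmatter; the instructions follow
		}
		if v, ok := strings.CutPrefix(line, "name:"); ok {
			if name := strings.Trim(strings.TrimSpace(v), `"'`); name != "" {
				return "skill_" + name, nil
			}
		}
	}
	return "", errors.New("frontmatter has no name")
}

func main() {
	md := "---\nname: release-notes\ndescription: Draft release notes\n---\nSteps..."
	fmt.Println(skillToolName(md)) // skill_release-notes <nil>
}
```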

Model selection

Set NILCORE_MODEL=provider:model (default claude-sonnet-4-6):

Bare name → Anthropic: e.g. claude-sonnet-4-6.
Other providers: openai:gpt-5.5, openrouter:meta-llama/llama-3.1-70b.
OpenRouter fusion: openrouter (or openrouter: with nothing after the colon) defaults to openrouter/fusion, a multi-model panel that fuses several frontier models into one answer (it bills the panel's cumulative cost).
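The three rules above amount to a small parser for `NILCORE_MODEL`. Here is a Go sketch. `parseModel` is a hypothetical name, and rejecting unknown provider prefixes and empty model names is this sketch's own choice.

```go
package main

import (
	"fmt"
	"strings"
)

const defaultModel = "claude-sonnet-4-6"

// parseModel splits a NILCORE_MODEL value into provider and model.
func parseModel(spec string) (provider, model string, err error) {
	spec = strings.TrimSpace(spec)
	if spec == "" {
		spec = defaultModel
	}
	if spec == "openrouter" {
		return "openrouter", "openrouter/fusion", nil
	}
	p, m, found := strings.Cut(spec, ":") // split on the FIRST colon only
	if !found {
		return "anthropic", spec, nil // bare name -> Anthropic
	}
	switch p {
	case "anthropic", "openai":
	case "openrouter":
		if m == "" {
			m = "openrouter/fusion"
		}
	default:
		return "", "", fmt.Errorf("unknown provider %q", p)
	}
	if m == "" {
		return "", "", fmt.Errorf("provider %q needs a model", p)
	}
	return p, m, nil
}

func main() {
	for _, s := range []string{"", "openai:gpt-5.5", "openrouter:", "openrouter:meta-llama/llama-3.1-70b"} {
		p, m, _ := parseModel(s)
		fmt.Println(p, m)
	}
}
```

Splitting on the first colon only keeps OpenRouter model IDs that contain `/` intact.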

Every step is appended to a hash-chained nilcore.events.jsonl — read it to see exactly what the agent did and why. Plaintext secrets never hit disk, logs, or prompts; on a headless host they are sealed in an encrypted-file vault (AES-256-GCM, owner-only key).
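A hash chain is what makes the log tamper‑evident. Each event stores the hash of its predecessor, so editing any past line breaks its own hash and every link after it. The Go sketch below illustrates the idea. The `Event` fields and the choice to hash the JSON encoding with `Hash` blanked are assumptions, and the sketch omits secret redaction entirely.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Event is one audit-log line. The field set is hypothetical.
type Event struct {
	Seq  int    `json:"seq"`
	Kind string `json:"kind"`
	Data string `json:"data"`
	Prev string `json:"prev"` // hash of the previous event ("" for the first)
	Hash string `json:"hash"` // SHA-256 of this event encoded with Hash empty
}

func digest(e Event) string {
	e.Hash = ""
	b, _ := json.Marshal(e) // struct fields encode in declaration order: stable
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// appendEvent links a new event to the tail; it never rewrites history.
func appendEvent(events []Event, kind, data string) []Event {
	e := Event{Seq: len(events), Kind: kind, Data: data}
	if len(events) > 0 {
		e.Prev = events[len(events)-1].Hash
	}
	e.Hash = digest(e)
	return append(events, e)
}

// verifyChain returns the index of the first broken event, or -1 if intact.
func verifyChain(events []Event) int {
	prev := ""
	for i, e := range events {
		if e.Prev != prev || e.Hash != digest(e) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var events []Event
	events = appendEvent(events, "tool_call", "go test ./...")
	events = appendEvent(events, "verify", "green")
	fmt.Println(verifyChain(events)) // -1
	events[0].Data = "rm -rf /"      // tamper with history
	fmt.Println(verifyChain(events)) // 0
}
```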

Our dogma — first principles, ranked by leverage

By 2026 the frontier models inside every serious agent have converged. The harness does the rest. NilCore's bet is to be the best harness — and "best" is the disciplined application of a short list, not a long list of features.

The feedback loop is the product. Knowing — truthfully, fast — whether the code works is everything. Verification is the sole authority on done.
The harness wins; borrow the intelligence. Keep the harness small, sharp, and yours; let the model supply the fluency.
Context is the scarce resource — engineer it ruthlessly. The right context beats the biggest window. Retrieve precisely, prune aggressively, summarize on handoff.
Understand before you change. Navigate symbols, references, and a repo‑map first. Earn the right to edit.
Small, reversible, verified steps. One change → verify → checkpoint. Reversible by construction, so the gate concentrates only where reversibility ends.
Define "done" before you start. Acceptance criteria — ideally a failing test — first. The best defense against confidently building the wrong thing.
Quality is the bar, not correctness. Green is the floor. A minimal, idiomatic diff a senior would approve is the bar.
Recover, don't thrash. Recognize being stuck and change strategy — escalate to the advisor, or stop and ask one sharp question.
Earn improvement from evidence. Tune from evals and the audit trail, not vibes.
Safety is what makes autonomy possible. The sandbox, the gate, the audit, and no ambient authority aren't friction — they're why the agent can be trusted to run unattended.

Anti‑principles we refuse: reaching for a bigger model instead of a better harness · stuffing the context window "to be safe" · heroic one‑shot rewrites · trusting "it works" over a check · editing before understanding · optimizing on vibes · bolting on features that dilute the core.

The seven invariants (non‑negotiable)

These hold in every commit. Break one and the change is rejected — no matter how good the rest is.

One frozen backend contract — Run(ctx, Task) (Result, error). Native, Codex, Claude Code are interchangeable behind it.
The verifier is the only authority on "done." A self‑report never governs.
No ambient authority. Secrets via env only; never on disk, in logs, in prompts, or in code.
Model-emitted execution is sandboxed. Shell commands and delegated CLIs run in the container; the structured file/git tools run host-side but stay confined to the worktree — the model can't run an arbitrary program on the host.
The audit log is append‑only — hash‑chained, redacted, replayable. History is never mutated.
Zero‑dependency core — standard library only; the sanctioned exceptions are pure‑Go SQLite, golang.org/x/sys (Go's own extended stdlib, for the Linux namespace sandbox), and the Charm TUI stack (behind //go:build tui, so the default binary links none). The MCP client is not a module — it's JSON‑RPC over the stdlib.
Untrusted input is data, never instructions.

Architecture at a glance

```mermaid
flowchart TD
    CLI[cmd/nilcore<br/>chat · do · run · build · swarm · decompose · serve · report · schedule · doctor] --> ROUTER[router<br/>do: goal → preset]
    ROUTER --> KERNEL[kernel<br/>one recursive Run · run/build/swarm/decompose presets]
    CLI --> KERNEL
    KERNEL --> AGENT[agent<br/>orchestrator + adaptive routing]
    KERNEL --> SWARM[swarm<br/>bounded in-process pool · typed artifacts · requeue-until-clean]
    CLI --> STEER[steering<br/>trusted NILCORE.md / AGENTS.md]
    STEER --> AGENT
    XP[experience · trust · lessons · flywheel<br/>closed loop over the verified trace] --> AGENT
    SWARM --> POOL[pool<br/>strong planner/verifier · cheap workers · fallback · caps]
    SWARM --> ARTIFACT[artifact + evverify + packs<br/>typed claims · verifier-produced green]
    ARTIFACT --> VERIFY
    AGENT --> BK[backend<br/>native · codex · claude-code]
    AGENT --> WT[worktree<br/>disposable per task]
    BK --> MODEL[model + provider<br/>Anthropic · OpenAI · OpenRouter · multimodal]
    BK --> SANDBOX[sandbox<br/>hardened container + nilcore-browser]
    BK --> VERIFY[verify<br/>source of truth + browser behavioral check]
    AGENT --> POLICY[policy<br/>gate · egress · tool-call]
    AGENT --> LOG[eventlog<br/>hash-chained + store]
    AGENT --> CI[codeintel<br/>ast 19 languages to graph to repomap to HNSW retrieve]
    AGENT --> MEM[memory<br/>cross-project SQLite]
    CLI --> CHAN[channel<br/>telegram · slack]
    CLI --> TRIG[scmhook · cron<br/>webhook / scheduled triggers]
    TRIG --> AGENT
    AGENT --> FORGE[forge<br/>gated draft PR]
```

Dependencies point inward; leaf packages never import the orchestrator. The full design and rationale live in docs/ARCHITECTURE.md and docs/PRINCIPLES.md. For one end-to-end map of the whole system — chat behaviour, every command, the engine, and the safety core, with a front-door index to all the in-depth docs — see docs/REFERENCE.md.

The receipts


~75,000	lines of Go — the agent itself (~8k single‑task core · multi‑agent supervisor · conversational front door · verified swarm · recursive decompose · closed‑loop autonomy — all on one orchestration kernel)
~142,300	lines including its tests (347 test files)
122	small, single‑responsibility packages
2	core deps in the default binary — pure‑Go SQLite · `golang.org/x/sys` (Go's extended stdlib); the Charm TUI's 3 modules link only under `make tui`. The browser driver (incl. a pure‑Go CDP/WebSocket client), the multi‑language parser backends, embedder, forge, the provider pool, the swarm runner, and the orchestration kernel + router are all pure stdlib — no module added
7 / 7	invariants held
Phases 0–16	shipped — incl. the unified orchestration kernel (`run`/`build`/`swarm`/`decompose` collapse onto one recursive engine; `nilcore do` routes the goal), closed‑loop autonomy (trust‑routing, learned lessons + verify‑cache, a verified self‑improvement flywheel, and graduated auto‑approval fenced by a blast‑budget — opt‑in, never on `main`), the verifier‑backed artifact factory, and verified swarm mode, atop behavioral browser verification, semantic (HNSW) + multi‑language (19 languages / 34 extensions) code intel, event/scheduled triggers, gated draft PRs, and trusted operator steering

What's inside

```
cmd/nilcore/           chat · do · run · build · swarm · decompose · tui · init · serve · schedule · watch · browse · desktop · report · trust · trace · experience · capability · lessons · flywheel · objective · auto-approvals · inspect · registry · propose-edit · mcp-call · doctor · config · secret · version
cmd/tools/nilcore-browser   pure-Go headless-browser driver baked into the sandbox image
internal/
  model, provider      canonical message format (+ multimodal image block) + Anthropic/OpenAI/OpenRouter
  backend              CodingBackend contract + native / codex / claude-code
  sandbox              hardened container executor
  verify               the source of truth for "done" (+ auto-detection · opt-in browser behavioral check)
  eventlog             append-only, hash-chained, redacted audit trail
  policy               reversibility gate · egress allowlist · tool-call denylist
  agent                orchestrator · routing · spawn (DAG) · durability · bus (inter-agent)
  kernel, router       unified orchestration kernel (one recursive Run; run/build/swarm/decompose presets, MaxChildren/Observer-bounded) · goal→preset router (the `nilcore do` brain)
  super, project       multi-agent supervisor · autonomous project loop + greenfield bootstrap
  session, inbox       conversational front door · queue/steer user-message seam
  emit, loopctl        live reasoning sink · steer-vs-shutdown cancel discriminator
  roster, integrate    role-specialized subagents · parallel-worktree merge + verify-each
  artifact, evverify   typed evidence artifacts (Claim/Evidence/Status) · verifier-produced green (the artifact factory)
  artifact/{packs,schema}  verify-packs: web·software·finance·ui·audit·benchmark·code + structural schema (curl-in-box, no SDK)
  requeue, report      field-granular requeue (only the failed claims) · verification report + matrix replay over the log
  pool, swarm          tiered provider pool (planner/verifier·workers·fallback·caps) · verified swarm: shard queue·runner·until-clean controller·scoreboard
  worktreefs, browserwire   symlink-safe worktree FS confinement (O_NOFOLLOW) · shared shell-quote + browser-observation contract
  steering             trusted NILCORE.md / AGENTS.md operator instructions (scoped below the safety core)
  scmhook, cron        HMAC-verified webhook triggers · cron/interval self-start
  forge                gated draft-PR opener (token from SecretStore; never merges)
  meter                token/dollar metering → the budget ceiling is a hard wall
  worktree             disposable git worktree per task
  channel              Channel contract · telegram · slack · authorized control
  tools, mcp           structured tools (+ browser_view) + MCP-as-code
  embed                opt-in OpenAI-compatible embedder (NILCORE_EMBED_KEY)
  codeintel/*          ast (19 languages / 34 exts: Go · Python · TS/JS · Rust · Java · C/C++ · C# · Ruby · …) · graph · repomap · lsp · semantic (HNSW) · retrieve · impact · live
  store, memory        SQLite backbone + cross-project memory
  experience, capability   derived experience projection over the log · the "what may this drive do" descriptor
  trust, vcache, lessons   the Trust Ledger (verifier-earned routing) · content-hash verify cache · learned verifier-failure lessons
  graapprove, blastbudget  graduated auto-approval (earned trust + operator envelope; never main/prod) · four-axis runtime blast fence
```
  flywheel, autosrc, objective   verified self-improvement flywheel (human-gated) · autonomy daemon · standing-objectives backlog
  secrets              keychain / encrypted vault / env / external
  skills, selfimprove  Agent Skills + plugins + gated self-edit
  registry             versioned local skills + MCP server specs (install / list)
  budget, scheduler, maint, inspect   runtime resilience & ops
  onboard, paths       `nilcore init` wizard + versioned config + per-OS dirs
eval/                  measure-first eval harness

No ambient authority. One loop, fully observable. You can always read the trace and pull the plug.

Borrow intelligence — don't reimplement it.
