The harness is small. The model is the engine.
NilCore borrows intelligence instead of re‑encoding it, so the whole agent is ~75,000 lines of Go: a ~8k single‑task core you can read in an afternoon, with everything else as opt‑in layers over it. It comprises the single‑task loop, an opt‑in multi‑agent supervisor that builds whole projects, a verified swarm that fans hundreds of agents at a problem, and a recursive decompose that splits a goal and merges the verified pieces back. All of them collapse onto one orchestration kernel, so you don't pick a machine: you just talk, and NilCore routes the goal to the cheapest one that fits.

It treats code as one verifiable artifact among many (reports, comparison matrices, audits, benchmarks, research dossiers), each carrying claims a verifier re‑checks in the sandbox. It can see the running app through a sandboxed browser, even driving a flow (log in, submit a form) before it observes; search code semantically; read 19 languages (Go · Python · TS/JS · Rust · Java · C/C++ · C# · Ruby · Kotlin · Swift · …); and start work from a webhook or a schedule. It also closes the loop on its own evidence, learning from its verified-or-failed trace which backend to trust, what to recheck, and (opt‑in, fenced, never on main) what it may auto‑approve. It is hardened by three disciplines and seven invariants it never breaks.
TL;DR: Point NilCore at a repo and a goal. It works in a throwaway git worktree, runs every command the model emits inside a sandbox, and isn't done until your checks pass, whatever the model says. Drive it from your terminal or your phone. It never holds your keys, never lets the model run an arbitrary program on the host, and never decides "done" on its own word.
nilcore # just start talking — it picks the machine and works while you type
nilcore -goal "make the failing test in math_test.go pass" # or drive one task headless

Most coding agents ask you to trust a black box. NilCore is built on the opposite bet: trust comes from verification, sandboxing, and a trace you can read, not from a bigger model. Here's the pain, and how NilCore kills it:
| The pain you've felt | How NilCore solves it |
|---|---|
| "It said it was done. It wasn't." | The verifier is the only authority on done. After any backend runs, your project's own build/test/lint re‑runs and that verdict ships the work — a self‑report never does. |
| "Tests pass, but does the app actually work?" | NilCore can see the running app. A sandboxed headless browser (browser_view + a pure‑Go nilcore-browser driver baked into the image) navigates your app — and, given an actions script, first drives a flow (click / type / key / wait, e.g. log in or submit a form) over a pure‑Go CDP client — then hands the model a screenshot as a multimodal image; opt‑in via NILCORE_BROWSER_VERIFY, a composite verifier folds that behavioral check into the verdict, so the verifier stays the sole authority on done. (The live browser run is CI‑only — no Chromium in hermetic unit tests — and the driver fails closed without a browser.) |
| "It ran a destructive command / touched my host." | Every command the model emits runs in a container (rootless, cap-drop=ALL, read‑only rootfs), destructive ones denylisted before execution. The model can't run an arbitrary program on your machine; its file edits are confined to a throwaway worktree. |
| "It leaked my API key." | Secrets come from the environment only, are injected per‑run into the container, and are never written to disk, put in a prompt, or logged — the audit log is hash‑chained and redacted. |
| "A fetched file/web page hijacked it." | Untrusted input is data, never instructions. Tool output, files, and web content are fenced behind a boundary the model is told not to obey. |
| "It edited blindly without understanding my codebase." | A real code‑intelligence stack — AST → call graph → PageRank repo‑map → semantic + LSP retrieval — hands the loop a minimal, structurally‑coherent context bundle before it touches a file. It reads 19 languages across 34 file extensions — Go · Python · TS/JS · Rust · Java · C · C++ · C# · Ruby · PHP · Kotlin · Swift · Scala · Dart · Zig · Bash · Lua · Elixir · SQL (a pure‑Go parser seam — Go is precise via go/parser, the rest are broad, structural heuristic line scanners, no tree‑sitter; the NILCORE_LSP_COMMAND seam stays the precise lens where a server exists), and semantic search runs on a content‑hash‑cached, pure‑Go HNSW vector index — opt‑in via NILCORE_EMBED_KEY, with a lexical fallback that's byte‑identical when it's off. |
| "It can only fix one task, not build the thing." | nilcore build is a supervisor that spawns role‑specialized subagents (research · understand · plan · implement · review), lets them talk back and forth, integrates their parallel worktrees into one verifier‑green tree, and re‑plans to convergence — greenfield included. It still writes code itself. |
| "I have to babysit it / can't course‑correct mid‑run." | Just talk to it. nilcore chat is one conversation — a model classifier (not a word‑count heuristic) sizes your message by the work and picks the cheapest honest route: a single native loop, the supervised fan‑out, or a whole project. While it works its reasoning streams live, token by token, and you can queue a follow‑up (folds in at the next step), steer — !… interrupts mid‑thought but keeps what it's reasoned so far, folds your feedback in, and resumes or changes course — or /cancel to abort the run outright while staying in the conversation. |
| "It went rogue while I was away." | Bounded autonomy: reversible work runs unattended; irreversible actions (merge, push, deploy, pay) hit a human gate — which becomes a Yes/No tap in Telegram or Slack. |
| "I want it to react — not just sit there waiting for me." | Event‑driven & scheduled autonomy. nilcore serve --webhook turns an HMAC‑verified SCM/CI webhook into a trigger; nilcore schedule self‑starts on a cron/interval. Both route through the same reversible‑auto‑start / human‑gate machinery — headless means irreversible work deny‑defaults. |
| "Opening the PR is the part I don't trust it with." | Gated PR. nilcore watch --open-pr / schedule --open-pr open a draft PR (via internal/forge) only after the human gate — the push runs inside the approved prepare step, the token comes from the SecretStore and is scrubbed from logs, and the agent never merges. The verified branch is preserved; default disposable cleanup is byte‑identical. |
| "I can't give it project‑specific marching orders." | Operator steering. Drop a NILCORE.md / AGENTS.md and it loads as trusted instructions — the one deliberate, scoped exception to "untrusted input is data," bounded below the safety core: it can shape behavior but can't widen capability or bypass the gate or verifier. Wired into chat and run/build. |
| "I'm locked into one model vendor." | One Provider seam, three adapters: Anthropic, OpenAI, OpenRouter. Model selection is role → provider:model. The cheap executor escalates to a strong advisor on demand. And one CodingBackend seam, three backends: the native loop, Codex, Claude Code — and you don't have to pick: -backend auto lets the system choose the best available backend (the ones whose CLI + key are actually present on the host), seeded by your stated preference (-prefer-backend / preferred_backend) and re‑ordered as the verifier‑judged Trust Ledger learns which one wins on your codebase. Or -backends auto competes all available backends — racing them on a hard task and letting the verifier pick the winner (nilcore trust shows the scoreboard). No more hard‑defaulting to native as if it were best. |
| "It forgets everything between tasks." | Cross‑project memory (SQLite): conventions and decisions are retrieved into context at task start and written back after — deduped, never as instructions. |
| "The framework is too big to trust." | The entire agent is ~75,000 lines of Go with two core dependencies — pure‑Go SQLite, and golang.org/x/sys (Go's own extended stdlib) for the Linux namespace sandbox — built up from a ~8k single‑task core you can read in an afternoon, with the multi‑agent layer, swarm, browser/desktop, code‑intel, closed‑loop autonomy, and the conversational front door as opt‑in layers over it (one orchestration kernel they all collapse onto). Still exactly two: the browser driver (incl. its pure‑Go CDP/WebSocket client), the multi‑language parser backends, embedder, and forge are all pure stdlib — no module was added. If you can't read it end to end, it's too big. (The optional full‑screen TUI — make tui — links the Charm stack under a build tag, so the default binary doesn't and internal/ never imports it.) |
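The webhook row above says triggers are HMAC‑verified. A minimal sketch of that kind of check follows, assuming HMAC‑SHA256 with a hex‑encoded signature; the digest, the encoding, and the function names `verifySignature` and `sign` are illustrative assumptions, not NilCore's actual wire format or API.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifySignature reports whether sigHex is the hex-encoded HMAC-SHA256 of
// body under secret. SHA-256 and hex encoding are assumptions of this sketch.
func verifySignature(secret, body []byte, sigHex string) bool {
	want, err := hex.DecodeString(sigHex)
	if err != nil {
		return false // malformed signature: fail closed
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time, so timing does not leak the MAC.
	return hmac.Equal(mac.Sum(nil), want)
}

// sign is a demo helper that plays the role of the webhook sender.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret") // stands in for NILCORE_WEBHOOK_SECRET
	body := []byte(`{"event":"push"}`)
	sig := sign(secret, body)

	fmt.Println("genuine payload accepted:", verifySignature(secret, body, sig))
	fmt.Println("tampered payload accepted:", verifySignature(secret, []byte(`{"event":"tampered"}`), sig))
}
```

The point of the design is that an unsigned or altered request never becomes a trigger: anything that fails the comparison is rejected before a goal is built from it.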
Everything orbits one loop. The verifier — your checks — is the source of truth.
flowchart LR
G([Goal]) --> CTX[gather context<br/>+ memory + code-intel]
CTX --> M[model picks a tool]
M --> S[execute in sandbox]
S --> O[observe]
O --> V{VERIFY<br/>build · test · lint}
V -- red --> M
V -- green --> GATE{irreversible?}
GATE -- reversible --> SHIP([shipped])
GATE -- merge / push / deploy --> HUMAN[human gate<br/>console / chat]
HUMAN --> SHIP
Whatever writes the diff — NilCore's own loop, Codex, or Claude Code — your checks decide whether it ships. That single rule is what makes delegating to black‑box agents safe.
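The loop in the flowchart can be sketched in a few lines of Go. This is an illustration of the rule, not NilCore's code: the types `Step`, `Verify`, and `Approve` and the function `runUntilGreen` are hypothetical names chosen for the example. Note that the backend step returns no "done" flag at all; only the verifier can end the loop.

```go
package main

import (
	"errors"
	"fmt"
)

// Step stands in for one backend attempt (native loop, Codex, Claude Code).
type Step func(attempt int) error

// Verify stands in for the project's own build/test/lint; true means green.
type Verify func() bool

// Approve stands in for the human gate; it is consulted only for irreversible work.
type Approve func() bool

var (
	ErrNotGreen = errors.New("verifier still red after max attempts")
	ErrDenied   = errors.New("human gate denied irreversible action")
)

// runUntilGreen repeats step until verify is green, then applies the gate.
func runUntilGreen(step Step, verify Verify, irreversible bool, approve Approve, maxAttempts int) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err := step(attempt); err != nil {
			return err
		}
		if !verify() {
			continue // red: go back to the model for another attempt
		}
		if irreversible && !approve() {
			return ErrDenied // green, but a human said no
		}
		return nil // green and allowed: ship
	}
	return ErrNotGreen
}

func main() {
	fixed := false
	step := func(attempt int) error {
		fixed = attempt >= 2 // pretend the second attempt fixes the test
		return nil
	}
	verify := func() bool { return fixed }
	deny := func() bool { return false }

	fmt.Println("reversible:", runUntilGreen(step, verify, false, deny, 3))
	fixed = false
	fmt.Println("irreversible:", runUntilGreen(step, verify, true, deny, 3))
}
```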
- Hybrid backends, one contract. Native loop + delegate to Codex / Claude Code. Add one without touching the core.
- Hardened sandbox. Rootless containers, dropped caps, read‑only rootfs, default‑deny egress with an allowlist proxy.
- Secrets that never leak. Keychain / encrypted‑file vault / env / external hook. The model never sees a key.
- Drive it from your phone.
- One conversational front door.
- One engine: you don't pick a machine.
- Verifier‑backed artifacts, not just code. Code is one artifact type among many: reports, comparison matrices, audits, benchmarks, research dossiers.
- Code intelligence (19 languages: Go · Python · TS/JS · Rust · Java · C/C++ · C# · Ruby · Kotlin · Swift · …; heuristic scanners, LSP = the precise lens). AST · call graph · PageRank repo‑map · LSP · pure‑Go HNSW semantic search · Impact Set + SBFL · live worktree‑aware updates.
- It can see the running app, through a sandboxed headless browser.
- Multi‑agent supervisor.
- Tamper‑evident audit. Append‑only, hash‑chained, secret‑redacted event log. Replay any run.
- Runs unattended, and reacts. Provider retry/failover, cost ceilings, durable resume on restart, resource GC, health checks. Plus event/scheduled triggers.
- Closed‑loop autonomy (it learns from its own evidence). NilCore consumes its verifier‑judged trace to get better: a Trust Ledger routes to the backend that actually wins on your code, distilled lessons + a content‑hash verify‑cache stop it repeating scars, a human‑gated flywheel proposes its own improvements, and graduated auto‑approval earns wider unattended scope, fenced by a four‑axis blast‑budget.
- Verified swarm mode.
- Operator steering.
Requires Go 1.25+. On Linux with a Landlock‑capable kernel (5.13+) and unprivileged user namespaces, NilCore sandboxes the loop with no container runtime at all — the auto‑detected host‑native namespace backend. Otherwise (or with -sandbox container) it uses a container runtime (podman rootless preferred, or docker).
# Install (or grab a binary from Releases)
curl -fsSL https://raw.githubusercontent.com/RNT56/NilCore/main/scripts/install.sh | sh
# 1) Guided setup — one pass: providers + keys (→ SecretStore), runtime, backend,
# chat channel + serve allowlist. Re-check readiness anytime with `nilcore doctor`.
nilcore init
# 2) Just talk to it — the conversational front door. It infers whether your message
# is a quick fix, a feature, or a whole project and pulls the strings itself; it
# works while you type, so you can QUEUE a follow-up or STEER (!...) to interrupt
# its current step. This is the usual way to drive NilCore.
nilcore # same as: nilcore chat -dir .
# One-shot, but let the agent pick HOW to work: `do` routes the goal to the cheapest
# preset that fits — run (a task), build (a project), swarm (breadth), or decompose
# (split + merge) — then dispatches to that proven machine. -dry-run previews the route.
nilcore do -goal "add a login form and wire the logout button" # try -dry-run first
# — or drive a specific mode directly (also what the conversation / `do` routes to) —
# Run one task to completion (the native loop, in a disposable worktree).
# Add -auto-supervise to let the model classifier scale a complex goal UP to the
# supervised project loop (same caps as `nilcore build`); off => single-task.
nilcore -dir ./repo \
-goal "fix the failing test in math_test.go" \
-verify "go build ./... && go test ./..."
# Build a WHOLE project from one prompt — a supervisor spawns role-specialized
# subagents that talk to each other, integrates their parallel work into one
# verifier-green tree, and re-plans to convergence. Greenfield (-new) or -dir.
nilcore build -goal "Go HTTP service: /health 200 + /orders POST persists to SQLite" -new ./svc
# Delegate a single task to Claude Code or Codex — verified the same way.
# Model / effort / extra args / env are configurable (via `nilcore init`, or
# NILCORE_CLAUDE_MODEL/_EFFORT · NILCORE_CODEX_MODEL/_EFFORT); unset => CLI default.
# Or let the system pick: -backend auto chooses the best AVAILABLE backend
# (seeded by -prefer-backend, learned by the Trust Ledger); -backends auto races them all.
nilcore -dir ./repo -goal "..." -backend claude-code
# Drive it from your phone: serve gives Telegram/Slack the same conversation —
# queue + steer + auto-routing; gates become inline Yes/No replies.
nilcore serve -channel telegram # needs a channel + allowlist (from `nilcore init`)
# React to events instead of waiting: turn an HMAC-verified SCM/CI webhook into a
# trigger, or self-start on a cron/interval. Both route through the same
# reversible-auto-start / human-gate machinery (headless => irreversible work deny-defaults).
nilcore serve --webhook :8080 # needs NILCORE_WEBHOOK_SECRET (HMAC); NILCORE_WEBHOOK_LABEL optional
nilcore schedule --every 1h --goal "..." # or a cron expr; add --open-pr to open a GATED draft PR
# Let it see the running app: an opt-in composite verifier folds a sandboxed
# headless-browser behavioral check into the verdict (CI-only live run; fails closed).
NILCORE_BROWSER_VERIFY=1 nilcore -dir ./svc -goal "..."
# Fan out a VERIFIED swarm: N shards in a bounded in-process pool, each producing a
# TYPED artifact judged by a verify-pack. Only verifier-green shards ship; failed
# shards requeue until clean (or the budget/pass limit). 300 agents are fine because
# every unit is checkable — no majority vote, no "the model says it looks right".
nilcore swarm -goal "research 100 EV companies" -preset research \
-agents 300 -concurrency 40 -artifact report+matrix -verify-pack finance \
-passes until-clean -budget 500
# Presets: research | code | audit | benchmark | ui. The live scoreboard shows
# checked/passed/failed/retry-pass/remaining + cost/time/token + the source–claim
# trace; replay it anytime with `nilcore report -format matrix -dir ./repo`.
# In-process / single-host / bounded; default-off (the binary is byte-identical unused).
# Prefer env vars / CI? Skip the wizard and export keys directly:
# export ANTHROPIC_API_KEY=sk-... (or NILCORE_* for scripted: nilcore init -non-interactive)
# NILCORE_EMBED_KEY enables pure-Go HNSW semantic search; a NILCORE.md / AGENTS.md
# steering file (trusted, scoped below the safety core) gives the agent project marching orders.

nilcore help lists them all. Each is one focused verb over the same audited core:
| Command | What it does |
|---|---|
| nilcore do -goal … | The agent picks how to work. Routes the goal to the cheapest preset that fits (run / build / swarm / decompose) and dispatches to that proven machine. -dry-run previews the route, -as <preset> forces one. The realization of "the conversation picks an envelope, not a machine." |
| nilcore decompose -goal "<a> and <b>" | The kernel's recursive decompose preset: split a goal into independent sub-goals, run each as a full verified task, then merge the verified branches into one re-verified tip, re-verifying after every merge and dropping any piece that conflicts or turns the tree red (the verifier owns "done", not the pieces). Opt-in. |
| nilcore flows validate\|run -flow f.json | Consume a portable agentic-flows workflow. validate is a preflight gate (does NilCore support the flow's cores + capabilities? No execution); run executes its agent_task nodes through the verified decompose preset. NilCore is the sandboxed-worker consumer of that shared contract; see docs/AGENTIC-FLOWS.md. |
| nilcore doctor | Host-readiness gate: keys resolve, runtime on PATH, serve allowlist sane. Exits non-zero when not ready, so it doubles as a CI health check. |
| nilcore inspect [health] | Replays the append-only event log into a summary (events by kind, tasks, chain verified); health probes it as a liveness gate. |
| nilcore trace <task> (alias why) | Reconstructs the causal "why did it do that" tree from the log (read-only, metadata-only); marks the trace untrusted over a broken hash chain. |
| nilcore trust | The Trust Ledger scoreboard: each backend's verifier-judged race pass-rate (plus per-model pass-rate/cost from a folded eval report). Strength is earned from evidence, never asserted. With -backends, it drives live routing: the strongest is tried first; a verify-fail races them all and the verifier picks the winner (never the ledger). |
| nilcore experience · capability | The closed-loop scoreboard: the experience projection derived over the log (what's been tried, what passed), and the exact "what may this drive do" capability descriptor. Read-only. experience -warm reads the warm store-backed projection (no full log replay); -rebuild re-derives it from the log. |
| nilcore lessons | The recurring verifier-failure patterns the agent distilled from its own trace (opt-in, auto-folded into memory), so it stops repeating its scars. |
| nilcore flywheel [--once] | The self-improvement flywheel: eval → mine failures → propose a fix. Verified and human-gated; it never edits the verifier of record. Auto-merge is a separate double opt-in. |
| nilcore objective · auto-approvals | The operator-only standing-objectives backlog the autonomy daemon draws from · the account of past graduated auto-approvals + the per-class undo story (every auto-approval is fenced by a blast-budget and never fires on main/prod). |
| nilcore watch | Self-starts tasks from dropped signal files: reversible work auto-runs, anything irreversible routes to the human gate (--open-pr opens a gated draft PR once approved). |
| nilcore schedule | Same as watch, but self-starts on a cron/interval (same --open-pr gate). |
| nilcore browse -goal … | Drives a persistent, in-sandbox browser (observe → plan → act → verify); recorded findings are re-verified in-box before they ship. |
| nilcore desktop -goal … | Drives a contained virtual desktop via the Set-of-Marks ladder; --mac-host (doubly gated) drives a real Mac. |
| nilcore registry list\|install <manifest.json> | Manages versioned local skills + MCP server specs (remote fetch stays gated as external infra). |
| nilcore propose-edit -goal … -paths … | The gated self-edit flow: the agent may change its own prompts/skills/tools, never the core or contracts (scope-checked, verified, human-gated). |
| nilcore config show | Prints the active, secret-free config. |
| nilcore secret set <name> | Stores or rotates one credential (into the SecretStore, never disk/log/prompt). |
| nilcore version | Reports the build. |
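The decompose row describes a merge discipline: fold in one verified piece at a time, re-verify after every merge, and drop any piece that conflicts or turns the tree red. A minimal in-memory sketch of that discipline follows. It models a branch as a map of path to content rather than using git, and `Piece`, `mergeVerified`, and the conflict rule (two pieces writing different content to the same path) are simplifying assumptions for illustration only.

```go
package main

import "fmt"

// Piece is one verified sub-goal result; Files maps path to content and
// stands in for a git branch in this sketch.
type Piece struct {
	Name  string
	Files map[string]string
}

func clone(m map[string]string) map[string]string {
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// mergeVerified folds pieces into base one at a time. A piece is dropped if it
// rewrites a path an earlier kept piece already wrote with different content,
// or if verify reports red on the candidate tree.
func mergeVerified(base map[string]string, pieces []Piece, verify func(map[string]string) bool) (tip map[string]string, kept, dropped []string) {
	tip = clone(base)
	touched := map[string]bool{}
	for _, p := range pieces {
		candidate := clone(tip)
		conflict := false
		for path, content := range p.Files {
			if touched[path] && candidate[path] != content {
				conflict = true
				break
			}
			candidate[path] = content
		}
		if conflict || !verify(candidate) {
			dropped = append(dropped, p.Name)
			continue // the tip stays at the last green tree
		}
		for path := range p.Files {
			touched[path] = true
		}
		tip = candidate
		kept = append(kept, p.Name)
	}
	return tip, kept, dropped
}

func main() {
	// The fake verifier goes red if any file contains the marker "broken".
	verify := func(tree map[string]string) bool {
		for _, content := range tree {
			if content == "broken" {
				return false
			}
		}
		return true
	}
	pieces := []Piece{
		{Name: "login", Files: map[string]string{"login.go": "ok"}},
		{Name: "logout", Files: map[string]string{"logout.go": "broken"}},
		{Name: "login-v2", Files: map[string]string{"login.go": "other"}},
	}
	_, kept, dropped := mergeVerified(map[string]string{"go.mod": "module svc"}, pieces, verify)
	fmt.Println("kept:", kept, "dropped:", dropped)
}
```

Because the verifier runs on every candidate tree, the tip is green after each step, which is what lets the pieces be dropped independently.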
All opt-in — the default binary stays dependency-light and the loop is byte-identical when they're absent:
| Plug-in | Turn it on with | What you get |
|---|---|---|
| Skills | A SKILL.md (frontmatter + instructions) in ~/.config/nilcore/skills/ (or $NILCORE_SKILLS_DIR) | Surfaces to the loop as a skill_<name> tool; unused skills cost ~zero context. |
| MCP servers | {name, command} (stdio) or {name, url, headers} (remote HTTP/SSE) entries in mcp.json | nilcore generates typed wrappers under mcp/servers/; the executor discovers them on demand and invokes the host-dispatched mcp tool, so MCP works on every sandbox tier, including the macOS container default. Resources + prompts are opt-in (NILCORE_MCP_RESOURCES=1). |
| LSP retrieval | NILCORE_LSP_COMMAND=gopls (or any language server) | Compiler-grade "precise" retrieval. |
| Live index | NILCORE_LIVE_INDEX=1 | A worktree-aware, incrementally-updated live code-intelligence tool. |
Set NILCORE_MODEL=provider:model (default claude-sonnet-4-6):
- Bare name → Anthropic, e.g. claude-sonnet-4-6.
- Other providers: openai:gpt-5.5, openrouter:meta-llama/llama-3.1-70b.
- OpenRouter fusion: openrouter or openrouter: with no model defaults to openrouter/fusion, a multi-model panel that fuses several frontier models into one answer (it bills the panel's cumulative cost).
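The selection rules above are small enough to express as one function. The sketch below is an illustration under assumptions: `parseModel` is a hypothetical name, and the provider key "anthropic" for bare names is a guess at the internal spelling; only the rules themselves come from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultModel is the documented default for NILCORE_MODEL.
const defaultModel = "claude-sonnet-4-6"

// parseModel splits a NILCORE_MODEL-style value into provider and model.
func parseModel(spec string) (provider, model string) {
	if spec == "" {
		spec = defaultModel
	}
	// "openrouter" or "openrouter:" with no model selects the fusion panel.
	if spec == "openrouter" || spec == "openrouter:" {
		return "openrouter", "openrouter/fusion"
	}
	// Split on the first colon only; model names may contain slashes.
	if p, m, ok := strings.Cut(spec, ":"); ok {
		return p, m
	}
	// A bare name is treated as an Anthropic model.
	return "anthropic", spec
}

func main() {
	specs := []string{"", "claude-sonnet-4-6", "openai:gpt-5.5", "openrouter:meta-llama/llama-3.1-70b", "openrouter"}
	for _, spec := range specs {
		p, m := parseModel(spec)
		fmt.Printf("%q -> provider=%s model=%s\n", spec, p, m)
	}
}
```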
Every step is appended to a hash-chained nilcore.events.jsonl — read it to see exactly what the agent did and why. Plaintext secrets never hit disk, logs, or prompts; on a headless host they are sealed in an encrypted-file vault (AES-256-GCM, owner-only key).
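To make "hash-chained" concrete, here is a minimal sketch of an append-only chain and its verifier. It assumes SHA-256 over the previous hash plus the payload; the `Event` fields and the functions `appendEvent` and `verifyChain` are illustrative and are not the actual nilcore.events.jsonl schema.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Event is a hypothetical log record: each one commits to its predecessor.
type Event struct {
	Payload  string
	PrevHash string
	Hash     string
}

// hashEvent binds a payload to the hash of the event before it.
func hashEvent(prevHash, payload string) string {
	sum := sha256.Sum256([]byte(prevHash + "\n" + payload))
	return hex.EncodeToString(sum[:])
}

// appendEvent adds a record whose hash covers the previous record's hash.
func appendEvent(events []Event, payload string) []Event {
	prev := ""
	if len(events) > 0 {
		prev = events[len(events)-1].Hash
	}
	return append(events, Event{Payload: payload, PrevHash: prev, Hash: hashEvent(prev, payload)})
}

// verifyChain returns the index of the first broken event, or -1 if intact.
func verifyChain(events []Event) int {
	prev := ""
	for i, e := range events {
		if e.PrevHash != prev || e.Hash != hashEvent(prev, e.Payload) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var events []Event
	events = appendEvent(events, `{"kind":"tool_call"}`)
	events = appendEvent(events, `{"kind":"verify","green":true}`)
	fmt.Println("first broken index before tampering:", verifyChain(events))

	events[0].Payload = `{"kind":"edited"}` // rewrite history
	fmt.Println("first broken index after tampering:", verifyChain(events))
}
```

Editing any earlier record changes the hash every later record depends on, which is why a replay can flag a trace as untrusted instead of silently accepting it.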
By 2026 the frontier models inside every serious agent have converged. The harness does the rest. NilCore's bet is to be the best harness — and "best" is the disciplined application of a short list, not a long list of features.
- The feedback loop is the product. Knowing — truthfully, fast — whether the code works is everything. Verification is the sole authority on done.
- The harness wins; borrow the intelligence. Keep the harness small, sharp, and yours; let the model supply the fluency.
- Context is the scarce resource — engineer it ruthlessly. The right context beats the biggest window. Retrieve precisely, prune aggressively, summarize on handoff.
- Understand before you change. Navigate symbols, references, and a repo‑map first. Earn the right to edit.
- Small, reversible, verified steps. One change → verify → checkpoint. Reversible by construction, so the gate concentrates only where reversibility ends.
- Define "done" before you start. Acceptance criteria — ideally a failing test — first. The best defense against confidently building the wrong thing.
- Quality is the bar, not correctness. Green is the floor. A minimal, idiomatic diff a senior would approve is the bar.
- Recover, don't thrash. Recognize being stuck and change strategy — escalate to the advisor, or stop and ask one sharp question.
- Earn improvement from evidence. Tune from evals and the audit trail, not vibes.
- Safety is what makes autonomy possible. The sandbox, the gate, the audit, and no ambient authority aren't friction — they're why the agent can be trusted to run unattended.
Anti‑principles we refuse: reaching for a bigger model instead of a better harness · stuffing the context window "to be safe" · heroic one‑shot rewrites · trusting "it works" over a check · editing before understanding · optimizing on vibes · bolting on features that dilute the core.
These hold in every commit. Break one and the change is rejected — no matter how good the rest is.
- One frozen backend contract: Run(ctx, Task) (Result, error). Native, Codex, Claude Code are interchangeable behind it.
- The verifier is the only authority on "done." A self‑report never governs.
- No ambient authority. Secrets via env only; never on disk, in logs, in prompts, or in code.
- Model-emitted execution is sandboxed. Shell commands and delegated CLIs run in the container; the structured file/git tools run host-side but stay confined to the worktree, so the model can't run an arbitrary program on the host.
- The audit log is append‑only: hash‑chained, redacted, replayable. History is never mutated.
- Zero‑dependency core: standard library only; the sanctioned exceptions are pure‑Go SQLite, golang.org/x/sys (Go's own extended stdlib, for the Linux namespace sandbox), and the Charm TUI stack (behind //go:build tui, so the default binary links none). The MCP client is not a module; it's JSON‑RPC over the stdlib.
- Untrusted input is data, never instructions.
flowchart TD
CLI[cmd/nilcore<br/>chat · do · run · build · swarm · decompose · serve · report · schedule · doctor] --> ROUTER[router<br/>do: goal → preset]
ROUTER --> KERNEL[kernel<br/>one recursive Run · run/build/swarm/decompose presets]
CLI --> KERNEL
KERNEL --> AGENT[agent<br/>orchestrator + adaptive routing]
KERNEL --> SWARM[swarm<br/>bounded in-process pool · typed artifacts · requeue-until-clean]
CLI --> STEER[steering<br/>trusted NILCORE.md / AGENTS.md]
STEER --> AGENT
XP[experience · trust · lessons · flywheel<br/>closed loop over the verified trace] --> AGENT
SWARM --> POOL[pool<br/>strong planner/verifier · cheap workers · fallback · caps]
SWARM --> ARTIFACT[artifact + evverify + packs<br/>typed claims · verifier-produced green]
ARTIFACT --> VERIFY
AGENT --> BK[backend<br/>native · codex · claude-code]
AGENT --> WT[worktree<br/>disposable per task]
BK --> MODEL[model + provider<br/>Anthropic · OpenAI · OpenRouter · multimodal]
BK --> SANDBOX[sandbox<br/>hardened container + nilcore-browser]
BK --> VERIFY[verify<br/>source of truth + browser behavioral check]
AGENT --> POLICY[policy<br/>gate · egress · tool-call]
AGENT --> LOG[eventlog<br/>hash-chained + store]
AGENT --> CI[codeintel<br/>ast 19 languages to graph to repomap to HNSW retrieve]
AGENT --> MEM[memory<br/>cross-project SQLite]
CLI --> CHAN[channel<br/>telegram · slack]
CLI --> TRIG[scmhook · cron<br/>webhook / scheduled triggers]
TRIG --> AGENT
AGENT --> FORGE[forge<br/>gated draft PR]
Dependencies point inward; leaf packages never import the orchestrator. The full design and rationale live in docs/ARCHITECTURE.md and docs/PRINCIPLES.md. For one end-to-end map of the whole system — chat behaviour, every command, the engine, and the safety core, with a front-door index to all the in-depth docs — see docs/REFERENCE.md.

| | |
|---|---|
| ~75,000 | lines of Go — the agent itself (~8k single‑task core · multi‑agent supervisor · conversational front door · verified swarm · recursive decompose · closed‑loop autonomy — all on one orchestration kernel) |
| ~142,300 | lines including its tests (347 test files) |
| 122 | small, single‑responsibility packages |
| 2 | core deps in the default binary — pure‑Go SQLite · golang.org/x/sys (Go's extended stdlib); the Charm TUI's 3 modules link only under make tui. The browser driver (incl. a pure‑Go CDP/WebSocket client), the multi‑language parser backends, embedder, forge, the provider pool, the swarm runner, and the orchestration kernel + router are all pure stdlib — no module added |
| 7 / 7 | invariants held |
| Phases 0–16 | shipped — incl. the unified orchestration kernel (run/build/swarm/decompose collapse onto one recursive engine; nilcore do routes the goal), closed‑loop autonomy (trust‑routing, learned lessons + verify‑cache, a verified self‑improvement flywheel, and graduated auto‑approval fenced by a blast‑budget — opt‑in, never on main), the verifier‑backed artifact factory, and verified swarm mode, atop behavioral browser verification, semantic (HNSW) + multi‑language (19 languages / 34 extensions) code intel, event/scheduled triggers, gated draft PRs, and trusted operator steering |
cmd/nilcore/ chat · do · run · build · swarm · decompose · tui · init · serve · schedule · watch · browse · desktop · report · trust · trace · experience · capability · lessons · flywheel · objective · auto-approvals · inspect · registry · propose-edit · mcp-call · doctor · config · secret · version
cmd/tools/nilcore-browser pure-Go headless-browser driver baked into the sandbox image
internal/
model, provider canonical message format (+ multimodal image block) + Anthropic/OpenAI/OpenRouter
backend CodingBackend contract + native / codex / claude-code
sandbox hardened container executor
verify the source of truth for "done" (+ auto-detection · opt-in browser behavioral check)
eventlog append-only, hash-chained, redacted audit trail
policy reversibility gate · egress allowlist · tool-call denylist
agent orchestrator · routing · spawn (DAG) · durability · bus (inter-agent)
kernel, router unified orchestration kernel (one recursive Run; run/build/swarm/decompose presets, MaxChildren/Observer-bounded) · goal→preset router (the `nilcore do` brain)
super, project multi-agent supervisor · autonomous project loop + greenfield bootstrap
session, inbox conversational front door · queue/steer user-message seam
emit, loopctl live reasoning sink · steer-vs-shutdown cancel discriminator
roster, integrate role-specialized subagents · parallel-worktree merge + verify-each
artifact, evverify typed evidence artifacts (Claim/Evidence/Status) · verifier-produced green (the artifact factory)
artifact/{packs,schema} verify-packs: web·software·finance·ui·audit·benchmark·code + structural schema (curl-in-box, no SDK)
requeue, report field-granular requeue (only the failed claims) · verification report + matrix replay over the log
pool, swarm tiered provider pool (planner/verifier·workers·fallback·caps) · verified swarm: shard queue·runner·until-clean controller·scoreboard
worktreefs, browserwire symlink-safe worktree FS confinement (O_NOFOLLOW) · shared shell-quote + browser-observation contract
steering trusted NILCORE.md / AGENTS.md operator instructions (scoped below the safety core)
scmhook, cron HMAC-verified webhook triggers · cron/interval self-start
forge gated draft-PR opener (token from SecretStore; never merges)
meter token/dollar metering → the budget ceiling is a hard wall
worktree disposable git worktree per task
channel Channel contract · telegram · slack · authorized control
tools, mcp structured tools (+ browser_view) + MCP-as-code
embed opt-in OpenAI-compatible embedder (NILCORE_EMBED_KEY)
codeintel/* ast (19 languages / 34 exts — Go · Python · TS/JS · Rust · Java · C/C++ · C# · Ruby · …) · graph · repomap · lsp · semantic (HNSW) · retrieve · impact · live
store, memory SQLite backbone + cross-project memory
experience, capability derived experience projection over the log · the "what may this drive do" descriptor
trust, vcache, lessons the Trust Ledger (verifier-earned routing) · content-hash verify cache · learned verifier-failure lessons
graapprove, blastbudget graduated auto-approval (earned trust + operator envelope; never main/prod) · four-axis runtime blast fence
flywheel, autosrc, objective verified self-improvement flywheel (human-gated) · autonomy daemon · standing-objectives backlog
secrets keychain / encrypted vault / env / external
skills, selfimprove Agent Skills + plugins + gated self-edit
registry versioned local skills + MCP server specs (install / list)
budget, scheduler, maint, inspect runtime resilience & ops
onboard, paths `nilcore init` wizard + versioned config + per-OS dirs
eval/ measure-first eval harness
No ambient authority. One loop, fully observable. You can always read the trace and pull the plug.
Borrow intelligence — don't reimplement it.