Deep subsystem audit: remediate all 89 shipped-gaps (14 areas) + selfeval wiring by RNT56 · Pull Request #92 · RNT56/NilCore

RNT56 · 2026-06-30T11:36:50Z

What this is

A deep, grounded, adversarially-verified audit of all 14 subsystem areas (81 internal packages), followed by remediation of every confirmed shipped-gap. The audit ran 14 area-auditors over merged main, each finding re-checked by an independent skeptic-verifier against the cited source: 107 confirmed (89 shipped + 18 deferred-roadmap), 4 false-positives rejected, 0 invariant breaks.

The 89 shipped-gaps were fixed in 9 file-disjoint batches (isolated worktrees) plus one integration pass for the cross-cutting wiring, then rebased onto #91. Every fix carries a focused test.

Highlights

HIGH bug fixed: OpenAI reasoning models (gpt-5.x/o-series) always emitted max_tokens and were 400-rejected — the field-switch had no caller. Now auto-detects reasoning ids → max_completion_tokens.
Duplicate wake-delivery (gate-bypass risk): under NILCORE_AUTONOMY, the server's own waker AND the autonomy daemon polled the same registry. Now the gated daemon owns wakes (server.SuppressWaker) and fireDueWakes claims at-most-once (wake.Registry.Claim).
eventlog torn-line: a single torn write permanently broke Verify (disabling all earned-trust auto-approval). The heal now truncates the partial tail.
SQLite-pool self-deadlock (found during integration): the new memory-UNIQUE migration issued a nested PRAGMA while its cursor was open → hung every store-opening package. Fixed (drain cursor, then query).
Security/invariant hardening: egress proxy binds 127.0.0.1 (not 0.0.0.0) unless a bridged container needs it; secret stores fail closed; redaction catches inline JSON credentials; registry.InstallSkill blocks path-traversal; macOS host-mode records its screenshot-exclusion + CGEvent-tagging limitations.
Wiring completed: chat+serve supervise/project drives stream + accept steer (Inbox/Out); serve reaches native web-search parity; ast.References edges + live-index deletion path; provider Tuning; selfeval trust-fold end-to-end (flywheel emits a verifier-judged selfeval_report; trust.Replay folds it into the per-config evidence view — never routing).
Swarm: mergeOrder is now DependsOn-topological (lexical sort broke at ≥10 shards).
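The reasoning-model fix in the highlights comes down to choosing the token-cap field name from the model id. A minimal sketch of that switch follows; `isReasoningModel` and `buildBody` are hypothetical names, and the id heuristic (an `o` followed by a digit, or a `gpt-5` prefix) is an assumption for illustration, not the detection NilCore actually ships.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// isReasoningModel is a hypothetical heuristic: it treats o-series ids
// ("o1", "o3-mini", ...) and "gpt-5*" ids as reasoning models.
func isReasoningModel(id string) bool {
	id = strings.ToLower(id)
	if strings.HasPrefix(id, "gpt-5") {
		return true
	}
	// o-series: the letter 'o' immediately followed by a digit.
	return len(id) >= 2 && id[0] == 'o' && id[1] >= '0' && id[1] <= '9'
}

// buildBody returns a request body that carries the token cap under the
// field name the model family accepts.
func buildBody(model string, limit int) map[string]any {
	body := map[string]any{"model": model}
	if isReasoningModel(model) {
		body["max_completion_tokens"] = limit
	} else {
		body["max_tokens"] = limit
	}
	return body
}

func main() {
	for _, m := range []string{"gpt-4o", "o3-mini", "gpt-5.1"} {
		b, _ := json.Marshal(buildBody(m, 1024))
		fmt.Println(string(b))
	}
}
```

Keeping the decision in one function that every request path calls avoids the original failure mode, where the switch existed but had no caller.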

See the CHANGELOG.md entry for the full per-area breakdown.
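To illustrate the Swarm mergeOrder change: a lexical sort places "shard-10" before "shard-2", which is presumably why the old ordering broke at 10 or more shards. The sketch below orders shards by their dependencies with Kahn's algorithm instead. The function signature and the shard names are assumptions for illustration; only the name mergeOrder and the DependsOn relation come from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeOrder returns shard ids so that every shard appears after the
// shards it depends on (Kahn's algorithm). Ties are broken lexically so
// the result is deterministic. A dependency cycle yields an error.
func mergeOrder(dependsOn map[string][]string) ([]string, error) {
	indeg := make(map[string]int)
	dependents := make(map[string][]string)
	for id, deps := range dependsOn {
		if _, ok := indeg[id]; !ok {
			indeg[id] = 0
		}
		for _, d := range deps {
			if _, ok := indeg[d]; !ok {
				indeg[d] = 0
			}
			indeg[id]++
			dependents[d] = append(dependents[d], id)
		}
	}
	var ready []string
	for id, n := range indeg {
		if n == 0 {
			ready = append(ready, id)
		}
	}
	var order []string
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic tie-break among ready shards
		id := ready[0]
		ready = ready[1:]
		order = append(order, id)
		for _, dep := range dependents[id] {
			indeg[dep]--
			if indeg[dep] == 0 {
				ready = append(ready, dep)
			}
		}
	}
	if len(order) != len(indeg) {
		return nil, fmt.Errorf("dependency cycle among shards")
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"shard-1":  nil,
		"shard-2":  {"shard-1"},
		"shard-10": {"shard-2"},
	}
	order, err := mergeOrder(deps)
	fmt.Println(order, err) // prints "[shard-1 shard-2 shard-10] <nil>"
}
```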

Conflict resolution with #91 (mcp upgrade)

#91 rewrote internal/mcp/* and overlapped my B8 batch. Resolved by keeping #91's transport architecture and re-applying only the B8 fixes #91 didn't subsume: per-tool descriptor pruning (pruneStaleWrappers merged into #91's GenerateWrappers) and the PruneServers reconcile wired into #91's setupMCP. B8's decodeCtx and isError test were dropped as superseded by #91's ctx-aware transport and TestCallToolFailureNotRetried.

Honest scope boundary

7 of the 89 are resolved as truthful documentation rather than code. Each is a genuinely larger feature or a hard problem where a half-built fix would be worse (the technical reason is in the CHANGELOG): the Telegram/Slack inline steer button (Slack interactivity infra), measure.Fence cost surfacing (meaningless estimate-delta; the real $ wall is blastbudget), the graph node-id collision (needs AST type inference), namespace allowlisted egress (operator is warned, not silently switched), serve crash-resume re-delivery, and serve head-of-line (HOL) blocking.

Verification

make verify — 119 packages, exit 0 (re-run on the rebased tree)
make tui-verify (-tags tui) — green
golangci-lint — clean
go test -race — clean on mcp/server/wake/store/super/trust

🤖 Generated with Claude Code

…ck fix Applies the 9 verified per-subsystem fix batches from the deep audit and repairs a single-connection SQLite pool self-deadlock in the new memory UNIQUE migration (hasMemoryUnique ran a nested PRAGMA while its cursor was open). make verify green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Wires the batch agents' deferred cross-cutting fixes (the ones touching the shared main.go/chat.go/server.go): provider Tuning, web-search serve parity, NILCORE_EXPERIENCE truthiness, wake single-fire dedup (SuppressWaker+Claim), egress-proxy bind-by-backend, supervise/project Inbox+Out, codeintel string, host-mode notes wiring. Documents the genuine follow-ons (steer button, selfeval trust-fold, measure.Fence cost). make verify + tui-verify + golangci-lint + -race green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
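The wake single-fire dedup described above relies on an at-most-once claim: whichever poller claims a due wake first delivers it, and every later claim fails. A minimal sketch of that idea follows, assuming an in-memory registry guarded by a mutex. The `Registry` type here is a hypothetical stand-in and does not reproduce the real wake.Registry API.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a hypothetical stand-in for a wake registry that tracks
// which wake ids have already been claimed.
type Registry struct {
	mu      sync.Mutex
	claimed map[string]bool
}

func NewRegistry() *Registry {
	return &Registry{claimed: make(map[string]bool)}
}

// Claim returns true for exactly one caller per wake id; every later
// caller for the same id gets false and must not deliver the wake.
func (r *Registry) Claim(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.claimed[id] {
		return false
	}
	r.claimed[id] = true
	return true
}

func main() {
	r := NewRegistry()
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		delivered int
	)
	// Eight concurrent pollers race for the same due wake.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if r.Claim("wake-1") {
				mu.Lock()
				delivered++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("deliveries:", delivered) // prints "deliveries: 1"
}
```

Suppressing the second poller removes the common case; the claim covers any remaining overlap, so the two measures are complementary rather than redundant.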

…Replay folds) The selfeval trust-fold was built+tested but had no consumer: the flywheel never emitted selfeval_report events, and neither trust.Replay nor the experience projection folded them. Now newFlywheelLoop emits a verifier-judged, chain-gated selfeval_report per baseline run (nil ledger — the durable record is the event), and trust.Replay folds it into the per-config EVIDENCE view (Snapshot().Configs, shown by 'nilcore trust') — deliberately NOT the backend routing standings, which only race_outcome feeds, so a self-eval pass-rate informs the operator without ever steering backend choice. FoldEvalReport folds into the separate l.configs map, so this cannot perturb routing. Wire literal 'selfeval_report' used in trust.Replay (importing the selfeval package would cycle: selfeval imports trust). Test added asserting the fold lands in Configs with the right pass-rate/cases and creates no routing standing. make verify (119 pkgs) + -race green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
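The separation this commit describes can be sketched as two maps with disjoint writers: race outcomes feed the routing standings, and self-eval reports feed only the evidence view. The names FoldEvalReport and configs come from the commit message; every other type, field, and method below is a hypothetical simplification rather than the real trust package.

```go
package main

import "fmt"

// ConfigEvidence is a hypothetical per-config tally shown to the operator.
type ConfigEvidence struct {
	Cases  int
	Passed int
}

// PassRate returns the fraction of passed cases, or 0 with no cases.
func (c ConfigEvidence) PassRate() float64 {
	if c.Cases == 0 {
		return 0
	}
	return float64(c.Passed) / float64(c.Cases)
}

// Ledger keeps routing input and operator evidence in separate maps.
type Ledger struct {
	standings map[string]int            // routing input: wins per backend
	configs   map[string]ConfigEvidence // evidence view only, never routing
}

func NewLedger() *Ledger {
	return &Ledger{
		standings: make(map[string]int),
		configs:   make(map[string]ConfigEvidence),
	}
}

// FoldRaceOutcome is the only writer of the routing standings.
func (l *Ledger) FoldRaceOutcome(winner string) {
	l.standings[winner]++
}

// FoldEvalReport touches only the configs map, so a self-eval result
// cannot create or change a routing standing.
func (l *Ledger) FoldEvalReport(config string, cases, passed int) {
	ev := l.configs[config]
	ev.Cases += cases
	ev.Passed += passed
	l.configs[config] = ev
}

func main() {
	l := NewLedger()
	l.FoldEvalReport("config-a", 10, 8)
	fmt.Println(l.configs["config-a"].PassRate(), len(l.standings)) // prints "0.8 0"
}
```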

(1) egress (MEDIUM): proxyBindAddr bound 127.0.0.1 for an explicit '-sandbox namespace' request, but selectSandbox DEGRADES an unsatisfiable namespace request to a *sandbox.Container (e.g. on macOS / a Linux without userns), which needs the proxy on 0.0.0.0 across the bridge: a silent egress failure. Now binds by namespace availability, mirroring the degradation.

(2) selfeval (HIGH): a forged selfeval_report with a negative/huge 'cases' count panicked trust.Replay at make([]Result, int(casesF)) BEFORE the chain Verify could reject the tampered log, crashing the routing hot path. The count is now bounds-checked ([0, maxSelfevalCases], NaN-safe) before alloc.

Both found by the 6-skeptic risk review of PR #92; both have regression tests. make verify (119 pkgs) green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
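The selfeval fix amounts to validating an untrusted float before it is converted to a slice length. In Go, make with a negative length panics at run time, and converting NaN or an out-of-range float to int gives an implementation-dependent result, so the check has to happen on the float itself. A minimal sketch follows; `caseCount` and the cap value are hypothetical, and rejecting fractional counts is an extra assumption not stated in the commit.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// maxCases is a hypothetical cap standing in for maxSelfevalCases.
const maxCases = 100000

// caseCount validates an untrusted JSON number before it is used as a
// slice length. NaN fails every ordered comparison, so it needs its own
// check; the infinities are caught by the range comparisons.
func caseCount(casesF float64) (int, error) {
	if math.IsNaN(casesF) || casesF < 0 || casesF > maxCases {
		return 0, errors.New("cases count out of range")
	}
	// Assumption: a fractional count is also treated as malformed.
	if casesF != math.Trunc(casesF) {
		return 0, errors.New("cases count is not an integer")
	}
	return int(casesF), nil
}

func main() {
	for _, v := range []float64{3, -1, 1e18, math.NaN()} {
		n, err := caseCount(v)
		if err != nil {
			fmt.Println("rejected:", v, err)
			continue
		}
		results := make([]string, n) // safe: n is within [0, maxCases]
		fmt.Println("allocated", len(results), "results")
	}
}
```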

#93) Full grounded review (17 reviewers × adversarial verify) of main post #91/#92: 7 invariants intact; 29 issues surfaced; 26 distinct fixes, each with a test.

HIGH (both in the merged #91 mcp code):
- stdio transport could deadlock every caller under an uninterruptible blocking Decode holding the per-server mutex. It is now ctx-cancellable + evict-on-cancel, plus a bounded Discover boot timeout.
- untrusted MCP tool name written to a host path unsanitized (traversal). It is now slug()-sanitized like resources/prompts.

Also: live token streaming restored (Resilient forwards live on attempt 0) + partial-on-cancel; experience Fold moved off the eventlog append mutex (single ordered drainer); kill-switch WithRoot wired; browser Enter-submit gated; desktop stale-ref versioned; secret reflow scrubbed for all field types; flywheel RotatedLogPaths wired; project SWITCH threads *State; code.test_passes runs ./...; LSP Content-Length capped; keychain fail-closed; +more.

make verify (119 pkgs) + tui-verify + golangci-lint + -race all green. Co-authored-by: RNT56 <ridfox44@gmail.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
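The tool-name traversal fix can be illustrated with a small sanitizer that reduces an untrusted name to a safe file-name component before it is joined to a host directory. This is a sketch under assumptions: the real slug() is not shown in the PR, so the character set, the "tool" fallback, and `wrapperPath` below are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// slug is a hypothetical sanitizer: it lowercases the name, keeps
// [a-z0-9_-], replaces every other character with '-', and trims
// leading and trailing dashes. Path separators and dots never survive.
func slug(name string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(name) {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9', r == '_', r == '-':
			b.WriteRune(r)
		default:
			b.WriteByte('-')
		}
	}
	s := strings.Trim(b.String(), "-")
	if s == "" {
		s = "tool" // fallback so the file name is never empty
	}
	return s
}

// wrapperPath joins the sanitized name to root, so the result always
// stays directly inside root.
func wrapperPath(root, tool string) string {
	return filepath.Join(root, slug(tool)+".json")
}

func main() {
	fmt.Println(slug("../../etc/passwd")) // prints "etc-passwd"
	fmt.Println(wrapperPath("wrappers", "Read_File"))
}
```

An allowlist of characters is used here rather than stripping ".." sequences, because a denylist has to anticipate every encoding of a separator while an allowlist does not.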

RNT56 and others added 4 commits June 30, 2026 13:30

RNT56 merged commit 66690a5 into main Jul 1, 2026
6 checks passed

RNT56 mentioned this pull request Jul 1, 2026

Repo-wide arch/robustness/completeness pass: 26 verified fixes (2 HIGH mcp bugs) #93

Merged

RNT56 merged 4 commits into main from chore/audit-remediation
