Deep subsystem audit: remediate all 89 shipped-gaps (14 areas) + selfeval wiring #92
Merged
…ck fix
Applies the 9 verified per-subsystem fix batches from the deep audit and repairs a single-connection SQLite pool self-deadlock in the new memory UNIQUE migration (hasMemoryUnique ran a nested PRAGMA while its cursor was open). make verify green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wires the batch agents' deferred cross-cutting fixes (the ones touching the shared main.go/chat.go/server.go): provider Tuning, web-search serve parity, NILCORE_EXPERIENCE truthiness, wake single-fire dedup (SuppressWaker+Claim), egress-proxy bind-by-backend, supervise/project Inbox+Out, codeintel string, host-mode notes wiring. Documents the genuine follow-ons (steer button, selfeval trust-fold, measure.Fence cost). make verify + tui-verify + golangci-lint + -race green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…Replay folds)
The selfeval trust-fold was built+tested but had no consumer: the flywheel never emitted selfeval_report events, and neither trust.Replay nor the experience projection folded them. Now newFlywheelLoop emits a verifier-judged, chain-gated selfeval_report per baseline run (nil ledger; the durable record is the event), and trust.Replay folds it into the per-config EVIDENCE view (Snapshot().Configs, shown by 'nilcore trust'), deliberately NOT the backend routing standings, which only race_outcome feeds, so a self-eval pass-rate informs the operator without ever steering backend choice. FoldEvalReport folds into the separate l.configs map, so this cannot perturb routing. Wire literal 'selfeval_report' used in trust.Replay (importing the selfeval package would cycle: selfeval imports trust). Test added asserting the fold lands in Configs with the right pass-rate/cases and creates no routing standing. make verify (119 pkgs) + -race green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
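The separation described above (evidence view versus routing standings) can be sketched as two maps that are never written by the same fold. This is an illustration under assumptions: the `Ledger`, `EvalReport`, and `ConfigEvidence` shapes and `FoldRaceOutcome` are hypothetical, and only the names FoldEvalReport and the configs map come from the commit message.

```go
package main

import "fmt"

// EvalReport is a hypothetical shape for a selfeval_report payload.
type EvalReport struct {
	Config string
	Cases  int
	Passed int
}

// ConfigEvidence is the operator-facing view for one config.
type ConfigEvidence struct {
	Cases  int
	Passed int
}

// PassRate returns Passed/Cases, or 0 when no cases were recorded.
func (e ConfigEvidence) PassRate() float64 {
	if e.Cases == 0 {
		return 0
	}
	return float64(e.Passed) / float64(e.Cases)
}

// Ledger keeps routing standings and config evidence in separate maps.
type Ledger struct {
	standings map[string]int            // backend -> wins; fed only by race outcomes
	configs   map[string]ConfigEvidence // config -> evidence; fed only by eval reports
}

func NewLedger() *Ledger {
	return &Ledger{
		standings: make(map[string]int),
		configs:   make(map[string]ConfigEvidence),
	}
}

// FoldRaceOutcome is the only writer of standings in this sketch.
func (l *Ledger) FoldRaceOutcome(winner string) { l.standings[winner]++ }

// FoldEvalReport touches configs only, so it cannot change routing.
func (l *Ledger) FoldEvalReport(r EvalReport) {
	e := l.configs[r.Config]
	e.Cases += r.Cases
	e.Passed += r.Passed
	l.configs[r.Config] = e
}

func main() {
	l := NewLedger()
	l.FoldEvalReport(EvalReport{Config: "baseline", Cases: 4, Passed: 3})
	fmt.Println("pass rate:", l.configs["baseline"].PassRate())
	fmt.Println("routing standings:", len(l.standings))
}
```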
(1) egress (MEDIUM): proxyBindAddr bound 127.0.0.1 for an explicit '-sandbox
namespace' request, but selectSandbox DEGRADES an unsatisfiable namespace
request to a *sandbox.Container (e.g. on macOS / a Linux without userns),
which needs the proxy on 0.0.0.0 across the bridge — a silent egress
failure. Now binds by namespace availability, mirroring the degradation.
(2) selfeval (HIGH): a forged selfeval_report with a negative/huge 'cases'
count panicked trust.Replay at make([]Result, int(casesF)) BEFORE the chain
Verify could reject the tampered log — crashing the routing hot path. The
count is now bounds-checked ([0, maxSelfevalCases], NaN-safe) before alloc.
Both found by the 6-skeptic risk review of PR #92; both have regression tests.
make verify (119 pkgs) green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
RNT56 added a commit that referenced this pull request on Jul 1, 2026
#93)
Full grounded review (17 reviewers × adversarial verify) of main post #91/#92: 7 invariants intact; 29 issues surfaced; 26 distinct fixes, each with a test.
HIGH (both in the merged #91 mcp code):
- stdio transport could deadlock every caller under an uninterruptible blocking Decode holding the per-server mutex; now ctx-cancellable + evict-on-cancel, plus a bounded Discover boot timeout.
- untrusted MCP tool name written to a host path unsanitized (traversal); now slug()-sanitized like resources/prompts.
Also: live token streaming restored (Resilient forwards live on attempt 0) + partial-on-cancel; experience Fold moved off the eventlog append mutex (single ordered drainer); kill-switch WithRoot wired; browser Enter-submit gated; desktop stale-ref versioned; secret reflow scrubbed for all field types; flywheel RotatedLogPaths wired; project SWITCH threads *State; code.test_passes runs ./...; LSP Content-Length capped; keychain fail-closed; +more.
make verify (119 pkgs) + tui-verify + golangci-lint + -race all green.
Co-authored-by: RNT56 <ridfox44@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
What this is
A deep, grounded, adversarially-verified audit of all 14 subsystem areas (81 internal packages), followed by remediation of every confirmed shipped-gap. The audit ran 14 area-auditors over merged main, each finding re-checked by an independent skeptic-verifier against the cited source: 107 confirmed (89 shipped + 18 deferred-roadmap), 4 false-positives rejected, 0 invariant breaks.
The 89 shipped-gaps were fixed in 9 file-disjoint batches (isolated worktrees) plus one integration pass for the cross-cutting wiring, then rebased onto #91. Every fix carries a focused test.
Highlights
- … max_tokens and were 400-rejected; the field-switch had no caller. Now auto-detects reasoning ids → max_completion_tokens.
- … NILCORE_AUTONOMY, the server's own waker AND the autonomy daemon polled the same registry. Now the gated daemon owns wakes (server.SuppressWaker) and fireDueWakes claims at-most-once (wake.Registry.Claim).
- … Verify (disabling all earned-trust auto-approval). The heal now truncates the partial tail.
- … 127.0.0.1 (not 0.0.0.0) unless a bridged container needs it; secret stores fail closed; redaction catches inline JSON credentials; registry.InstallSkill blocks path-traversal; macOS host-mode records its screenshot-exclusion + CGEvent-tagging limitations.
- … ast.References edges + live-index deletion path; provider Tuning; selfeval trust-fold end-to-end (flywheel emits a verifier-judged selfeval_report; trust.Replay folds it into the per-config evidence view, never routing).
- … mergeOrder is now DependsOn-topological (lexical sort broke at ≥10 shards).
See the CHANGELOG.md entry for the full per-area breakdown.
Conflict resolution with #91 (mcp upgrade)
#91 rewrote internal/mcp/* and overlapped my B8 batch. Resolved by keeping #91's transport architecture and re-applying only the B8 fixes #91 didn't subsume: per-tool descriptor pruning (pruneStaleWrappers merged into #91's GenerateWrappers) and the PruneServers reconcile wired into #91's setupMCP. B8's decodeCtx and isError test were dropped as superseded by #91's ctx-aware transport and TestCallToolFailureNotRetried.
Honest scope boundary
7 of 89 are resolved as truthful documentation, not code; each is a genuine larger feature or a hard problem where half-building would be worse (the technical reason is in the CHANGELOG): the Telegram/Slack inline steer button (Slack interactivity infra), measure.Fence cost surfacing (meaningless estimate-delta; the real $ wall is blastbudget), the graph node-id collision (needs AST type inference), namespace allowlisted egress (operator is warned, not silently switched), serve crash-resume re-delivery, and serve HOL-block.
Verification
- make verify: 119 packages, exit 0 (re-run on the rebased tree)
- make tui-verify (-tags tui): green
- golangci-lint: clean
- go test -race: clean on mcp/server/wake/store/super/trust

🤖 Generated with Claude Code