
Deep subsystem audit: remediate all 89 shipped-gaps (14 areas) + selfeval wiring#92

Merged: RNT56 merged 4 commits into main from chore/audit-remediation on Jul 1, 2026



RNT56 (Owner) commented Jun 30, 2026

What this is

A deep, grounded, adversarially verified audit of all 14 subsystem areas (81 internal packages), followed by remediation of every confirmed shipped gap. The audit ran 14 area auditors over merged main, and an independent skeptic-verifier re-checked each finding against the cited source. Result: 107 confirmed (89 shipped gaps + 18 deferred roadmap items), 4 false positives rejected, 0 invariant breaks.

The 89 shipped gaps were fixed in 9 file-disjoint batches (each in an isolated worktree), plus one integration pass for the cross-cutting wiring, and then rebased onto #91. Every fix carries a focused test.

Highlights

  • HIGH bug fixed: OpenAI reasoning models (gpt-5.x/o-series) always received max_tokens and were rejected with HTTP 400, because the field switch to max_completion_tokens existed but had no caller. Reasoning model ids are now auto-detected and sent max_completion_tokens.
  • Duplicate wake delivery (gate-bypass risk): under NILCORE_AUTONOMY, both the server's own waker and the autonomy daemon polled the same registry. The gated daemon now owns wakes (server.SuppressWaker), and fireDueWakes claims each wake at most once (wake.Registry.Claim).
  • eventlog torn line: a single torn write permanently broke Verify, which disabled all earned-trust auto-approval. The heal now truncates the partial tail.
  • SQLite pool self-deadlock (found during integration): the new memory-UNIQUE migration issued a nested PRAGMA while its cursor was still open, which hung every package that opens a store. Fixed by draining the cursor first, then running the query.
  • Security/invariant hardening: the egress proxy binds 127.0.0.1 (not 0.0.0.0) unless a bridged container needs it; secret stores fail closed; redaction catches inline JSON credentials; registry.InstallSkill blocks path traversal; macOS host mode records its screenshot-exclusion and CGEvent-tagging limitations.
  • Wiring completed: chat and serve supervise/project drives now stream and accept steering (Inbox/Out); serve reaches native web-search parity; ast.References edges and the live-index deletion path; provider Tuning; the selfeval trust-fold end to end (the flywheel emits a verifier-judged selfeval_report, and trust.Replay folds it into the per-config evidence view, never into routing).
  • Swarm: mergeOrder now orders shards topologically by DependsOn (the previous lexical sort broke at ≥10 shards).
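A minimal Go sketch of the token-field switch from the first highlight, assuming the request body is assembled as a map before JSON encoding. `isReasoningModel`, `applyTokenLimit` and the prefix list are hypothetical illustrations, not nilcore's code; the grounded part is only that OpenAI reasoning models take `max_completion_tokens` rather than `max_tokens`.

```go
package main

import "strings"

// isReasoningModel reports whether a model id looks like an OpenAI reasoning
// model. HYPOTHETICAL prefix list for illustration; a real implementation
// should follow the provider's current model catalogue.
func isReasoningModel(id string) bool {
	id = strings.ToLower(id)
	for _, p := range []string{"o1", "o3", "o4", "gpt-5"} {
		if id == p || strings.HasPrefix(id, p+"-") || strings.HasPrefix(id, p+".") {
			return true
		}
	}
	return false
}

// applyTokenLimit (hypothetical helper) writes the output-token cap under the
// single field name the model accepts, clearing any stale value first.
func applyTokenLimit(body map[string]any, model string, limit int) {
	delete(body, "max_tokens")
	delete(body, "max_completion_tokens")
	if isReasoningModel(model) {
		body["max_completion_tokens"] = limit
	} else {
		body["max_tokens"] = limit
	}
}
```

Routing every request builder through one helper like this is the point of the fix: the original switch existed but nothing called it.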
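The torn-line heal can be sketched as keeping only the longest prefix of the log that ends in a newline, assuming the eventlog is newline-delimited and appended in place. `healTornTail` and `healFile` are hypothetical names, and the real heal may do more (for example re-verifying the chain afterwards).

```go
package main

import (
	"bytes"
	"os"
)

// healTornTail (hypothetical) returns the longest prefix of a newline-delimited
// log that ends on a complete record. A crash mid-append can leave a final
// partial line with no trailing '\n'; complete lines are never touched.
func healTornTail(data []byte) []byte {
	i := bytes.LastIndexByte(data, '\n')
	if i < 0 {
		return data[:0] // no complete record at all
	}
	return data[:i+1]
}

// healFile (hypothetical) applies the heal on disk by truncating the file to
// the healed length. It assumes no concurrent writer holds the log open.
func healFile(path string) (dropped int, err error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	healed := healTornTail(data)
	if len(healed) == len(data) {
		return 0, nil
	}
	return len(data) - len(healed), os.Truncate(path, int64(len(healed)))
}
```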
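A DependsOn-topological merge order can be sketched with Kahn's algorithm. That is a standard technique swapped in for illustration; the PR only states that ordering is now topological. `Shard` is a hypothetical stand-in for the swarm's shard type, and ids are assumed unique. Ties keep input order, so ids are never compared as strings.

```go
package main

import "fmt"

// Shard is a HYPOTHETICAL stand-in: an id plus the ids that must merge first.
type Shard struct {
	ID        string
	DependsOn []string
}

// mergeOrder returns shard ids so that every shard follows its dependencies.
func mergeOrder(shards []Shard) ([]string, error) {
	indeg := make(map[string]int, len(shards))
	for _, s := range shards {
		indeg[s.ID] = 0
	}
	children := make(map[string][]string)
	for _, s := range shards {
		for _, dep := range s.DependsOn {
			if _, ok := indeg[dep]; !ok {
				return nil, fmt.Errorf("shard %q depends on unknown shard %q", s.ID, dep)
			}
			indeg[s.ID]++
			children[dep] = append(children[dep], s.ID)
		}
	}
	// Seed the queue in input order with shards that have no dependencies.
	var queue []string
	for _, s := range shards {
		if indeg[s.ID] == 0 {
			queue = append(queue, s.ID)
		}
	}
	order := make([]string, 0, len(shards))
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		order = append(order, id)
		for _, c := range children[id] {
			indeg[c]--
			if indeg[c] == 0 {
				queue = append(queue, c)
			}
		}
	}
	if len(order) != len(shards) {
		return nil, fmt.Errorf("dependency cycle: only %d of %d shards could be ordered", len(order), len(shards))
	}
	return order, nil
}
```

With shards s10 → s2 → s1, this yields s1, s2, s10, whereas a lexical sort of the ids would put s10 before s2.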

See the CHANGELOG.md entry for the full per-area breakdown.

Conflict resolution with #91 (mcp upgrade)

#91 rewrote internal/mcp/* and overlapped my B8 batch. Resolved by keeping #91's transport architecture and re-applying only the B8 fixes #91 didn't subsume: per-tool descriptor pruning (pruneStaleWrappers merged into #91's GenerateWrappers) and the PruneServers reconcile wired into #91's setupMCP. B8's decodeCtx and isError test were dropped as superseded by #91's ctx-aware transport and TestCallToolFailureNotRetried.

Honest scope boundary

7 of the 89 are resolved as truthful documentation rather than code. Each is either a genuinely larger feature or a hard problem where a half-built fix would be worse than none (the technical reason for each is in the CHANGELOG): the Telegram/Slack inline steer button (needs Slack interactivity infrastructure), measure.Fence cost surfacing (the estimate delta is meaningless; the real dollar wall is blastbudget), the graph node-id collision (needs AST type inference), namespace allowlisted egress (the operator is warned rather than silently switched), serve crash-resume re-delivery, and serve head-of-line (HOL) blocking.

Verification

  • make verify: 119 packages, exit 0 (re-run on the rebased tree)
  • make tui-verify (-tags tui) — green
  • golangci-lint — clean
  • go test -race — clean on mcp/server/wake/store/super/trust

🤖 Generated with Claude Code

RNT56 and others added 4 commits June 30, 2026 13:30
…ck fix

Applies the 9 verified per-subsystem fix batches from the deep audit and
repairs a single-connection SQLite pool self-deadlock in the new memory
UNIQUE migration (hasMemoryUnique ran a nested PRAGMA while its cursor was
open). make verify green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wires the batch agents' deferred cross-cutting fixes (the ones touching the
shared main.go/chat.go/server.go): provider Tuning, web-search serve parity,
NILCORE_EXPERIENCE truthiness, wake single-fire dedup (SuppressWaker+Claim),
egress-proxy bind-by-backend, supervise/project Inbox+Out, codeintel string,
host-mode notes wiring. Documents the genuine follow-ons (steer button,
selfeval trust-fold, measure.Fence cost). make verify + tui-verify +
golangci-lint + -race green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…Replay folds)

The selfeval trust-fold was built+tested but had no consumer: the flywheel
never emitted selfeval_report events, and neither trust.Replay nor the
experience projection folded them. Now newFlywheelLoop emits a verifier-judged,
chain-gated selfeval_report per baseline run (nil ledger — the durable record
is the event), and trust.Replay folds it into the per-config EVIDENCE view
(Snapshot().Configs, shown by 'nilcore trust') — deliberately NOT the backend
routing standings, which only race_outcome feeds, so a self-eval pass-rate
informs the operator without ever steering backend choice. FoldEvalReport
folds into the separate l.configs map, so this cannot perturb routing.

Wire literal 'selfeval_report' used in trust.Replay (importing the selfeval
package would cycle: selfeval imports trust). Test added asserting the fold
lands in Configs with the right pass-rate/cases and creates no routing standing.

make verify (119 pkgs) + -race green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(1) egress (MEDIUM): proxyBindAddr bound 127.0.0.1 for an explicit '-sandbox
    namespace' request, but selectSandbox DEGRADES an unsatisfiable namespace
    request to a *sandbox.Container (e.g. on macOS / a Linux without userns),
    which needs the proxy on 0.0.0.0 across the bridge — a silent egress
    failure. Now binds by namespace availability, mirroring the degradation.

(2) selfeval (HIGH): a forged selfeval_report with a negative/huge 'cases'
    count panicked trust.Replay at make([]Result, int(casesF)) BEFORE the chain
    Verify could reject the tampered log — crashing the routing hot path. The
    count is now bounds-checked ([0, maxSelfevalCases], NaN-safe) before alloc.

Both found by the 6-skeptic risk review of PR #92; both have regression tests.
make verify (119 pkgs) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
RNT56 merged commit 66690a5 into main Jul 1, 2026
6 checks passed
RNT56 added a commit that referenced this pull request Jul 1, 2026
#93)

Full grounded review (17 reviewers × adversarial verify) of main post #91/#92:
7 invariants intact; 29 issues surfaced; 26 distinct fixes, each with a test.

HIGH (both in the merged #91 mcp code):
- stdio transport could deadlock every caller under an uninterruptible blocking
  Decode holding the per-server mutex — now ctx-cancellable + evict-on-cancel,
  plus a bounded Discover boot timeout.
- untrusted MCP tool name written to a host path unsanitized (traversal) —
  now slug()-sanitized like resources/prompts.

Also: live token streaming restored (Resilient forwards live on attempt 0) +
partial-on-cancel; experience Fold moved off the eventlog append mutex (single
ordered drainer); kill-switch WithRoot wired; browser Enter-submit gated;
desktop stale-ref versioned; secret reflow scrubbed for all field types;
flywheel RotatedLogPaths wired; project SWITCH threads *State; code.test_passes
runs ./...; LSP Content-Length capped; keychain fail-closed; +more.

make verify (119 pkgs) + tui-verify + golangci-lint + -race all green.

Co-authored-by: RNT56 <ridfox44@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
