feat: local Ollama opencode adapters + manifest binary fix + first local pipelines by nextlevelshit · Pull Request #1195 · re-cinq/wave

nextlevelshit · 2026-04-22T10:47:33Z

Summary

Three layers of fix for local-Ollama support on this host:

1. Manifest wiring (b2f6e54f, 925bd5ea): two opencode adapters backed by glm-4.7-flash and qwen3.5:27b. Fixes OpenCodeAdapter so it actually honors the binary: field (previously hardcoded "opencode" lookup, silently ignoring the patched fork needed for <think>-stripping with these models).

2. First local pipelines (5087ef7a, be09618e): three single-step pipelines (local-file-explain, local-naming-check, local-test-gen) sized to the local-model envelope. All run end-to-end on opencode-glm.

3. Architectural fixes surfaced along the way (530b150d, 667dd567, cb1619ee, 630f281a, e340ef62):

Schema-sync test mirror for the new contract
Zombie reaper for wave status — was accumulating "running" rows forever once a process died abnormally
Persona sweep: 11 step assignments in 10 pipelines that pinned navigator (read-only) but told the model to use Write
New Step.Permissions overlay for legitimate per-step capability bumps without picking a different persona
wave validate now flags any prompt mentioning a tool the persona doesn't grant (warn-or-error, configurable). Surfaced 60 more mismatches across 31 pipelines that were silently broken — fixable in follow-up PRs

Smoke results (real runs against this host)

Adapter selection — six candidates against ops-hello-world:

Adapter	Result	Notes
opencode-qwen (qwen3.5:27b, patched)	PASS 292s	Selected
opencode-glm (glm-4.7-flash, patched)	PASS 93s	Selected
qwen3-coder-30b GGUF	FAIL	Raw `<tool_call>` text
qwen3-8b-q8-0	FAIL	Same
qwen3-32b-q4-k-m	FAIL	Same
mistral:7b	FAIL	Review JSON unparseable
gemma4	FAIL	No artifact emitted

Local pipelines on opencode-glm:

Pipeline	Result	Notes
local-file-explain	✓ Markdown summary	285s, 18.6k tok
local-naming-check	✓ schema-clean JSON	340s, 25.2k tok, 6 findings (model false positives — known limit)
local-test-gen	✓ valid Go test shape	896s, 21.4k tok, body needs human cleanup

Side findings filed for follow-up (not in this PR)

50 additional pipelines flagged by the new validator — mostly Task (sub-agent dispatch) and Write/Edit mentions in non-Write personas. Sweep next.
audit-junk-code.yaml uses on_failure: skip + lenient JSON recovery. Broken artifacts pass silently. Now diagnosable via the new validator.
GLM hallucinates specs/999-<branch>/<file> subdirs from project's existing specs/ tree. Qwen does it too. Real fix is path-strict artifact extraction, not prompt eng.
Heartbeats vs reaper bandaid for zombie runs. Reaper covers worst case; heartbeats would give real-time death detection but is not urgent.

Test plan

go test ./... (all packages pass locally)
go vet ./...
nix shell nixpkgs#golangci-lint --command golangci-lint run ./... (0 issues)
go run ./cmd/wave validate (passes)
go run ./cmd/wave validate --all (60 mismatches reported; warn-or-error configurable)
go run ./cmd/wave run ops-hello-world ... --adapter opencode-glm
go run ./cmd/wave run local-naming-check internal/ontology --adapter opencode-glm --detach --verbose
go run ./cmd/wave run local-test-gen ... --adapter opencode-glm --detach --verbose
wave status reaper verified live (5 zombie runs swept on first call)

Reviewer notes

The patched-binary swap defaults to opencode-patched for the new adapters — alternative is to ship as opt-in only.
The Step.Permissions overlay is additive on AllowedTools and Deny-rules-win to preserve persona-level prohibitions.
The new validator defaults to hard error; CI today runs only wave validate (not --all), so it does not block on the 50 surfaced cases. Use --prompt-tools-warn or WAVE_PROMPT_TOOLS_WARN=1 to downgrade if a follow-up PR wants --all in CI without first fixing all 31 pipelines.

Replace stale opencode-local with two smoke-tested adapters for the local Ollama server on this host: - opencode-glm → ollama/glm-4.7-flash (stock opencode binary) - opencode-qwen → ollama/qwen3.5:27b (opencode-patched binary) Models picked from the host's working set after running ops-hello-world against six candidates; only these two passed end-to-end. qwen requires the patched binary's <think>-block stripper. Also ignore .wave/ runtime logs/retros — ephemeral, mirrors existing .agents/logs handling.

`NewOpenCodeAdapter` previously hardcoded the `opencode` binary lookup, silently ignoring the per-adapter `binary:` field declared in wave.yaml. As a result, forks like `opencode-patched` (which adds <think>-block stripping needed for GLM/qwen reliability) could be declared in the manifest but never actually executed. - Add `NewOpenCodeAdapterWithBinary(binary string)` that resolves the given name via PATH, with empty defaulting to `opencode`. - Add `ResolveAdapterWithBinary(name, binary string)` overload; the one-arg `ResolveAdapter` keeps its existing semantics. - Thread the binary through `AdapterRegistry` via a new `SetBinary` method and a `binaries` map consulted during `Resolve`. - Populate the registry from `manifest.Adapters[*].Binary` in `wave run`, so opencode-* adapters now invoke the configured fork. Also flips `opencode-glm` in the project manifest to opencode-patched now that the field is wired.

Local Ollama models on the kraftwerk host (qwen3.5:27b, glm-4.7-flash) struggle with the existing audit pipelines: prompts assume frontier-class context windows, multi-step tool reasoning, and strict JSON schemas. The new pipelines target the local-model envelope — single step, narrow scope, simple output — so they actually finish and produce usable artifacts on this host. - local-file-explain: read one source file, emit a Markdown summary. - local-naming-check: scan a Go directory for naming inconsistencies, emit a JSON finding list against a small contract schema. - local-test-gen: generate a table-driven Go test stub for a single function, no external deps. All three use the implementer persona (Read+Write+Edit+Bash+Glob+Grep) because the previously-default navigator persona is read-only and silently bottlenecks any pipeline that asks the model to Write outputs. Pipelines fail-loud (`on_failure: fail`) so broken artifacts surface immediately rather than slipping through `on_failure: skip`.

Initial run on opencode-glm wrote the summary to a hallucinated specs/999-feat-local-ollama-adapters/ subdirectory (model picked up the branch name and inferred a spec convention from the project's existing specs/ tree). Wave's recursive artifact discovery still picked it up, but the path was wrong. - Switch persona from implementer (description triggers spec-aware behaviour) to craftsman. - Tighten the prompt: explicit ./ prefix, explicit ban on subdirs and specs/. Pipeline still completes in ~5 min and produces a valid Markdown summary.

TestSchemaSync requires every schema in .agents/contracts/ to have a byte-identical copy in internal/defaults/contracts/ — the former is authoritative, the latter is what the embedded binary ships. Adding the local-naming-check pipeline in the previous commit added the authoritative copy without the embedded mirror, breaking CI.

`wave status` listed runs as "running" indefinitely once the underlying process died — killed wrappers, abandoned go-run sessions, OOM kills all left the DB record stuck. Saw it firsthand: the status table showed five runs as running, three of them more than 13 hours old, with zero tokens and no live PIDs anywhere on the host. - Add `reconcileZombies` to the running-runs path in status.go: walk each record, mark as failed if the tracked process is gone or, when there is no PID, if started_at is older than one hour. - For PID > 0 runs use `syscall.Kill(pid, 0)` and treat ESRCH as conclusive proof the process has gone. - Reaped runs are dropped from the live list and persisted to the DB with status="failed" and current_step="process gone (orphaned)" so later `--all` queries reflect what actually happened. Three table-driven tests cover: PID==0 stale → reaped, PID==0 fresh → survives, PID>>max-pid → reaped.

Ten pipelines pinned the read-only `navigator` persona on steps that asked the model to use the Write tool. Without Write permission the model emitted preamble text that was silently rescued by Wave's lenient artifact discovery, masking real bugs. Each step is reassigned to the smallest persona that satisfies its prompt, mirrored in both internal/defaults/pipelines/ and .agents/pipelines/. Navigator usage in agent_review contracts (which are genuinely read-only) is preserved. Per-step changes (navigator -> new): - audit-closed.yaml triage -> craftsman - audit-dead-code-review.yaml scan -> implementer - doc-fix.yaml scan-changes -> implementer - impl-feature.yaml explore -> implementer - impl-hotfix.yaml investigate -> implementer - impl-prototype.yaml pr-create -> craftsman - impl-prototype.yaml pr-merge -> craftsman - impl-recinq.yaml report -> craftsman - impl-refactor.yaml analyze -> implementer - ops-hello-world.yaml verify/review -> craftsman - plan-adr.yaml explore-context -> implementer

Today, persona permissions are the only source of truth for step tool grants, but several pipelines pin a read-only persona (navigator) and then expect the agent to Write — silently broken at runtime, occasionally rescued by lenient artifact discovery. Add Step.Permissions (manifest.Permissions) with additive merge semantics via a new ResolveStepPermissions(step, persona, adapterDef) resolver: - AllowedTools is unioned across step, persona, and adapter defaults (step entries appear first; duplicates collapse) - Deny is also unioned. Persona-level deny rules survive the merge, so adapter.PermissionChecker's deny-first precedence still wins — a step cannot remove a persona-level prohibition Wire the resolver into the executor's adapter run config and the two forge preflight scan sites so a step-level Write override does not get mis-flagged by the dependency scanner. Meta pipeline invocation is left alone (no Step in scope at that call site). Demo the override on audit-junk-code.yaml: keep the navigator persona but grant the scan step Write/Edit/Bash so it can emit findings without adopting a heavier persona. Tests cover empty step, additive Write, deny-wins-after-merge, adapter defaults fallback, duplicate collapse, and nil safety.

Adds a new check to `wave validate` that scans every prompt-type step for mentions of Claude tool names (Write, Edit, Bash(...), WebFetch, WebSearch, Read, Glob, Grep, Task, MultiEdit) and flags any step whose resolved persona does not grant the mentioned tool. This catches the common bug where a pipeline pins a read-only persona (e.g. navigator) but its prompt says "use the Write tool" -- the model cannot comply, produces preamble text, and Wave's lenient artifact discovery occasionally rescues it. Heuristic: - High-precision regexes per tool (Tool(...) / "Tool tool" / "Tool the/a/an/ to/back/into/out/for" / "use Tool"); bare matches for WebFetch/WebSearch/Task. - Persona's parameterised grants (e.g. Bash(git log*)) reduce to base name for comparison, so a persona granted Bash(git log*) satisfies a "Bash" mention. - Composition steps (gates, branches, loops, sub-pipelines, command/ conditional steps) skipped -- they do not run a model. - Forge templates ({{ forge.type }}-analyst) resolve via known forge list. Hard error by default; --prompt-tools-warn flag or WAVE_PROMPT_TOOLS_WARN=1 env var downgrades to warning for prompts that legitimately mention a tool in prose. Includes table-driven tests for the heuristic, the persona resolver, and end-to-end CLI wiring (hard error vs warn). Documents the false-positive risk: any prose mention of a tool name will flag (e.g. "do NOT use WebFetch here" still surfaces a finding) -- the fix is to either rephrase or use warn mode.

Drove `wave validate --all` mismatch count from 60 to 0 across 29 pipelines. The validator (e340ef6) flagged steps where the prompt mentions a tool the resolved persona does not grant — both genuine runtime bugs (navigator pinned on a Write step, validator pinned on a JSON-emitting step) and prose false positives where "task", "Task subagents", "WebSearch", or "Edit" appeared in incidental wording. Breakdown: - swap-persona (10): replaced read-only persona with the smallest write-capable one — extends the cb1619e sweep to converge/reproduce/investigate/etc steps - prose-rewrite (29): rephrased incidental tool-name mentions (e.g. "Task subagents" → "sub-agents", "task list" → "checklist", "Write the report" → "Save/Compose the report") - step-overlay (0): the validator does not yet honour Step.Permissions, so overlays cannot silence findings; deferred until validator integration Per-pipeline action: audit-closed swap-persona (audit: analyst → implementer) audit-consolidate prose-rewrite ("step vs task vs job") audit-dead-code-issue swap + prose (compose-report nav→craft) audit-dead-code-review prose-rewrite ("Task subagents") audit-dead-code prose-rewrite ("Task subagents") audit-doc swap + prose (compose-report nav→craft) audit-dx prose-rewrite ("for a given task") audit-security prose-rewrite ("write a proof-of-concept") audit-ux prose-rewrite ("most common task") bench-solve prose-rewrite (SWE-bench wording, file-search tooling) doc-fix prose-rewrite ("Task subagents") doc-onboard prose-rewrite ("write an onboarding", "Task-oriented") full-impl-cycle swap-persona (merge-findings nav→craft) impl-issue / -core prose-rewrite (shared implement/* prompts) impl-recinq swap + prose (converge validator→synthesizer) impl-smart-route swap + prose (assess nav→craftsman) impl-speckit prose-rewrite (shared speckit-flow/* prompts) ops-debug swap + prose (reproduce debugger→implementer) ops-implement-epic prose-rewrite ("task lists") ops-refresh prose-rewrite ("edit the issue") ops-rewrite prose-rewrite ("edit the same") ops-supervise prose-rewrite ("stay on task", "Edit vs Write") plan-scope prose-rewrite ("single task") plan-task prose-rewrite (full task→item rewrite throughout) wave-bugfix swap + prose (investigate nav→implementer) wave-orchestrate swap + prose (classify nav→craftsman) wave-smoke-llm-judge swap + prose (review nav→craftsman) wave-test-forge swap + prose (detect-forge, synthesize nav→craft) All edits mirrored into internal/defaults/pipelines/ and internal/defaults/prompts/ where a defaults copy exists. The audit-junk-code.yaml step-overlay demo from 630f281 is untouched. Before: 60 prompt/tool mismatches across 29 pipelines After: 0 mismatches; `go test ./...` and `golangci-lint run ./...` both clean

The prompt/tool validator (commit `e340ef62`) read tools only from the persona, ignoring the per-step `Step.Permissions` overlay added in `630f281a`. A pipeline that grants Write to a single navigator- pinned step via the overlay was still flagged as a mismatch, defeating the point of having an overlay. - Switch to `pipeline.ResolveStepPermissions(&step, persona, nil)` so the validator sees the merged AllowedTools from step + persona + adapter (adapter is nil at validate time — it isn't selected yet). - New table case proves the composition: navigator persona, prompt asks for Write, step overlay grants Write — produces no findings. - `wave validate --all` still passes (0 mismatches across the now- swept defaults).

The audit-junk-code handover used `on_failure: skip` plus the default lenient JSON recovery. A model that produced 239 bytes of preamble text instead of structured findings was silently rescued — the run logged "completed successfully" with a junk artifact. Flip to `on_failure: fail` so the contract is the gate it claims to be. This is paired with the prompt/tool validator (`e340ef62`) and the default-pipeline sweep (`df7fb6fd`): the upstream causes of model non-compliance are now caught at validate time, so flipping to fail won't cause new false-positive failures. CI hardening (`go run ./cmd/wave validate --all` step in release.yml) needs to land separately — the OAuth token used here lacks `workflow` scope. Run `gh auth refresh -s workflow` and add this step under `Run tests` in .github/workflows/release.yml: - name: Validate manifest + all pipelines run: go run ./cmd/wave validate --all

Local-Ollama runs occasionally produce files in unexpected locations: glm-4.7-flash and qwen3.5:27b both wrote summary outputs into a hallucinated `specs/999-<branch>/<file>` subdirectory because the mounted project happened to contain a `specs/` tree. The canonical declared path was also written so contracts passed, but the divergence went unnoticed and the workspaces accumulated cruft. Add a best-effort end-of-step walk that emits a single warning event listing any files the persona wrote outside its declared `output_artifacts` paths. Wave-managed subtrees (.agents/, project/, .git/, vendor/, node_modules/) and standard adapter drops (AGENTS.md, CLAUDE.md, dotfiles) are pruned to keep the warning focused on real drift. Walk errors are swallowed by design — observability must not introduce a new failure mode. Output is capped to the first 5 paths plus a `(+N more)` summary to keep logs scannable.

nextlevelshit added 13 commits April 22, 2026 11:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: local Ollama opencode adapters + manifest binary fix + first local pipelines#1195

feat: local Ollama opencode adapters + manifest binary fix + first local pipelines#1195
nextlevelshit wants to merge 13 commits intomainfrom
feat/local-ollama-adapters

nextlevelshit commented Apr 22, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nextlevelshit commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Smoke results (real runs against this host)

Side findings filed for follow-up (not in this PR)

Test plan

Reviewer notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

nextlevelshit commented Apr 22, 2026 •

edited

Loading