Skip to content

feat: local Ollama opencode adapters + manifest binary fix + first local pipelines#1195

Open
nextlevelshit wants to merge 13 commits intomainfrom
feat/local-ollama-adapters
Open

feat: local Ollama opencode adapters + manifest binary fix + first local pipelines#1195
nextlevelshit wants to merge 13 commits intomainfrom
feat/local-ollama-adapters

Conversation

@nextlevelshit
Copy link
Copy Markdown
Collaborator

@nextlevelshit nextlevelshit commented Apr 22, 2026

Summary

Three layers of fix for local-Ollama support on this host:

1. Manifest wiring (b2f6e54f, 925bd5ea): two opencode adapters backed by glm-4.7-flash and qwen3.5:27b. Fixes OpenCodeAdapter so it actually honors the binary: field (previously hardcoded "opencode" lookup, silently ignoring the patched fork needed for <think>-stripping with these models).

2. First local pipelines (5087ef7a, be09618e): three single-step pipelines (local-file-explain, local-naming-check, local-test-gen) sized to the local-model envelope. All run end-to-end on opencode-glm.

3. Architectural fixes surfaced along the way (530b150d, 667dd567, cb1619ee, 630f281a, e340ef62):

  • Schema-sync test mirror for the new contract
  • Zombie reaper for wave status — was accumulating "running" rows forever once a process died abnormally
  • Persona sweep: 11 step assignments in 10 pipelines that pinned navigator (read-only) but told the model to use Write
  • New Step.Permissions overlay for legitimate per-step capability bumps without picking a different persona
  • wave validate now flags any prompt mentioning a tool the persona doesn't grant (warn-or-error, configurable). Surfaced 60 more mismatches across 31 pipelines that were silently broken — fixable in follow-up PRs

Smoke results (real runs against this host)

Adapter selection — six candidates against ops-hello-world:

Adapter Result Notes
opencode-qwen (qwen3.5:27b, patched) PASS 292s Selected
opencode-glm (glm-4.7-flash, patched) PASS 93s Selected
qwen3-coder-30b GGUF FAIL Raw <tool_call> text
qwen3-8b-q8-0 FAIL Same
qwen3-32b-q4-k-m FAIL Same
mistral:7b FAIL Review JSON unparseable
gemma4 FAIL No artifact emitted

Local pipelines on opencode-glm:

Pipeline Result Notes
local-file-explain ✓ Markdown summary 285s, 18.6k tok
local-naming-check ✓ schema-clean JSON 340s, 25.2k tok, 6 findings (model false positives — known limit)
local-test-gen ✓ valid Go test shape 896s, 21.4k tok, body needs human cleanup

Side findings filed for follow-up (not in this PR)

  • 50 additional pipelines flagged by the new validator — mostly Task (sub-agent dispatch) and Write/Edit mentions in non-Write personas. Sweep next.
  • audit-junk-code.yaml uses on_failure: skip + lenient JSON recovery. Broken artifacts pass silently. Now diagnosable via the new validator.
  • GLM hallucinates specs/999-<branch>/<file> subdirs from project's existing specs/ tree. Qwen does it too. Real fix is path-strict artifact extraction, not prompt eng.
  • Heartbeats vs reaper bandaid for zombie runs. Reaper covers worst case; heartbeats would give real-time death detection but is not urgent.

Test plan

  • go test ./... (all packages pass locally)
  • go vet ./...
  • nix shell nixpkgs#golangci-lint --command golangci-lint run ./... (0 issues)
  • go run ./cmd/wave validate (passes)
  • go run ./cmd/wave validate --all (60 mismatches reported; warn-or-error configurable)
  • go run ./cmd/wave run ops-hello-world ... --adapter opencode-glm
  • go run ./cmd/wave run local-naming-check internal/ontology --adapter opencode-glm --detach --verbose
  • go run ./cmd/wave run local-test-gen ... --adapter opencode-glm --detach --verbose
  • wave status reaper verified live (5 zombie runs swept on first call)

Reviewer notes

  • The patched-binary swap defaults to opencode-patched for the new adapters — alternative is to ship as opt-in only.
  • The Step.Permissions overlay is additive on AllowedTools and Deny-rules-win to preserve persona-level prohibitions.
  • The new validator defaults to hard error; CI today runs only wave validate (not --all), so it does not block on the 50 surfaced cases. Use --prompt-tools-warn or WAVE_PROMPT_TOOLS_WARN=1 to downgrade if a follow-up PR wants --all in CI without first fixing all 31 pipelines.

Replace stale opencode-local with two smoke-tested adapters for the
local Ollama server on this host:

- opencode-glm  → ollama/glm-4.7-flash (stock opencode binary)
- opencode-qwen → ollama/qwen3.5:27b (opencode-patched binary)

Models picked from the host's working set after running ops-hello-world
against six candidates; only these two passed end-to-end. qwen requires
the patched binary's <think>-block stripper.

Also ignore .wave/ runtime logs/retros — ephemeral, mirrors existing
.agents/logs handling.
`NewOpenCodeAdapter` previously hardcoded the `opencode` binary lookup,
silently ignoring the per-adapter `binary:` field declared in
wave.yaml. As a result, forks like `opencode-patched` (which adds
<think>-block stripping needed for GLM/qwen reliability) could be
declared in the manifest but never actually executed.

- Add `NewOpenCodeAdapterWithBinary(binary string)` that resolves the
  given name via PATH, with empty defaulting to `opencode`.
- Add `ResolveAdapterWithBinary(name, binary string)` overload; the
  one-arg `ResolveAdapter` keeps its existing semantics.
- Thread the binary through `AdapterRegistry` via a new `SetBinary`
  method and a `binaries` map consulted during `Resolve`.
- Populate the registry from `manifest.Adapters[*].Binary` in
  `wave run`, so opencode-* adapters now invoke the configured fork.

Also flips `opencode-glm` in the project manifest to opencode-patched
now that the field is wired.
Local Ollama models on the kraftwerk host (qwen3.5:27b, glm-4.7-flash)
struggle with the existing audit pipelines: prompts assume frontier-class
context windows, multi-step tool reasoning, and strict JSON schemas. The
new pipelines target the local-model envelope — single step, narrow
scope, simple output — so they actually finish and produce usable
artifacts on this host.

- local-file-explain: read one source file, emit a Markdown summary.
- local-naming-check: scan a Go directory for naming inconsistencies,
  emit a JSON finding list against a small contract schema.
- local-test-gen: generate a table-driven Go test stub for a single
  function, no external deps.

All three use the implementer persona (Read+Write+Edit+Bash+Glob+Grep)
because the previously-default navigator persona is read-only and
silently bottlenecks any pipeline that asks the model to Write outputs.

Pipelines fail-loud (`on_failure: fail`) so broken artifacts surface
immediately rather than slipping through `on_failure: skip`.
Initial run on opencode-glm wrote the summary to a hallucinated
specs/999-feat-local-ollama-adapters/ subdirectory (model picked up
the branch name and inferred a spec convention from the project's
existing specs/ tree). Wave's recursive artifact discovery still
picked it up, but the path was wrong.

- Switch persona from implementer (description triggers spec-aware
  behaviour) to craftsman.
- Tighten the prompt: explicit ./ prefix, explicit ban on subdirs and
  specs/.

Pipeline still completes in ~5 min and produces a valid Markdown
summary.
TestSchemaSync requires every schema in .agents/contracts/ to have a
byte-identical copy in internal/defaults/contracts/ — the former is
authoritative, the latter is what the embedded binary ships.

Adding the local-naming-check pipeline in the previous commit added the
authoritative copy without the embedded mirror, breaking CI.
`wave status` listed runs as "running" indefinitely once the underlying
process died — killed wrappers, abandoned go-run sessions, OOM kills
all left the DB record stuck. Saw it firsthand: the status table showed
five runs as running, three of them more than 13 hours old, with zero
tokens and no live PIDs anywhere on the host.

- Add `reconcileZombies` to the running-runs path in status.go: walk
  each record, mark as failed if the tracked process is gone or, when
  there is no PID, if started_at is older than one hour.
- For PID > 0 runs use `syscall.Kill(pid, 0)` and treat ESRCH as
  conclusive proof the process has gone.
- Reaped runs are dropped from the live list and persisted to the DB
  with status="failed" and current_step="process gone (orphaned)" so
  later `--all` queries reflect what actually happened.

Three table-driven tests cover: PID==0 stale → reaped, PID==0 fresh →
survives, PID>>max-pid → reaped.
Ten pipelines pinned the read-only `navigator` persona on steps that
asked the model to use the Write tool. Without Write permission the
model emitted preamble text that was silently rescued by Wave's
lenient artifact discovery, masking real bugs.

Each step is reassigned to the smallest persona that satisfies its
prompt, mirrored in both internal/defaults/pipelines/ and
.agents/pipelines/. Navigator usage in agent_review contracts (which
are genuinely read-only) is preserved.

Per-step changes (navigator -> new):
- audit-closed.yaml          triage           -> craftsman
- audit-dead-code-review.yaml scan            -> implementer
- doc-fix.yaml               scan-changes     -> implementer
- impl-feature.yaml          explore          -> implementer
- impl-hotfix.yaml           investigate      -> implementer
- impl-prototype.yaml        pr-create        -> craftsman
- impl-prototype.yaml        pr-merge         -> craftsman
- impl-recinq.yaml           report           -> craftsman
- impl-refactor.yaml         analyze          -> implementer
- ops-hello-world.yaml       verify/review    -> craftsman
- plan-adr.yaml              explore-context  -> implementer
Today, persona permissions are the only source of truth for step tool
grants, but several pipelines pin a read-only persona (navigator) and
then expect the agent to Write — silently broken at runtime, occasionally
rescued by lenient artifact discovery.

Add Step.Permissions (manifest.Permissions) with additive merge semantics
via a new ResolveStepPermissions(step, persona, adapterDef) resolver:

  - AllowedTools is unioned across step, persona, and adapter defaults
    (step entries appear first; duplicates collapse)
  - Deny is also unioned. Persona-level deny rules survive the merge,
    so adapter.PermissionChecker's deny-first precedence still wins —
    a step cannot remove a persona-level prohibition

Wire the resolver into the executor's adapter run config and the two
forge preflight scan sites so a step-level Write override does not get
mis-flagged by the dependency scanner. Meta pipeline invocation is left
alone (no Step in scope at that call site).

Demo the override on audit-junk-code.yaml: keep the navigator persona
but grant the scan step Write/Edit/Bash so it can emit findings without
adopting a heavier persona.

Tests cover empty step, additive Write, deny-wins-after-merge, adapter
defaults fallback, duplicate collapse, and nil safety.
Adds a new check to `wave validate` that scans every prompt-type step for
mentions of Claude tool names (Write, Edit, Bash(...), WebFetch, WebSearch,
Read, Glob, Grep, Task, MultiEdit) and flags any step whose resolved persona
does not grant the mentioned tool. This catches the common bug where a
pipeline pins a read-only persona (e.g. navigator) but its prompt says
"use the Write tool" -- the model cannot comply, produces preamble text,
and Wave's lenient artifact discovery occasionally rescues it.

Heuristic:
- High-precision regexes per tool (Tool(...) / "Tool tool" / "Tool the/a/an/
  to/back/into/out/for" / "use Tool"); bare matches for WebFetch/WebSearch/Task.
- Persona's parameterised grants (e.g. Bash(git log*)) reduce to base name
  for comparison, so a persona granted Bash(git log*) satisfies a "Bash"
  mention.
- Composition steps (gates, branches, loops, sub-pipelines, command/
  conditional steps) skipped -- they do not run a model.
- Forge templates ({{ forge.type }}-analyst) resolve via known forge list.

Hard error by default; --prompt-tools-warn flag or WAVE_PROMPT_TOOLS_WARN=1
env var downgrades to warning for prompts that legitimately mention a tool
in prose.

Includes table-driven tests for the heuristic, the persona resolver, and
end-to-end CLI wiring (hard error vs warn). Documents the false-positive
risk: any prose mention of a tool name will flag (e.g. "do NOT use WebFetch
here" still surfaces a finding) -- the fix is to either rephrase or use
warn mode.
Drove `wave validate --all` mismatch count from 60 to 0 across 29
pipelines. The validator (e340ef6) flagged steps where the prompt
mentions a tool the resolved persona does not grant — both genuine
runtime bugs (navigator pinned on a Write step, validator pinned on
a JSON-emitting step) and prose false positives where "task", "Task
subagents", "WebSearch", or "Edit" appeared in incidental wording.

Breakdown:
  - swap-persona  (10): replaced read-only persona with the smallest
                       write-capable one — extends the cb1619e sweep
                       to converge/reproduce/investigate/etc steps
  - prose-rewrite (29): rephrased incidental tool-name mentions
                       (e.g. "Task subagents" → "sub-agents", "task
                       list" → "checklist", "Write the report" →
                       "Save/Compose the report")
  - step-overlay   (0): the validator does not yet honour
                       Step.Permissions, so overlays cannot silence
                       findings; deferred until validator integration

Per-pipeline action:
  audit-closed             swap-persona  (audit: analyst → implementer)
  audit-consolidate        prose-rewrite ("step vs task vs job")
  audit-dead-code-issue    swap + prose  (compose-report nav→craft)
  audit-dead-code-review   prose-rewrite ("Task subagents")
  audit-dead-code          prose-rewrite ("Task subagents")
  audit-doc                swap + prose  (compose-report nav→craft)
  audit-dx                 prose-rewrite ("for a given task")
  audit-security           prose-rewrite ("write a proof-of-concept")
  audit-ux                 prose-rewrite ("most common task")
  bench-solve              prose-rewrite (SWE-bench wording, file-search tooling)
  doc-fix                  prose-rewrite ("Task subagents")
  doc-onboard              prose-rewrite ("write an onboarding", "Task-oriented")
  full-impl-cycle          swap-persona  (merge-findings nav→craft)
  impl-issue / -core       prose-rewrite (shared implement/* prompts)
  impl-recinq              swap + prose  (converge validator→synthesizer)
  impl-smart-route         swap + prose  (assess nav→craftsman)
  impl-speckit             prose-rewrite (shared speckit-flow/* prompts)
  ops-debug                swap + prose  (reproduce debugger→implementer)
  ops-implement-epic       prose-rewrite ("task lists")
  ops-refresh              prose-rewrite ("edit the issue")
  ops-rewrite              prose-rewrite ("edit the same")
  ops-supervise            prose-rewrite ("stay on task", "Edit vs Write")
  plan-scope               prose-rewrite ("single task")
  plan-task                prose-rewrite (full task→item rewrite throughout)
  wave-bugfix              swap + prose  (investigate nav→implementer)
  wave-orchestrate         swap + prose  (classify nav→craftsman)
  wave-smoke-llm-judge     swap + prose  (review nav→craftsman)
  wave-test-forge          swap + prose  (detect-forge, synthesize nav→craft)

All edits mirrored into internal/defaults/pipelines/ and
internal/defaults/prompts/ where a defaults copy exists. The
audit-junk-code.yaml step-overlay demo from 630f281 is untouched.

Before: 60 prompt/tool mismatches across 29 pipelines
After:   0 mismatches; `go test ./...` and `golangci-lint run ./...`
         both clean
The prompt/tool validator (commit `e340ef62`) read tools only from
the persona, ignoring the per-step `Step.Permissions` overlay added
in `630f281a`. A pipeline that grants Write to a single navigator-
pinned step via the overlay was still flagged as a mismatch, defeating
the point of having an overlay.

- Switch to `pipeline.ResolveStepPermissions(&step, persona, nil)` so
  the validator sees the merged AllowedTools from step + persona +
  adapter (adapter is nil at validate time — it isn't selected yet).
- New table case proves the composition: navigator persona, prompt
  asks for Write, step overlay grants Write — produces no findings.
- `wave validate --all` still passes (0 mismatches across the now-
  swept defaults).
The audit-junk-code handover used `on_failure: skip` plus the default
lenient JSON recovery. A model that produced 239 bytes of preamble text
instead of structured findings was silently rescued — the run logged
"completed successfully" with a junk artifact.

Flip to `on_failure: fail` so the contract is the gate it claims to be.
This is paired with the prompt/tool validator (`e340ef62`) and the
default-pipeline sweep (`df7fb6fd`): the upstream causes of model
non-compliance are now caught at validate time, so flipping to fail
won't cause new false-positive failures.

CI hardening (`go run ./cmd/wave validate --all` step in release.yml)
needs to land separately — the OAuth token used here lacks `workflow`
scope. Run `gh auth refresh -s workflow` and add this step under
`Run tests` in .github/workflows/release.yml:

  - name: Validate manifest + all pipelines
    run: go run ./cmd/wave validate --all
Local-Ollama runs occasionally produce files in unexpected locations:
glm-4.7-flash and qwen3.5:27b both wrote summary outputs into a
hallucinated `specs/999-<branch>/<file>` subdirectory because the
mounted project happened to contain a `specs/` tree. The canonical
declared path was also written so contracts passed, but the divergence
went unnoticed and the workspaces accumulated cruft.

Add a best-effort end-of-step walk that emits a single warning event
listing any files the persona wrote outside its declared
`output_artifacts` paths. Wave-managed subtrees (.agents/, project/,
.git/, vendor/, node_modules/) and standard adapter drops
(AGENTS.md, CLAUDE.md, dotfiles) are pruned to keep the warning
focused on real drift.

Walk errors are swallowed by design — observability must not introduce
a new failure mode. Output is capped to the first 5 paths plus a
`(+N more)` summary to keep logs scannable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant