feat: local Ollama opencode adapters + manifest binary fix + first local pipelines#1195
Open
nextlevelshit wants to merge 13 commits intomainfrom
Open
feat: local Ollama opencode adapters + manifest binary fix + first local pipelines#1195nextlevelshit wants to merge 13 commits intomainfrom
nextlevelshit wants to merge 13 commits intomainfrom
Conversation
Replace stale opencode-local with two smoke-tested adapters for the local Ollama server on this host: - opencode-glm → ollama/glm-4.7-flash (stock opencode binary) - opencode-qwen → ollama/qwen3.5:27b (opencode-patched binary) Models picked from the host's working set after running ops-hello-world against six candidates; only these two passed end-to-end. qwen requires the patched binary's <think>-block stripper. Also ignore .wave/ runtime logs/retros — ephemeral, mirrors existing .agents/logs handling.
`NewOpenCodeAdapter` previously hardcoded the `opencode` binary lookup, silently ignoring the per-adapter `binary:` field declared in wave.yaml. As a result, forks like `opencode-patched` (which adds <think>-block stripping needed for GLM/qwen reliability) could be declared in the manifest but never actually executed. - Add `NewOpenCodeAdapterWithBinary(binary string)` that resolves the given name via PATH, with empty defaulting to `opencode`. - Add `ResolveAdapterWithBinary(name, binary string)` overload; the one-arg `ResolveAdapter` keeps its existing semantics. - Thread the binary through `AdapterRegistry` via a new `SetBinary` method and a `binaries` map consulted during `Resolve`. - Populate the registry from `manifest.Adapters[*].Binary` in `wave run`, so opencode-* adapters now invoke the configured fork. Also flips `opencode-glm` in the project manifest to opencode-patched now that the field is wired.
Local Ollama models on the kraftwerk host (qwen3.5:27b, glm-4.7-flash) struggle with the existing audit pipelines: prompts assume frontier-class context windows, multi-step tool reasoning, and strict JSON schemas. The new pipelines target the local-model envelope — single step, narrow scope, simple output — so they actually finish and produce usable artifacts on this host. - local-file-explain: read one source file, emit a Markdown summary. - local-naming-check: scan a Go directory for naming inconsistencies, emit a JSON finding list against a small contract schema. - local-test-gen: generate a table-driven Go test stub for a single function, no external deps. All three use the implementer persona (Read+Write+Edit+Bash+Glob+Grep) because the previously-default navigator persona is read-only and silently bottlenecks any pipeline that asks the model to Write outputs. Pipelines fail-loud (`on_failure: fail`) so broken artifacts surface immediately rather than slipping through `on_failure: skip`.
Initial run on opencode-glm wrote the summary to a hallucinated specs/999-feat-local-ollama-adapters/ subdirectory (model picked up the branch name and inferred a spec convention from the project's existing specs/ tree). Wave's recursive artifact discovery still picked it up, but the path was wrong. - Switch persona from implementer (description triggers spec-aware behaviour) to craftsman. - Tighten the prompt: explicit ./ prefix, explicit ban on subdirs and specs/. Pipeline still completes in ~5 min and produces a valid Markdown summary.
TestSchemaSync requires every schema in .agents/contracts/ to have a byte-identical copy in internal/defaults/contracts/ — the former is authoritative, the latter is what the embedded binary ships. Adding the local-naming-check pipeline in the previous commit added the authoritative copy without the embedded mirror, breaking CI.
`wave status` listed runs as "running" indefinitely once the underlying process died — killed wrappers, abandoned go-run sessions, OOM kills all left the DB record stuck. Saw it firsthand: the status table showed five runs as running, three of them more than 13 hours old, with zero tokens and no live PIDs anywhere on the host. - Add `reconcileZombies` to the running-runs path in status.go: walk each record, mark as failed if the tracked process is gone or, when there is no PID, if started_at is older than one hour. - For PID > 0 runs use `syscall.Kill(pid, 0)` and treat ESRCH as conclusive proof the process has gone. - Reaped runs are dropped from the live list and persisted to the DB with status="failed" and current_step="process gone (orphaned)" so later `--all` queries reflect what actually happened. Three table-driven tests cover: PID==0 stale → reaped, PID==0 fresh → survives, PID>>max-pid → reaped.
Ten pipelines pinned the read-only `navigator` persona on steps that asked the model to use the Write tool. Without Write permission the model emitted preamble text that was silently rescued by Wave's lenient artifact discovery, masking real bugs. Each step is reassigned to the smallest persona that satisfies its prompt, mirrored in both internal/defaults/pipelines/ and .agents/pipelines/. Navigator usage in agent_review contracts (which are genuinely read-only) is preserved. Per-step changes (navigator -> new): - audit-closed.yaml triage -> craftsman - audit-dead-code-review.yaml scan -> implementer - doc-fix.yaml scan-changes -> implementer - impl-feature.yaml explore -> implementer - impl-hotfix.yaml investigate -> implementer - impl-prototype.yaml pr-create -> craftsman - impl-prototype.yaml pr-merge -> craftsman - impl-recinq.yaml report -> craftsman - impl-refactor.yaml analyze -> implementer - ops-hello-world.yaml verify/review -> craftsman - plan-adr.yaml explore-context -> implementer
Today, persona permissions are the only source of truth for step tool
grants, but several pipelines pin a read-only persona (navigator) and
then expect the agent to Write — silently broken at runtime, occasionally
rescued by lenient artifact discovery.
Add Step.Permissions (manifest.Permissions) with additive merge semantics
via a new ResolveStepPermissions(step, persona, adapterDef) resolver:
- AllowedTools is unioned across step, persona, and adapter defaults
(step entries appear first; duplicates collapse)
- Deny is also unioned. Persona-level deny rules survive the merge,
so adapter.PermissionChecker's deny-first precedence still wins —
a step cannot remove a persona-level prohibition
Wire the resolver into the executor's adapter run config and the two
forge preflight scan sites so a step-level Write override does not get
mis-flagged by the dependency scanner. Meta pipeline invocation is left
alone (no Step in scope at that call site).
Demo the override on audit-junk-code.yaml: keep the navigator persona
but grant the scan step Write/Edit/Bash so it can emit findings without
adopting a heavier persona.
Tests cover empty step, additive Write, deny-wins-after-merge, adapter
defaults fallback, duplicate collapse, and nil safety.
Adds a new check to `wave validate` that scans every prompt-type step for
mentions of Claude tool names (Write, Edit, Bash(...), WebFetch, WebSearch,
Read, Glob, Grep, Task, MultiEdit) and flags any step whose resolved persona
does not grant the mentioned tool. This catches the common bug where a
pipeline pins a read-only persona (e.g. navigator) but its prompt says
"use the Write tool" -- the model cannot comply, produces preamble text,
and Wave's lenient artifact discovery occasionally rescues it.
Heuristic:
- High-precision regexes per tool (Tool(...) / "Tool tool" / "Tool the/a/an/
to/back/into/out/for" / "use Tool"); bare matches for WebFetch/WebSearch/Task.
- Persona's parameterised grants (e.g. Bash(git log*)) reduce to base name
for comparison, so a persona granted Bash(git log*) satisfies a "Bash"
mention.
- Composition steps (gates, branches, loops, sub-pipelines, command/
conditional steps) skipped -- they do not run a model.
- Forge templates ({{ forge.type }}-analyst) resolve via known forge list.
Hard error by default; --prompt-tools-warn flag or WAVE_PROMPT_TOOLS_WARN=1
env var downgrades to warning for prompts that legitimately mention a tool
in prose.
Includes table-driven tests for the heuristic, the persona resolver, and
end-to-end CLI wiring (hard error vs warn). Documents the false-positive
risk: any prose mention of a tool name will flag (e.g. "do NOT use WebFetch
here" still surfaces a finding) -- the fix is to either rephrase or use
warn mode.
Drove `wave validate --all` mismatch count from 60 to 0 across 29 pipelines. The validator (e340ef6) flagged steps where the prompt mentions a tool the resolved persona does not grant — both genuine runtime bugs (navigator pinned on a Write step, validator pinned on a JSON-emitting step) and prose false positives where "task", "Task subagents", "WebSearch", or "Edit" appeared in incidental wording. Breakdown: - swap-persona (10): replaced read-only persona with the smallest write-capable one — extends the cb1619e sweep to converge/reproduce/investigate/etc steps - prose-rewrite (29): rephrased incidental tool-name mentions (e.g. "Task subagents" → "sub-agents", "task list" → "checklist", "Write the report" → "Save/Compose the report") - step-overlay (0): the validator does not yet honour Step.Permissions, so overlays cannot silence findings; deferred until validator integration Per-pipeline action: audit-closed swap-persona (audit: analyst → implementer) audit-consolidate prose-rewrite ("step vs task vs job") audit-dead-code-issue swap + prose (compose-report nav→craft) audit-dead-code-review prose-rewrite ("Task subagents") audit-dead-code prose-rewrite ("Task subagents") audit-doc swap + prose (compose-report nav→craft) audit-dx prose-rewrite ("for a given task") audit-security prose-rewrite ("write a proof-of-concept") audit-ux prose-rewrite ("most common task") bench-solve prose-rewrite (SWE-bench wording, file-search tooling) doc-fix prose-rewrite ("Task subagents") doc-onboard prose-rewrite ("write an onboarding", "Task-oriented") full-impl-cycle swap-persona (merge-findings nav→craft) impl-issue / -core prose-rewrite (shared implement/* prompts) impl-recinq swap + prose (converge validator→synthesizer) impl-smart-route swap + prose (assess nav→craftsman) impl-speckit prose-rewrite (shared speckit-flow/* prompts) ops-debug swap + prose (reproduce debugger→implementer) ops-implement-epic prose-rewrite ("task lists") ops-refresh prose-rewrite ("edit the issue") ops-rewrite prose-rewrite ("edit the same") ops-supervise prose-rewrite ("stay on task", "Edit vs Write") plan-scope prose-rewrite ("single task") plan-task prose-rewrite (full task→item rewrite throughout) wave-bugfix swap + prose (investigate nav→implementer) wave-orchestrate swap + prose (classify nav→craftsman) wave-smoke-llm-judge swap + prose (review nav→craftsman) wave-test-forge swap + prose (detect-forge, synthesize nav→craft) All edits mirrored into internal/defaults/pipelines/ and internal/defaults/prompts/ where a defaults copy exists. The audit-junk-code.yaml step-overlay demo from 630f281 is untouched. Before: 60 prompt/tool mismatches across 29 pipelines After: 0 mismatches; `go test ./...` and `golangci-lint run ./...` both clean
The prompt/tool validator (commit `e340ef62`) read tools only from the persona, ignoring the per-step `Step.Permissions` overlay added in `630f281a`. A pipeline that grants Write to a single navigator- pinned step via the overlay was still flagged as a mismatch, defeating the point of having an overlay. - Switch to `pipeline.ResolveStepPermissions(&step, persona, nil)` so the validator sees the merged AllowedTools from step + persona + adapter (adapter is nil at validate time — it isn't selected yet). - New table case proves the composition: navigator persona, prompt asks for Write, step overlay grants Write — produces no findings. - `wave validate --all` still passes (0 mismatches across the now- swept defaults).
The audit-junk-code handover used `on_failure: skip` plus the default
lenient JSON recovery. A model that produced 239 bytes of preamble text
instead of structured findings was silently rescued — the run logged
"completed successfully" with a junk artifact.
Flip to `on_failure: fail` so the contract is the gate it claims to be.
This is paired with the prompt/tool validator (`e340ef62`) and the
default-pipeline sweep (`df7fb6fd`): the upstream causes of model
non-compliance are now caught at validate time, so flipping to fail
won't cause new false-positive failures.
CI hardening (`go run ./cmd/wave validate --all` step in release.yml)
needs to land separately — the OAuth token used here lacks `workflow`
scope. Run `gh auth refresh -s workflow` and add this step under
`Run tests` in .github/workflows/release.yml:
- name: Validate manifest + all pipelines
run: go run ./cmd/wave validate --all
Local-Ollama runs occasionally produce files in unexpected locations: glm-4.7-flash and qwen3.5:27b both wrote summary outputs into a hallucinated `specs/999-<branch>/<file>` subdirectory because the mounted project happened to contain a `specs/` tree. The canonical declared path was also written so contracts passed, but the divergence went unnoticed and the workspaces accumulated cruft. Add a best-effort end-of-step walk that emits a single warning event listing any files the persona wrote outside its declared `output_artifacts` paths. Wave-managed subtrees (.agents/, project/, .git/, vendor/, node_modules/) and standard adapter drops (AGENTS.md, CLAUDE.md, dotfiles) are pruned to keep the warning focused on real drift. Walk errors are swallowed by design — observability must not introduce a new failure mode. Output is capped to the first 5 paths plus a `(+N more)` summary to keep logs scannable.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three layers of fix for local-Ollama support on this host:
1. Manifest wiring (
b2f6e54f,925bd5ea): two opencode adapters backed by glm-4.7-flash and qwen3.5:27b. FixesOpenCodeAdapterso it actually honors thebinary:field (previously hardcoded "opencode" lookup, silently ignoring the patched fork needed for<think>-stripping with these models).2. First local pipelines (
5087ef7a,be09618e): three single-step pipelines (local-file-explain,local-naming-check,local-test-gen) sized to the local-model envelope. All run end-to-end on opencode-glm.3. Architectural fixes surfaced along the way (
530b150d,667dd567,cb1619ee,630f281a,e340ef62):wave status— was accumulating "running" rows forever once a process died abnormallynavigator(read-only) but told the model to use WriteStep.Permissionsoverlay for legitimate per-step capability bumps without picking a different personawave validatenow flags any prompt mentioning a tool the persona doesn't grant (warn-or-error, configurable). Surfaced 60 more mismatches across 31 pipelines that were silently broken — fixable in follow-up PRsSmoke results (real runs against this host)
Adapter selection — six candidates against
ops-hello-world:<tool_call>textLocal pipelines on opencode-glm:
Side findings filed for follow-up (not in this PR)
Task(sub-agent dispatch) and Write/Edit mentions in non-Write personas. Sweep next.audit-junk-code.yamluseson_failure: skip+ lenient JSON recovery. Broken artifacts pass silently. Now diagnosable via the new validator.specs/999-<branch>/<file>subdirs from project's existingspecs/tree. Qwen does it too. Real fix is path-strict artifact extraction, not prompt eng.Test plan
go test ./...(all packages pass locally)go vet ./...nix shell nixpkgs#golangci-lint --command golangci-lint run ./...(0 issues)go run ./cmd/wave validate(passes)go run ./cmd/wave validate --all(60 mismatches reported; warn-or-error configurable)go run ./cmd/wave run ops-hello-world ... --adapter opencode-glmgo run ./cmd/wave run local-naming-check internal/ontology --adapter opencode-glm --detach --verbosego run ./cmd/wave run local-test-gen ... --adapter opencode-glm --detach --verbosewave statusreaper verified live (5 zombie runs swept on first call)Reviewer notes
opencode-patchedfor the new adapters — alternative is to ship as opt-in only.Step.Permissionsoverlay is additive onAllowedToolsand Deny-rules-win to preserve persona-level prohibitions.wave validate(not--all), so it does not block on the 50 surfaced cases. Use--prompt-tools-warnorWAVE_PROMPT_TOOLS_WARN=1to downgrade if a follow-up PR wants--allin CI without first fixing all 31 pipelines.