Skip to content

fix(plugin)!: invoke lumen via node launcher with lazy binary fetch#154

Open
hypn4 wants to merge 2 commits intoory:mainfrom
hypn4:fix/polyglot-launcher-mjs
Open

fix(plugin)!: invoke lumen via node launcher with lazy binary fetch#154
hypn4 wants to merge 2 commits intoory:mainfrom
hypn4:fix/polyglot-launcher-mjs

Conversation

@hypn4
Copy link
Copy Markdown
Contributor

@hypn4 hypn4 commented May 6, 2026

Summary

Migrates every host's mcpServers / hooks command from scripts/run.cmd (the polyglot whose byte-0 conflict caused #152 and the same regression class superpowers cycled through three patches for) to node ${PLUGIN_ROOT}/launcher.mjs.

This is Phase 2, building on the Phase 1 native-spawn integration test gate in #153. After this PR no live code path reaches scripts/run.cmd; the polyglot remains in the tree as dead code through the soak window before a separate follow-up PR deletes it.

Fixes #152.

Why a Node launcher (industry context)

Three data points show the polyglot pattern is structurally fragile:

The byte 0 of a single file can be exactly one of #, :, or @ — mutually exclusive. No polyglot byte 0 satisfies POSIX posix_spawn, Windows cmd.exe extension dispatch, and shell parsing all at once. Eliminating the polyglot is the only structural fix.

A small Node launcher resolves the OS dispatch in JavaScript and spawns the matching native binary. JS handles platform branching; no shell sits in the loop; the byte-0 conflict goes away. This shares the launcher-resolves-OS-and-spawns-binary shape with @anthropic-ai/claude-code itself, which preinstalls per-platform binaries via npm optionalDependencies rather than lazy-fetching them at first run — the launcher half is the structural-fix common ground; the distribution mechanism (lazy fetch vs preinstall) is the part lumen could migrate to later if you'd prefer.

What launcher.mjs does

  • Resolution chain: $LUMEN_BIN_PATH<pluginRoot>/bin/lumen[.exe] (developer fast path from make build-local) → <cacheDir>/bin/<version>/lumen-<version>-<goos>-<goarch>[.exe] (lazy fetch).
  • Cache dir chain: $CLAUDE_PLUGIN_DATA$CURSOR_PLUGIN_DATA$XDG_DATA_HOME (Unix) / $LOCALAPPDATA (Windows) → ~/.local/share / os.tmpdir() last resort.
  • Lazy fetch from GitHub Releases with redirect-following HTTPS. checksums.txt is parsed and SHA-256 verified before atomic rename — non-zero exit on mismatch.
  • File lock at <cacheDir>/.lock (15-min stale recovery) serializes concurrent first-runs. Tolerates Windows EPERM transient state during another process's unlink.
  • Per-socket idle timeout (60 s) on both request and response streams — hung connections / mid-stream stalls error out instead of blocking the launcher and any host waiting on its handshake.
  • Node platform/arch ↔ Go GOOS/GOARCH mapping (darwin/x64darwin/amd64 etc.) so asset names match goreleaser.
  • prefetch subcommand for cache-warming hooks.
  • $LUMEN_LAUNCHER_VERBOSE enables stderr diagnostics.
  • ~280 lines, Node stdlib only, zero third-party dependencies.

Per-host migration

Host Change
Claude Code (mcpServers + hooks) command: "node", args → [${CLAUDE_PLUGIN_ROOT}/launcher.mjs, …]. SessionStart hook now runs prefetch then session-start, warming the cache before the MCP first invocation.
Cursor (mcp.json + hooks) command: "node", args → launcher path.
OpenCode (.opencode/plugins/lumen.js) spawn target → [process.execPath, launcher, …]. New prefetch() runs synchronously before config.mcp.lumen registration with a 90 s hard timeout cap, sidestepping OpenCode's tight 5 s MCP startup window for cold-cache fetches.
Codex (.codex/INSTALL.md) codex mcp add lumen --startup-timeout-sec 60 -- node "$CODEX_HOME/lumen/launcher.mjs" stdio. Documents Node 18+ prerequisite, PATH-visibility troubleshooting (codex#6243), config.toml persistence, manual prefetch warm-up, and cache directory location for clean uninstalls.

README.md, .opencode/INSTALL.md, .cursor-plugin/INSTALL.md updated likewise.

What this PR does NOT do

  • Delete scripts/run.{cmd,sh,bat} and their tests — waits for soak window (separate follow-up PR).
  • npm optionalDependencies migration for @ory/lumen-opencode — separate effort, requires distribution-packaging decisions.

Verification

CI

Both fork PRs (Phase 1 mirror and this PR's mirror) green across the full matrix: Build (ubuntu / macos / windows-amd64 cross), Test, Lint, E2E, Integration (ubuntu / macos / windows), plus the existing polyglot Script tests as defensive soak coverage.

Local

Tested via claude --plugin-dir <repo> on:

  • Windows 11 25H2 26200.7462
  • macOS Tahoe 26.4.1
  • Ubuntu 24.04.4 LTS (arm)

MCP execution worked correctly on all three; the SessionStart hook ran cleanly, MCP initialize handshake completed, and lumen tools were callable.

Cold-cache lazy-fetch path

Manually verified by removing the developer binary and the existing cache, then running node launcher.mjs prefetch. Launcher fetched the v0.0.39 release asset, SHA-256-verified against checksums.txt, and placed the binary at <cacheDir>/bin/0.0.39/.... Re-invocation was a sub-100 ms cache hit.

Depends on

Fixes #152.

Summary by CodeRabbit

  • New Features

    • Introduces a Node-based cross-platform launcher that lazily fetches, caches, verifies, and runs platform-specific binaries with robust spawn and signal handling.
    • Adds a non-blocking "prefetch" step to warm the binary cache.
  • Documentation

    • Updated install and troubleshooting guidance: required Node version, cache locations, pre-warm step, and recommended startup timeouts.
  • CI

    • New artifact-based build handoff and integration test stage.
  • Tests

    • Added integration tests for prefetch, concurrent prefetch, spawn behavior, and hook interactions.

Adds a Go integration test that mirrors the consumer invocation path
used by Claude Code, Cursor, OpenCode, and Codex MCP hosts: exec.Command
on Unix calls posix_spawn(2); on Windows it calls CreateProcessW. No
shell sits in the loop, so any byte-0 / shebang / extension regression
in the launcher chain fails the test deterministically. The bash-fallback
masking that lets the existing scripts/test_run*.sh tests pass against
broken polyglots no longer hides the failure.

The gate is Phase 1 of the run.cmd polyglot elimination plan: lock the
gate in place before any artifact change, so subsequent migrations are
caught by CI rather than reaching users (the regression class behind
ory#152 and the obra/superpowers v4.3.0 → v4.3.1 → v4.3.2 cycle
referenced in anthropics/claude-plugins-official#1692).

Phase 1 ships:

- launcher.mjs at repo root — minimal Node ESM (~50 lines, no deps)
  that resolves bin/lumen[.exe] from the plugin root or LUMEN_BIN_PATH
  override and spawns it with passed args. Phase 2 will enrich this
  with lazy GitHub Releases fetch + SHA-256 verification + per-host
  cache; Phase 1 keeps it minimal to land the test gate in isolation.

- internal/integration/launcher_spawn_test.go (build tag `integration`):
  - TestLauncherSpawnStdioHandshake drives a real MCP initialize
    handshake via mcp.CommandTransport against `node launcher.mjs stdio`,
    asserting a clean JSON-RPC stream.
  - TestLauncherSpawnHookSessionStart (claude + cursor subtests) runs
    the SessionStart hook subcommand and verifies parseable JSON output
    with clean exit.
  - TestLauncherSpawnHookPreToolUse exercises the second hook path.

- internal/integration/helpers_test.go: repoRoot, launcherPath,
  lookupBuiltBinary, nodePath, sandboxEnv. sandboxTempDir tolerates
  Windows file-locking errors caused by the SessionStart hook's detached
  background indexer (cmd/hook_spawn_windows.go) keeping debug.log /
  index.db open past test end — the test logic completes successfully,
  only t.TempDir's strict cleanup was reporting the residual handle as
  a test failure.

CI additions (.github/workflows/ci.yml):

- build job uploads bin/lumen as artifact lumen-${matrix.os} for the
  integration job to consume.
- build-windows-cross delegates to `make build` so the cross-compile
  recipe lives only in .goreleaser.yml (invoked through the
  oryd/xgoreleaser docker image already used by every release).
  ci.yml does not duplicate apt-get/cp/CC overrides. Extracts the
  windows-amd64 binary from `dist/` and uploads as artifact
  lumen-windows-amd64.
- integration job (matrix [ubuntu-latest, macos-latest, windows-latest])
  depends on both build jobs, downloads the matching artifact, sets up
  Node, restores the execute bit on Unix, and runs `go test
  -tags=integration`.

Existing polyglot script-test steps remain in place — defensive coverage
during the Phase 2 → Phase 3 soak window in case any phase needs revert.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

📝 Walkthrough

Walkthrough

Replaces platform-specific launcher scripts with a Node.js launcher (launcher.mjs) that resolves per-OS/arch native lumen binaries, lazily downloads and verifies them from GitHub Releases with cache and lockfile protection, and spawns the native binary. Configs, hooks, docs, CI, and integration tests were updated to use and validate the new launcher.

Changes

Launcher & Integration

Layer / File(s) Summary
Launcher Implementation
launcher.mjs
New Node.js CLI: resolves LUMEN_BIN_PATH/dev bin/cache, reads pinned version from .release-please-manifest.json, downloads release asset with redirect/idle timeouts, streams SHA-256 verification, writes .partial then atomically renames, sets exec perms, implements .lock with stale-lock reclamation, supports prefetch, hook subcommands, and spawns the native binary with stdio and POSIX signal forwarding.
Plugin Wiring / Prefetch
.opencode/plugins/lumen.js
Replaces script-based MCP command with process.execPath invocation of launcher.mjs ([process.execPath, launcher, "stdio"]), adds a best-effort prefetch() step that spawns launcher.mjs prefetch with a 90s timeout and non-blocking failure semantics, and migrates imports to node: specifiers.
MCP / Hook Configs
.claude-plugin/plugin.json, .cursor/mcp.json, hooks/hooks.json, hooks/hooks-cursor.json
MCP server command/args and hook commands updated to run node/launcher.mjs (first arg) instead of the repo-local scripts/run.* wrappers; stdio transport and hook arguments preserved.
Documentation & Install Guides
.codex/INSTALL.md, .cursor-plugin/INSTALL.md, .opencode/INSTALL.md, README.md
Docs updated to require Node.js 22+, document node ... launcher.mjs usage, increased startup-timeout examples, prefetch step, cache locations, update/uninstall guidance, and troubleshooting for node not found or first-run download timeouts.
CI / Build Artifacts
.github/workflows/ci.yml
build job now uploads bin/lumen artifact; new build-windows-cross job cross-compiles Windows binary and uploads artifact; new integration job downloads appropriate artifacts and runs go test -tags=integration on matrixed platforms.
Integration Tests & Helpers
internal/integration/helpers_test.go, internal/integration/launcher_prefetch_test.go, internal/integration/launcher_spawn_test.go
Adds helpers to locate repo/launcher/node, create sandboxed envs, and resolve built binaries; integration tests exercising prefetch (single and concurrent), checksum verification against GitHub checksums.txt, lock/partial-file cleanup, stdio MCP handshake, and hook subcommand outputs.

Sequence Diagram

sequenceDiagram
    actor Client as Client
    participant Launcher as launcher.mjs
    participant Cache as Local\ Cache
    participant GitHub as GitHub\ Releases
    participant Binary as Native\ Binary

    rect rgba(200,150,255,0.5)
    Note over Launcher,Binary: First run / cache miss
    Client->>Launcher: spawn(stdio | hook)
    Launcher->>Cache: check for cached binary
    Cache-->>Launcher: not found
    Launcher->>GitHub: fetch checksums.txt
    GitHub-->>Launcher: checksums.txt
    Launcher->>GitHub: download asset (stream)
    GitHub-->>Launcher: binary stream
    Launcher->>Launcher: verify SHA-256 and write .partial
    Launcher->>Cache: atomic rename to final path
    Cache-->>Launcher: cached binary path
    Launcher->>Binary: spawn with inherited stdio
    Binary-->>Client: stdio/response
    end

    rect rgba(150,200,150,0.5)
    Note over Launcher,Cache: Cached run (cache hit)
    Client->>Launcher: spawn(stdio | hook)
    Launcher->>Cache: check for cached binary
    Cache-->>Launcher: cached binary path
    Launcher->>Binary: spawn with inherited stdio
    Binary-->>Client: stdio/response
    end

    rect rgba(200,200,150,0.5)
    Note over Client,Launcher: Concurrent prefetch (lock safety)
    Client->>Launcher: prefetch
    Client->>Launcher: prefetch
    Client->>Launcher: prefetch
    Launcher->>Launcher: acquire .lock (winner)
    par Waiters
        Launcher->>GitHub: fetch (waiters sleep/retry)
        Launcher->>GitHub: fetch (waiters sleep/retry)
        Launcher->>GitHub: fetch (winner proceeds)
    end
    Launcher->>Cache: write binary (winner)
    Launcher->>Launcher: release .lock
    Launcher-->>Client: prefetch OK
    Launcher-->>Client: prefetch OK
    Launcher-->>Client: prefetch OK
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: replacing the fragile scripts/run.cmd polyglot with a Node launcher (launcher.mjs) that lazily fetches the lumen binary. The change is specific and captures the key improvement.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hypn4
Copy link
Copy Markdown
Contributor Author

hypn4 commented May 6, 2026

A couple of things I'd like to flag before review:

The breaking-change marker (fix(plugin)!:) on the manifest commit will trigger a major version bump in release-please semantics — 0.0.39 → 1.0.0. Since the public Go API isn't actually changing here (only the install-path contract), I'm not sure that's the release shape you'd want. Happy to drop the ! and use a BREAKING CHANGE: footer in the commit body instead so this lands as 0.0.40 — just let me know your preference.

Also, this PR intentionally leaves scripts/run.{cmd,sh,bat} and their tests in the tree as dead code rather than deleting them in the same change, so the revert path stays open if anything regresses on real installs. I'd plan to open a small follow-up deletion PR after a short soak — say two tagged releases or 14 days, whichever is longer — but that's flexible; happy to use whatever cadence fits your usual policy.

Comment thread .codex/INSTALL.md Outdated

- [Codex CLI](https://developers.openai.com/codex/cli)
- Git
- **Node.js 18+** — Lumen's MCP entry point is `launcher.mjs`, a small Node
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node 18 is very old, you should not recommend or require that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — Node 18 EOL'd in April 2025; recommending it as the floor was wrong. Bumped both .codex/INSTALL.md and .opencode/INSTALL.md to Node.js 22+ (current Active LTS) in the latest force-push. Thanks for the review.

@hypn4 hypn4 force-pushed the fix/polyglot-launcher-mjs branch from 0a64464 to ea32ba9 Compare May 6, 2026 22:50
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
hooks/hooks-cursor.json (1)

4-9: 💤 Low value

Optional: consider mirroring the Claude prefetch step for cold-cache first-run experience.

hooks/hooks.json runs a prefetch hook before the actual session-start to avoid the first session paying the ~30 MB GitHub Releases download cost in the user-visible startup window. Cursor's hook system does not appear to have a documented hard timeout the way OpenCode (5 s) and Codex (10 s default) do, so this may be unnecessary — but a first-run user with a slow link will still see a multi-second pause before semantic_search becomes available. If symmetry with Claude's flow is desired, add a prefetch step before this hook.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hooks/hooks-cursor.json` around lines 4 - 9, The session-start hook defined
under "sessionStart" currently launches the Cursor plugin via the command node
"${CURSOR_PLUGIN_ROOT}/launcher.mjs" (hook session-start lumen --host cursor);
to mirror Claude's cold-cache prefetch behavior add a preceding "prefetch" hook
entry that runs the same launcher in prefetch mode (i.e., invoke the launcher
with the prefetch action or flag your launcher supports) so the heavy download
happens before user-visible startup; update the hooks-cursor.json to include
this "prefetch" entry before "sessionStart" so semantic_search is primed for
first-run users.
launcher.mjs (2)

162-186: 💤 Low value

Clean up partial file and writable stream on download errors.

The tmpPath cleanup is only run on the SHA-256 mismatch path. If res errors mid-stream (network reset, idle timeout fires after setTimeout at line 130-132), the promise rejects but tmpPath is left on disk and the out writable is not explicitly destroyed. Later runs reuse tmpPath (truncated by createWriteStream), so it is bounded — but explicit cleanup is cheap and avoids stale partials surviving when the destination dir is cache-shared.

♻️ Proposed cleanup on stream error
 async function downloadAndVerify(url, expectedSha256, destPath) {
   const tmpPath = `${destPath}.partial`;
   fs.mkdirSync(path.dirname(destPath), { recursive: true });
   const res = await httpsGet(url);
   const hash = createHash("sha256");
-  await new Promise((resolve, reject) => {
-    const out = fs.createWriteStream(tmpPath);
-    res.on("data", (chunk) => hash.update(chunk));
-    res.on("error", reject);
-    out.on("error", reject);
-    out.on("finish", resolve);
-    res.pipe(out);
-  });
+  try {
+    await new Promise((resolve, reject) => {
+      const out = fs.createWriteStream(tmpPath);
+      const fail = (err) => {
+        res.destroy();
+        out.destroy();
+        reject(err);
+      };
+      res.on("data", (chunk) => hash.update(chunk));
+      res.on("error", fail);
+      out.on("error", fail);
+      out.on("finish", resolve);
+      res.pipe(out);
+    });
+  } catch (e) {
+    try { fs.unlinkSync(tmpPath); } catch {}
+    throw e;
+  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@launcher.mjs` around lines 162 - 186, The downloadAndVerify function leaves
tmpPath and the writable stream out around network or stream errors; modify the
code so that the writable stream (out) is properly destroyed/closed and tmpPath
is removed whenever the download promise rejects (not just on SHA mismatch).
Concretely, ensure the Promise that wraps res.pipe(out) sets up error handlers
to destroy out and unlink tmpPath on res or out errors, or wrap the await new
Promise in try/catch/finally to call out.destroy() (if created) and
fs.unlinkSync(tmpPath) on error, keeping the SHA-256 check logic intact;
reference downloadAndVerify, tmpPath, res, out and the existing httpsGet call to
locate where to add this cleanup.

289-300: ⚡ Quick win

Forward signals to the child to avoid orphaning the lumen process.

With stdio: "inherit", terminal-driven SIGINT reaches both processes via the foreground process group, so Ctrl+C works. But signals delivered only to the launcher PID (e.g., kill <pid>, host process supervisor sending SIGTERM, IDE shutting down its hook) are not forwarded — the spawned lumen keeps running until it notices stdin closed, leaking a process and its sqlite file lock. Also, child.on("error") and child.on("exit") can both fire; fatal() after process.exit() is harmless but worth guarding for clarity.

♻️ Proposed signal forwarding
   const child = spawn(bin, args, { stdio: "inherit" });
+  const forward = (sig) => {
+    try { child.kill(sig); } catch {}
+  };
+  for (const sig of ["SIGINT", "SIGTERM", "SIGHUP", "SIGQUIT"]) {
+    process.on(sig, () => forward(sig));
+  }
   child.on("exit", (code, signal) => {
     if (signal && process.platform !== "win32") {
       process.kill(process.pid, signal);
     } else {
       process.exit(code ?? (signal ? 1 : 0));
     }
   });
   child.on("error", (err) => {
     fatal(`failed to spawn binary: ${err.message}`, 126);
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@launcher.mjs` around lines 289 - 300, The launcher currently spawns the child
(const child = spawn(...)) and only handles child.on("exit")/child.on("error");
add signal forwarding so signals sent to the launcher are forwarded to the child
to avoid orphaning the lumen process: register handlers for SIGINT, SIGTERM,
SIGHUP (and SIGQUIT if desired) that call child.kill(signal) (with Windows-safe
behavior: avoid unsupported signals on win32 or fallback to child.kill() for
termination), and remove those handlers once child emits "exit" to avoid leaks;
also guard the fatal(...) call in child.on("error")/child.on("exit") by ensuring
you don't call fatal after process.exit (use an exited flag or return early in
handlers) so the process flow is clear.
.opencode/plugins/lumen.js (1)

21-36: ⚡ Quick win

Escalate to SIGKILL after the timeout, otherwise a wedged child can hang OpenCode startup forever.

child.kill() sends SIGTERM by default. If launcher.mjs (or the lumen binary it eventually spawns) is hung in a state that ignores or delays SIGTERM, the exit/error events never fire and the awaited promise never resolves — defeating the very 90s cap this code is designed to enforce. Resolve immediately on the timer (the goal is "don't block startup") or chain a SIGKILL after a short grace window.

♻️ Proposed timeout escalation
 async function prefetch() {
   await new Promise((resolve) => {
     const child = spawn(process.execPath, [launcher, "prefetch"], {
       stdio: "inherit",
     });
-    const timer = setTimeout(() => {
-      child.kill();
-    }, PREFETCH_TIMEOUT_MS);
+    let killTimer;
+    const timer = setTimeout(() => {
+      child.kill("SIGTERM");
+      killTimer = setTimeout(() => {
+        try { child.kill("SIGKILL"); } catch {}
+        resolve(); // unblock startup regardless
+      }, 5000);
+    }, PREFETCH_TIMEOUT_MS);
     const done = () => {
       clearTimeout(timer);
+      if (killTimer) clearTimeout(killTimer);
       resolve();
     };
     child.on("exit", done);
     child.on("error", done); // best-effort: don't block opencode startup on fetch failure
   });
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.opencode/plugins/lumen.js around lines 21 - 36, The prefetch promise can
hang because child.kill() sends SIGTERM and the child may ignore it; change the
timeout handler to (a) resolve the outer promise immediately to avoid blocking
startup, and (b) attempt a graceful kill followed by a short escalation window
that sends SIGKILL if the process hasn’t exited; update the timeout closure that
uses PREFETCH_TIMEOUT_MS to call child.kill() (SIGTERM), call resolve(), and set
a secondary timer (e.g., 200–500ms) that calls child.kill("SIGKILL"); ensure you
still clear both timers and remove listeners (the done handler on child.exit and
child.error) when the child finally exits to avoid leaks; target the prefetch
function and the child/timer variables in your change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 164-170: The workflow step "Stage windows binary for upload"
currently copies every lumen.exe found by the find command (find dist -type f
-name 'lumen.exe' -exec cp {} bin/lumen.exe \;) which can select the wrong arch;
change the selection to only the amd64 build by matching the windows amd64
output (e.g., replace the find pattern with one that targets the
dist/*_windows_amd64/lumen.exe path or use find with -path
'*/windows_amd64/lumen.exe') so mkdir -p bin then a single deterministic copy of
the amd64 artifact to bin/lumen.exe is performed and leave the final test -x
bin/lumen.exe as-is.

In `@internal/integration/launcher_prefetch_test.go`:
- Around line 44-45: The test currently ignores errors from
http.NewRequestWithContext and io.ReadAll which can lead to nil requests or
truncated bodies and obscure failures; update the test to capture and check the
error returned by http.NewRequestWithContext (the calls that feed
http.DefaultClient.Do) and fail the test immediately (e.g., t.Fatalf or
require.NoError) if request construction fails, and likewise capture and check
the error from io.ReadAll before using the body (failing the test if reading the
response body errors); apply the same checks to the other occurrences flagged
(the similar NewRequestWithContext and io.ReadAll calls around lines 133-134 and
142-143) so no errors are silently ignored.

---

Nitpick comments:
In @.opencode/plugins/lumen.js:
- Around line 21-36: The prefetch promise can hang because child.kill() sends
SIGTERM and the child may ignore it; change the timeout handler to (a) resolve
the outer promise immediately to avoid blocking startup, and (b) attempt a
graceful kill followed by a short escalation window that sends SIGKILL if the
process hasn’t exited; update the timeout closure that uses PREFETCH_TIMEOUT_MS
to call child.kill() (SIGTERM), call resolve(), and set a secondary timer (e.g.,
200–500ms) that calls child.kill("SIGKILL"); ensure you still clear both timers
and remove listeners (the done handler on child.exit and child.error) when the
child finally exits to avoid leaks; target the prefetch function and the
child/timer variables in your change.

In `@hooks/hooks-cursor.json`:
- Around line 4-9: The session-start hook defined under "sessionStart" currently
launches the Cursor plugin via the command node
"${CURSOR_PLUGIN_ROOT}/launcher.mjs" (hook session-start lumen --host cursor);
to mirror Claude's cold-cache prefetch behavior add a preceding "prefetch" hook
entry that runs the same launcher in prefetch mode (i.e., invoke the launcher
with the prefetch action or flag your launcher supports) so the heavy download
happens before user-visible startup; update the hooks-cursor.json to include
this "prefetch" entry before "sessionStart" so semantic_search is primed for
first-run users.

In `@launcher.mjs`:
- Around line 162-186: The downloadAndVerify function leaves tmpPath and the
writable stream out around network or stream errors; modify the code so that the
writable stream (out) is properly destroyed/closed and tmpPath is removed
whenever the download promise rejects (not just on SHA mismatch). Concretely,
ensure the Promise that wraps res.pipe(out) sets up error handlers to destroy
out and unlink tmpPath on res or out errors, or wrap the await new Promise in
try/catch/finally to call out.destroy() (if created) and fs.unlinkSync(tmpPath)
on error, keeping the SHA-256 check logic intact; reference downloadAndVerify,
tmpPath, res, out and the existing httpsGet call to locate where to add this
cleanup.
- Around line 289-300: The launcher currently spawns the child (const child =
spawn(...)) and only handles child.on("exit")/child.on("error"); add signal
forwarding so signals sent to the launcher are forwarded to the child to avoid
orphaning the lumen process: register handlers for SIGINT, SIGTERM, SIGHUP (and
SIGQUIT if desired) that call child.kill(signal) (with Windows-safe behavior:
avoid unsupported signals on win32 or fallback to child.kill() for termination),
and remove those handlers once child emits "exit" to avoid leaks; also guard the
fatal(...) call in child.on("error")/child.on("exit") by ensuring you don't call
fatal after process.exit (use an exited flag or return early in handlers) so the
process flow is clear.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 183d05cb-fe3b-4ff4-9dfb-7c1bc24a5d7c

📥 Commits

Reviewing files that changed from the base of the PR and between 2c1f63b and ea32ba9.

📒 Files selected for processing (14)
  • .claude-plugin/plugin.json
  • .codex/INSTALL.md
  • .cursor-plugin/INSTALL.md
  • .cursor/mcp.json
  • .github/workflows/ci.yml
  • .opencode/INSTALL.md
  • .opencode/plugins/lumen.js
  • README.md
  • hooks/hooks-cursor.json
  • hooks/hooks.json
  • internal/integration/helpers_test.go
  • internal/integration/launcher_prefetch_test.go
  • internal/integration/launcher_spawn_test.go
  • launcher.mjs

Comment thread .github/workflows/ci.yml
Comment thread internal/integration/launcher_prefetch_test.go Outdated
Migrates every host's mcpServers / hooks command from \`scripts/run.cmd\`
(the polyglot whose byte-0 conflict caused ory#152 and the same
regression class superpowers cycled through three patches for) to
\`node \${PLUGIN_ROOT}/launcher.mjs\`. Phase 2 of the run.cmd polyglot
elimination plan, building on the Phase 1 native-spawn integration
test gate from the preceding commit.

After this commit no live code path reaches \`scripts/run.cmd\`; the
polyglot file remains in the tree as dead code through the soak window
before Phase 3 deletes it.

BREAKING CHANGE: the MCP server command and hook commands now invoke
\`node\` instead of a shell-mediated launcher script. Users on Claude
Code, Cursor, OpenCode, and Codex must have Node 18+ on the spawn PATH.
Most users already do (Claude Code, Cursor, OpenCode all ship Node;
Codex's official MCP examples assume \`npx\`). Codex users on a pure
Homebrew install with no Node should follow the new .codex/INSTALL.md
troubleshooting note.

What launcher.mjs does now:

- Resolution chain: \$LUMEN_BIN_PATH → <pluginRoot>/bin/lumen[.exe]
  (developer fast path) → <cacheDir>/bin/<version>/lumen-<version>-
  <goos>-<goarch>[.exe] (lazy fetch).
- Cache dir: \$CLAUDE_PLUGIN_DATA → \$CURSOR_PLUGIN_DATA →
  \$XDG_DATA_HOME (Unix) / \$LOCALAPPDATA (Windows) → ~/.local/share /
  os.tmpdir().
- Lazy fetch from GitHub Releases on cache miss. checksums.txt parsed,
  SHA-256 verified before atomic rename — non-zero exit on mismatch.
- File lock at <cacheDir>/.lock (15-min stale recovery) serializes
  concurrent first-runs.
- Per-socket idle timeout (60s) on both request and response streams,
  so hung connections / mid-stream stalls error out instead of blocking
  the launcher and any host waiting on its handshake.
- Node platform/arch ↔ Go GOOS/GOARCH mapping
  (darwin/x64 → darwin/amd64 etc.) so asset names match goreleaser.
- \`prefetch\` subcommand for cache-warming hooks; \$LUMEN_LAUNCHER_VERBOSE
  enables stderr diagnostics.
- ~280 lines, Node stdlib only, zero third-party dependencies.

Per-host migration:

- .claude-plugin/plugin.json: command → \${user_config.node_command}
  (default \"node\"), args → [\${PLUGIN_ROOT}/launcher.mjs, stdio]. New
  userConfig.node_command lets users with PATH issues set an absolute
  path without editing the manifest.
- hooks/hooks.json: SessionStart now runs prefetch then session-start,
  warming the cache before the first MCP server invocation. Both hook
  commands use \${user_config.node_command}.
- .cursor/mcp.json + hooks/hooks-cursor.json: command → \"node\", args →
  launcher.mjs path. Cursor's manifest doesn't support
  \${user_config.*} substitution; the absolute-path workaround is
  inline in the docs.
- .opencode/plugins/lumen.js: spawn target swapped to
  [process.execPath, launcher, ...]. New prefetch() runs synchronously
  before config.mcp.lumen registration with a 90s hard timeout cap so a
  wedged launcher can't hang opencode startup. OpenCode's MCP startup
  window (5s documented, 15-30s observed) is too tight for cold-cache
  fetches; pre-warming during config load bypasses it.
- .codex/INSTALL.md: register with \`--startup-timeout-sec 60\` (Codex
  default 10s is borderline tight). Documents Node 18+ prerequisite,
  PATH-visibility troubleshooting (codex#6243 absolute-path workaround),
  config.toml persistence, manual prefetch warm-up, and cache directory
  location for clean uninstalls.
- README.md, .opencode/INSTALL.md, .cursor-plugin/INSTALL.md: install
  snippets updated to launcher.mjs invocation.

New integration tests (build tag \`integration\`):

- TestLauncherPrefetch: downloads the matching release asset, verifies
  SHA-256 against checksums.txt, asserts executable bit, asserts lock
  cleanup; re-runs and asserts cache hit by mtime.
- TestLauncherConcurrentPrefetch: three parallel prefetch processes
  against a shared cache, asserts only one downloads, no .partial
  remnants, all exit 0.
- isolatedLauncher helper copies launcher.mjs + manifest into a fresh
  temp dir without a bin/ subdir so the developer fast path doesn't
  bypass the cache logic the test wants to exercise.
- ensureNetwork helper skips network-dependent tests gracefully when
  github.com is unreachable.

What this commit does NOT do:

- Phase 3 (delete scripts/run.{cmd,sh,bat} + their tests) — waits for
  the soak window (two tagged releases or 14 days).
- npm \`optionalDependencies\` migration for @ory/lumen-opencode —
  separate effort, requires distribution-packaging decisions.

Closes the structural cause of ory#152.
@hypn4 hypn4 force-pushed the fix/polyglot-launcher-mjs branch from ea32ba9 to 93e2e55 Compare May 7, 2026 02:15
@hypn4
Copy link
Copy Markdown
Contributor Author

hypn4 commented May 7, 2026

Addressed in the latest force-push (93e2e55):

Actionable — both applied:

  • ci.yml Windows artifact selection narrowed to a deterministic *windows*amd64*/lumen.exe match with explicit fail-if-empty.
  • launcher_prefetch_test.go request-construction and body-read errors now t.Fatalf per the project coding rule (lines 44, 133, 142).

Nitpicks — applied:

  • launcher.mjs downloadAndVerify now wraps the pipe in try/catch and unlinks the partial file on any mid-stream error.
  • launcher.mjs spawn block now forwards SIGINT/SIGTERM/SIGHUP/SIGQUIT to the child (with a Windows-empty list since Node's win32 signal model is restricted), and guards against the error/exit double-fire by settling once and detaching the signal handlers.
  • .opencode/plugins/lumen.js prefetch timeout now resolves the outer promise immediately when the cap is hit, sends SIGTERM, then escalates to SIGKILL after 500 ms in the background — so a wedged child can't block opencode startup.

Nitpick declinedhooks/hooks-cursor.json prefetch step:

I'd skip this one. Cursor's MCP startup window (using the upstream MCP TypeScript SDK default DEFAULT_REQUEST_TIMEOUT_MSEC) is ~60 s and not configurable, so a cold-cache GitHub Releases fetch (typically 1–5 s, up to ~15 s on slow links) fits comfortably in the existing handshake. The user-visible pause on first session is intrinsic to any cold-cache install and isn't asymmetric to Claude — Claude's prefetch step is there because Claude tightens to 5 s under alwaysLoad: true, which Cursor doesn't have an analogue for. Happy to revisit if maintainers prefer the symmetry.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 7, 2026

Tip

For best results, initiate chat on the files or code changes.

@hypn4 Thanks for the thorough follow-up — all the addressed items look well-handled:

  • ci.yml — deterministic *windows*amd64*/lumen.exe glob with fail-if-empty is exactly the right fix.
  • launcher_prefetch_test.go — promoting request-construction and body-read errors to t.Fatalf is correct.
  • launcher.mjs — the try/catch + partial-file unlink on mid-stream error, the settle-once signal-forwarding guard with SIGINT/SIGTERM/SIGHUP/SIGQUIT (Windows-empty list noted), and the SIGTERM→SIGKILL escalation in the opencode prefetch timeout are all solid.

The reasoning for declining the Cursor prefetch nitpick is technically well-founded — Cursor's DEFAULT_REQUEST_TIMEOUT_MSEC (~60 s, non-configurable) comfortably accommodates a cold-cache fetch, and the asymmetry with Claude is entirely explained by alwaysLoad: true tightening Claude's handshake to ~5 s. Happy to accept that call.

Triggering a re-review of the force-pushed changes.

[review]


✏️ Learnings added
Learnt from: hypn4
URL: https://github.com/ory/lumen/pull/154

Timestamp: 2026-05-07T02:16:32.535Z
Learning: In the `ory/lumen` repository, the Cursor MCP startup window uses `DEFAULT_REQUEST_TIMEOUT_MSEC` from the upstream MCP TypeScript SDK (~60s, not configurable), so a cold-cache GitHub Releases binary fetch (typically 1–15s) fits within it without a prefetch step. Claude Code uses `alwaysLoad: true` which tightens its handshake to ~5s, which is why the Claude integration requires a prefetch step in `hooks/hooks.json` but the Cursor integration in `hooks/hooks-cursor.json` does not.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
.opencode/plugins/lumen.js (1)

21-58: 💤 Low value

Optional: unref() the child and consider piping its stdio.

The prefetch flow is solid (idempotent resolve, SIGTERM→SIGKILL escalation, error/exit both resolve best-effort). Two small, optional refinements:

  1. After timeout-resolves, the surviving child plus the 500ms escalateTimer keep the event loop alive. If opencode shuts down mid-prefetch, this can briefly delay shutdown or leak a child. child.unref() after spawn limits that exposure.
  2. stdio: "inherit" surfaces every prefetch log line into opencode's terminal during plugin load. On cold cache that's a download progress dump on first run; on warm cache it's silent. If the launcher already gates verbosity on $LUMEN_LAUNCHER_VERBOSE, you could pipe and discard by default to keep load quiet.

Both are deferable — current behavior is correct and bounded.

♻️ Sketch of the unref + quieter stdio change
-    const child = spawn(process.execPath, [launcher, "prefetch"], {
-      stdio: "inherit",
-    });
+    const verbose = !!process.env.LUMEN_LAUNCHER_VERBOSE;
+    const child = spawn(process.execPath, [launcher, "prefetch"], {
+      stdio: verbose ? "inherit" : "ignore",
+    });
+    child.unref();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.opencode/plugins/lumen.js around lines 21 - 58, The prefetch child process
keeps the event loop alive and prints directly to the parent terminal; after
spawning the child in prefetch() (variable child from spawn(..., { stdio:
"inherit" })), call child.unref() so the child won't block shutdown after the
timeout/escalation timers (PREFETCH_TIMEOUT_MS and escalateTimer), and change
stdio from "inherit" to a piped option (e.g., "ignore" or "pipe") so prefetch
logs don't always surface during plugin load—adjust any needed listeners if you
choose "pipe" to preserve debug verbosity via LUMEN_LAUNCHER_VERBOSE.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.opencode/plugins/lumen.js:
- Around line 21-58: The prefetch child process keeps the event loop alive and
prints directly to the parent terminal; after spawning the child in prefetch()
(variable child from spawn(..., { stdio: "inherit" })), call child.unref() so
the child won't block shutdown after the timeout/escalation timers
(PREFETCH_TIMEOUT_MS and escalateTimer), and change stdio from "inherit" to a
piped option (e.g., "ignore" or "pipe") so prefetch logs don't always surface
during plugin load—adjust any needed listeners if you choose "pipe" to preserve
debug verbosity via LUMEN_LAUNCHER_VERBOSE.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2a0dc7a4-747a-4390-b0eb-deebdf157646

📥 Commits

Reviewing files that changed from the base of the PR and between ea32ba9 and 93e2e55.

📒 Files selected for processing (12)
  • .claude-plugin/plugin.json
  • .codex/INSTALL.md
  • .cursor-plugin/INSTALL.md
  • .cursor/mcp.json
  • .github/workflows/ci.yml
  • .opencode/INSTALL.md
  • .opencode/plugins/lumen.js
  • README.md
  • hooks/hooks-cursor.json
  • hooks/hooks.json
  • internal/integration/launcher_prefetch_test.go
  • launcher.mjs
✅ Files skipped from review due to trivial changes (5)
  • hooks/hooks-cursor.json
  • .cursor-plugin/INSTALL.md
  • hooks/hooks.json
  • .opencode/INSTALL.md
  • .claude-plugin/plugin.json
🚧 Files skipped from review as they are similar to previous changes (6)
  • README.md
  • .cursor/mcp.json
  • .github/workflows/ci.yml
  • launcher.mjs
  • internal/integration/launcher_prefetch_test.go
  • .codex/INSTALL.md

@joshmeads
Copy link
Copy Markdown

@hypn4 @jkoenig134 Seems we were both working on the same issue and I forgot to check if anyone else had pushed today. Please see put a comparison and my recommendation here #155 (comment) (yes to many of @hypn4's changes)

My changes are smaller for an initial fix to get MacOS/Unix working again. However, I do think this PR has a lot of value, just many of the extra additions should be split up outside of the main fix PR.

@hypn4 added a strong regression test in #153 (internal/integration/launcher_spawn_test.go) that spawns the launcher via exec.Command (no shell) and runs the full MCP initialize handshake. This is exactly the test that would have caught PR #145 at review time.

Good work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lumen MCP connection failed

3 participants