Skip to content

[wrangler] Fix races in RemoteProxySession reload and remote tail WebSocket#13867

Open
petebacondarwin wants to merge 5 commits intomainfrom
pbd/13831-remote-proxy-reload-fixes
Open

[wrangler] Fix races in RemoteProxySession reload and remote tail WebSocket#13867
petebacondarwin wants to merge 5 commits intomainfrom
pbd/13831-remote-proxy-reload-fixes

Conversation

@petebacondarwin
Copy link
Copy Markdown
Contributor

Fixes #13831

Cherry-picks four commits from pbd/agent-memory that fix underlying race conditions in wrangler dev's remote-binding infrastructure. These contribute to flakiness in the miniflare-remote-resources.test.ts e2e suite (and to user-visible flakes in wrangler dev).

Changes

  1. RemoteProxySession.updateBindings reload race — Previously resolved as soon as worker.patchConfig dispatched the configUpdate event, long before the remote worker had been re-uploaded and the local proxy worker had unpaused. Callers issuing requests immediately afterwards saw flaky failures (typically "WebSocket connection failed" for JSRPC bindings like service bindings or dispatch namespaces). updateBindings now waits for the next reloadComplete event and drains the local proxy worker's runtime-message mutex before returning. Requires a small foundational change to re-emit reloadComplete events externally on DevEnv.

  2. Unhandled AbortError from remote tail WebSocket — The remote-runtime tail-logs WebSocket (#activeTail in RemoteRuntimeController) was constructed with the same AbortSignal that onBundleStart aborts to cancel in-flight preview-session operations. The abort destroyed the WebSocket's underlying upgrade request with AbortError, which had no error listener attached and propagated as an unhandled exception — visible in e2e runs as a trailing "Unhandled Errors / Uncaught Exception" in the test report. We now attach an error listener that ignores errors (debug-logged), matching the safeguards already present on the terminate paths.

The complementary test-side changes (per-test updateBindings, transient 5xx retries) live on pbd/agent-memory and will land separately.


  • Tests
    • Tests included/updated
    • Automated tests not possible - manual testing has been completed as follows:
    • Additional testing not necessary because: these are race-condition fixes validated by the existing remote-binding e2e suite. The updateBindings change is exercised by the pending test refactor on pbd/agent-memory; the AbortError suppression eliminates an unhandled-exception report seen in e2e runs.
  • Public documentation
    • Cloudflare docs PR(s):
    • Documentation not necessary because: internal bug fixes to wrangler dev's remote-bindings runtime with no public API changes.

DevEnv extends EventEmitter and routes events between controllers via
its internal `dispatch()` method. Re-emit `reloadComplete` events on the
EventEmitter so external callers (in the next commit,
`RemoteProxySession.updateBindings`) can wait for a remote upload to
finish. Other event types stay internal for now.
… to complete

Previously `updateBindings` resolved as soon as `worker.patchConfig`
dispatched the configUpdate event, long before the remote worker had been
re-uploaded with the new bindings and the local proxy worker had unpaused.
Callers issuing requests immediately after `updateBindings` could see
flaky failures (typically "WebSocket connection failed" for JSRPC bindings
like service bindings or dispatch namespaces) because the local proxy
worker was still in its paused state during the reload window.

`updateBindings` now subscribes to the `reloadComplete` event before
calling `patchConfig`, then awaits either that event (success) or the
DevEnv `error` event (non-recoverable failure). After reload it also
drains the local proxy worker's runtime-message mutex so the "play"
message that resumes the proxy has been delivered, mirroring what
`worker.fetch` already does.
…estart

The remote-runtime tail-logs WebSocket (`#activeTail` in
`RemoteRuntimeController`) is constructed with `signal:
this.#abortController.signal`. The `ws` package wires that signal to
the underlying HTTP upgrade request via Node's `addAbortSignal()`
helper, so when `onBundleStart` aborts the controller to cancel
in-flight preview-session operations, the WebSocket's request is
destroyed with `AbortError` and emits an `error` event.

Only a `message` listener was attached at construction, so the
`error` had no handler and propagated as an unhandled exception —
visible in e2e runs as a trailing "Unhandled Errors / Uncaught
Exception" in the test report.

Attach an `error` listener at WebSocket construction that ignores
errors (debug-logged), matching the safeguards already present on the
`terminate` paths in `#previewToken` and `teardown()`. This covers
the window between WebSocket construction and the next `terminate`,
plus any transient network errors during normal operation.
@petebacondarwin petebacondarwin added the ci-flake Applied to PRs addressing CI flakiness label May 8, 2026
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 8, 2026

🦋 Changeset detected

Latest commit: f1586ac

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-project-automation github-project-automation Bot moved this to Untriaged in workers-sdk May 8, 2026
@workers-devprod workers-devprod requested review from a team and edmundhung and removed request for a team May 8, 2026 13:08
@workers-devprod
Copy link
Copy Markdown
Contributor

workers-devprod commented May 8, 2026

Codeowners approval required for this PR:

  • ✅ @cloudflare/wrangler
Show detailed file reviewers

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

✅ All changesets look good

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented May 8, 2026

UnknownError: ProviderInitError

github run

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented May 8, 2026

@petebacondarwin Bonk workflow failed. Check the logs for details.

View workflow run · To retry, trigger Bonk again.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented May 8, 2026

create-cloudflare

npm i https://pkg.pr.new/create-cloudflare@13867

@cloudflare/kv-asset-handler

npm i https://pkg.pr.new/@cloudflare/kv-asset-handler@13867

miniflare

npm i https://pkg.pr.new/miniflare@13867

@cloudflare/pages-shared

npm i https://pkg.pr.new/@cloudflare/pages-shared@13867

@cloudflare/unenv-preset

npm i https://pkg.pr.new/@cloudflare/unenv-preset@13867

@cloudflare/vite-plugin

npm i https://pkg.pr.new/@cloudflare/vite-plugin@13867

@cloudflare/vitest-pool-workers

npm i https://pkg.pr.new/@cloudflare/vitest-pool-workers@13867

@cloudflare/workers-editor-shared

npm i https://pkg.pr.new/@cloudflare/workers-editor-shared@13867

@cloudflare/workers-utils

npm i https://pkg.pr.new/@cloudflare/workers-utils@13867

wrangler

npm i https://pkg.pr.new/wrangler@13867

commit: f1586ac

Comment thread packages/wrangler/src/api/remoteBindings/start-remote-proxy-session.ts Outdated
Copy link
Copy Markdown
Contributor

@workers-devprod workers-devprod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codeowners reviews satisfied

@github-project-automation github-project-automation Bot moved this from Untriaged to Approved in workers-sdk May 8, 2026
…ng events.once()

Address PR feedback to use `node:events`'s `events.once()` helper, which
resolves on the named event and rejects on `error`, instead of hand-rolling
a Promise with cross-cleanup listeners. Same behavior, fewer lines, matches
the existing convention used elsewhere in wrangler (`dev.ts`, `pages/dev.ts`,
`api/dev.ts`, `ProxyController.ts`, `check/commands.ts`).
@petebacondarwin petebacondarwin enabled auto-merge (squash) May 8, 2026 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-flake Applied to PRs addressing CI flakiness

Projects

Status: Approved

Development

Successfully merging this pull request may close these issues.

[ci-flake] miniflare-remote-resources.test.ts shard 3 flakes on transient API 5xx

3 participants