refactor(broker): replace stringly-typed protocol IDs with newtypes#984
Conversation
Introduces nine `#[serde(transparent)]` newtype wrappers in
`crates/broker/src/ids.rs` for the broker's previously string-typed
protocol identifiers — `WorkerName`, `WorkspaceId`, `WorkspaceAlias`,
`DeliveryId`, `EventId`, `ThreadId`, `AgentId`, `RequestId`,
`ChannelName` — plus a `MessageTarget` wrapper whose `kind()` returns a
`MessageTargetKind` enum (`Channel` / `Thread` / `DirectMessage` /
`Conversation` / `Worker`) so the formerly overloaded `target: String`
field is now discriminated structurally instead of by hand-rolled
prefix checks.
Each newtype impls `Deref<Target = str>`, `Display`, `AsRef<str>`,
`Borrow<str>`, `From<String>` / `From<&str>`, `Hash`, `Eq`, and
PartialEq against `str` / `&str` / `String`, so call sites that
treated these fields as strings keep compiling. Wire format is
unchanged — `#[serde(transparent)]` means the JSON shape on disk and
over the SDK ↔ broker channel is byte-for-byte identical to the
previous bare-`String` form.
`protocol.rs`, `types.rs`, `runtime/event_loop.rs`, `listen_api.rs`
and the runtime helpers thread the new types through `RelayDelivery`,
`InboundRelayEvent`, every `BrokerEvent` variant, `PendingDelivery`,
`DeliveryAttemptOutcome`, `ListenApiRequest`, and the
`HashMap<DeliveryId, _>` / `HashMap<WorkerName, _>` collection keys.
Routing's `resolve_delivery_targets` switches from
`event.target.starts_with('#')` / `event.target == "thread"` to a
`MessageTargetKind` match.
|
Warning Review limit reached
Your plan includes 4 reviews of capacity. Refill in 6 minutes and 13 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR migrates the agent-relay-broker codebase from stringly-typed protocol identifiers to strongly-typed ChangesTyped Identifier Refactoring
🎯 4 (Complex) | ⏱️ ~60 minutes Possibly Related PRs
Suggested Reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 29f93da8a4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ); | ||
| if pending_for_failure.is_some() && !delivery_id.is_empty() { | ||
| terminal_failed_deliveries.insert(delivery_id.to_string()); | ||
| terminal_failed_deliveries.insert(EventId::from(&event_id[..])); |
There was a problem hiding this comment.
Track terminal failures by delivery ID
terminal_failed_deliveries is still checked with delivery_id in the delivery_ack guard, but this change now inserts event_id into that set. In the late-ack scenario (delivery_failed arrives first, then a delayed delivery_ack), the guard can no longer match and the ack is processed instead of being ignored, which regresses the intended terminal-state protection for delivery lifecycle events. Store the same identifier that the guard checks (or update the guard to check event IDs consistently).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in cc42f0e. You're right — the original set was keyed by delivery_id and I incorrectly switched the element type to EventId while leaving the guard at line 29 untouched. Reverted the set to HashSet<DeliveryId> and restored delivery_id-based insert/remove so the late-ack guard suppresses correctly again.
Generated by Claude Code
There was a problem hiding this comment.
1 issue found across 31 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/broker/src/runtime/worker_events.rs (1)
21-30:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTerminal-failure guard still uses
delivery_idwhile the set is keyed byevent_id.Line 29 checks
terminal_failed_deliveries.contains(delivery_id), but Line 214 storesEventIdfromevent_id. This makes late-ack suppression miss terminal failures, and Line 213’s!delivery_id.is_empty()gate can still insert an emptyEventId.🔧 Suggested fix
- let delivery_id = payload + let delivery_id = payload .get("delivery_id") .and_then(Value::as_str) .unwrap_or(""); + let event_id = payload + .get("event_id") + .and_then(Value::as_str) + .unwrap_or(""); // Terminal guard: ignore late delivery_ack events once a // delivery has reached terminal failed status. - if !delivery_id.is_empty() - && terminal_failed_deliveries.contains(delivery_id) + if !event_id.is_empty() + && terminal_failed_deliveries.contains(event_id) { tracing::info!( worker = %name, delivery_id = %delivery_id, + event_id = %event_id, "ignoring late delivery_ack after terminal failed status" ); return; } ... - if pending_for_failure.is_some() && !delivery_id.is_empty() { + if pending_for_failure.is_some() && !event_id.is_empty() { terminal_failed_deliveries.insert(EventId::from(&event_id[..])); }Also applies to: 213-215
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/broker/src/runtime/worker_events.rs` around lines 21 - 30, The terminal-failure guard is checking terminal_failed_deliveries.contains(delivery_id) while the set is keyed by EventId derived from the "event_id" payload, causing misses and allowing an empty delivery_id to be used; change the code to read and validate the "event_id" (e.g., parse into EventId or a non-empty event_id_str) and use that event_id when checking terminal_failed_deliveries.contains(...), and ensure you only insert/check non-empty/valid EventId values (avoid using the empty-string gate with delivery_id and instead gate on a valid event_id/EventId).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.md`:
- Line 17: Update the CHANGELOG entry to call out this Rust API break under a
"Breaking Changes" and "Migration Guidance" heading: state up front that public
structs in agent-relay-broker now require typed ID newtypes (e.g., DeliveryId,
EventId and others defined in protocol.rs and types.rs) instead of String, and
note crates/broker/src/lib.rs now exposes pub mod ids; explain that the wire
format is unchanged but downstream Rust consumers must update code to use the
new newtypes and adjust any usages of MessageTarget::kind() (which now
discriminates Channel/Thread/DirectMessage/Conversation/Worker) rather than
string-prefix checks, and give concise migration steps to replace String IDs
with the corresponding newtypes.
In `@crates/broker/src/ids.rs`:
- Around line 175-177: The doc comment for the string_id! type ChannelName
incorrectly states the value includes a leading `#`, but the protocol uses raw
channel names (e.g., "ops", "alerts") without `#`; update the ChannelName docs
(the string_id! declaration for ChannelName) to state that it represents the raw
channel identifier without a leading `#` and add an example showing the raw form
used on the wire (or mention the wire uses names without `#`) so call sites are
not misled.
---
Outside diff comments:
In `@crates/broker/src/runtime/worker_events.rs`:
- Around line 21-30: The terminal-failure guard is checking
terminal_failed_deliveries.contains(delivery_id) while the set is keyed by
EventId derived from the "event_id" payload, causing misses and allowing an
empty delivery_id to be used; change the code to read and validate the
"event_id" (e.g., parse into EventId or a non-empty event_id_str) and use that
event_id when checking terminal_failed_deliveries.contains(...), and ensure you
only insert/check non-empty/valid EventId values (avoid using the empty-string
gate with delivery_id and instead gate on a valid event_id/EventId).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 3147f977-9a13-46a4-a556-e31e7811bf0e
📒 Files selected for processing (31)
CHANGELOG.mdcrates/broker/src/broker.rscrates/broker/src/broker/delivery_verification.rscrates/broker/src/conversation_log.rscrates/broker/src/ids.rscrates/broker/src/lib.rscrates/broker/src/listen_api.rscrates/broker/src/protocol.rscrates/broker/src/pty_worker.rscrates/broker/src/relaycast/bridge.rscrates/broker/src/relaycast/workspace.rscrates/broker/src/relaycast/ws.rscrates/broker/src/routing.rscrates/broker/src/runtime/api.rscrates/broker/src/runtime/delivery.rscrates/broker/src/runtime/event_loop.rscrates/broker/src/runtime/init.rscrates/broker/src/runtime/io.rscrates/broker/src/runtime/maintenance.rscrates/broker/src/runtime/mod.rscrates/broker/src/runtime/relaycast_events.rscrates/broker/src/runtime/session.rscrates/broker/src/runtime/spawn_spec.rscrates/broker/src/runtime/tests.rscrates/broker/src/runtime/util.rscrates/broker/src/runtime/worker_events.rscrates/broker/src/scheduler.rscrates/broker/src/supervisor.rscrates/broker/src/types.rscrates/broker/src/worker.rscrates/broker/src/wrap.rs
The newtype refactor changed the set's element type from `HashSet<String>` (storing delivery IDs) to `HashSet<EventId>` and shifted both insert/remove sites to use event IDs. The late-ack guard at the top of `delivery_ack` handling still reads `delivery_id` off the payload and queries the set with it, so once the set was keyed by event IDs the guard's `.contains()` could no longer match and a late-arriving `delivery_ack` after a terminal `delivery_failed` would be processed instead of suppressed. Revert the set's element type to `HashSet<DeliveryId>` and reinstate `delivery_id`-based insert/remove. The guard, the insert in `delivery_failed`, and the remove in `delivery_ack` are now all keyed consistently on the delivery identifier as before this refactor. Also tighten the `ChannelName` doc comment — channels appear on the wire as raw names like `"general"` without `#`; the `#` prefix is the `MessageTarget` convention for routing *to* a channel, not the channel's own identifier. Restructure the CHANGELOG entry as `Breaking Changes` plus `Migration Guidance` (per Keep a Changelog + project changelog rule) since downstream Rust callers must construct the new typed IDs even though the JSON wire format is unchanged.
Summary
The
agent-relay-brokercrate was carrying ~30 protocol fields typed as plainStringwhose meaning is well-defined and easy to confuse with one another(
delivery_idvsevent_idvsagent_id,namevstarget, etc.). Theworst offender was
target: String, which carried four distinct meanings —channel (
#general), DM conversation (dm_*), group conversation(
conv_*), worker display name, or the thread-routing sentinel"thread"— discriminated by hand-rolled prefix checks at every routing call site.
This PR introduces typed wrappers and threads them through. The fix follows
a focused critique that the crate's identifiers were being shuffled around
as raw strings — exactly the kind of stringly-typed sprawl that LLM-written
Rust falls into.
What changed
New module
crates/broker/src/ids.rsdefines:WorkerName,WorkspaceId,WorkspaceAlias,DeliveryId,EventId,ThreadId,AgentId,RequestId,ChannelName—#[serde(transparent)]newtypes wrapping
String.MessageTargetnewtype with akind()method returning a discriminatedMessageTargetKind { Channel(&str), Thread, DirectMessage(&str), Conversation(&str), Worker(&str) }. The hand-rolledevent.target.starts_with('#')/event.target == "thread"checks inrouting.rsbecome amatchonkind().Each wrapper impls
Deref<Target = str>,Display,AsRef<str>,Borrow<str>,Hash,Eq,From<String>/From<&str>, andPartialEq<str>/<&str>/<String>(both directions). The point isthat callers that treated these as strings keep compiling — no ceremony at
use sites — but the type system now catches passing a
DeliveryIdwherean
EventIdwas expected.The newtypes are applied in:
protocol.rs— every variant ofBrokerEvent,BrokerToWorker,WorkerToBroker,SdkToBroker, plusRelayDelivery,AgentSpec, andProtocolEnvelope.types.rs—InboundRelayEvent,PendingRelayMessage,BrokerCommandEvent,InjectRequest,SpawnParams,ReleaseParams.listen_api.rs— every variant ofListenApiRequestthat carries anidentifier (Spawn, Release, Send, ResizePty, SubscribeChannels, etc.).
runtime/event_loop.rs—BrokerRuntime's collection types now key onWorkerName/DeliveryIdinstead of bareString(
HashMap<WorkerName, InboundDeliveryState>,HashMap<DeliveryId, PendingDelivery>,HashMap<WorkspaceId, RelayWorkspace>, etc.).runtime/delivery.rs—PendingDelivery,PersistedPendingDelivery,DeliveryAttemptOutcome, and helper signatures.runtime/session.rs,relaycast/workspace.rs—RelayWorkspace,WorkspaceSessionHandle,MultiWorkspaceSession,WorkspaceMembershipSummary.worker.rs,wrap.rs,pty_worker.rs,broker.rs,runtime/maintenance.rs,runtime/relaycast_events.rs,etc.
Why the wire format is preserved
Every newtype is
#[serde(transparent)]. JSON on disk and on the SDK ↔broker channel is byte-for-byte identical to before — both ways, both
directions. The
protocol::testsround-trip suite still passesunchanged.
What is not in this PR
Two follow-ups were scoped but deferred to keep this PR reviewable:
runtime::api::handle_api_requestfunction still pre-binds ~15&mut self.fieldreferences at its top and is 1500+ lines across 25enum arms. Extracting each arm into a focused
BrokerRuntime::handle_*method is a sizable behavioral refactor of its own.
#[allow(dead_code)]annotations on ~13 modules inlib.rswerehiding genuine dead code (
scheduler::Scheduler,redact::redact,conversation_log::ConversationLog, and others are entirely unused).This PR preserves the allows but adds a comment in
lib.rscallingout that they mask real dead code rather than a library-vs-binary
visibility artifact, so the follow-up has a starting point.
Test plan
cargo test -p agent-relay-broker --lib— 656 passed / 0 failed /4 ignored.
cargo test -p agent-relay-broker --tests— integration testspass.
cargo build -p agent-relay-broker --bin agent-relay-broker— binarybuilds clean.
cargo check --all-targets— 0 warnings, 0 errors.through a channel target, send a DM, send a thread reply, observe
that the broker continues to route correctly with the
MessageTargetKinddispatch in place. (Not exercised locally —worth a real-runtime check before merging.)
https://claude.ai/code/session_01QgCfgzy51fdAa6XLRfg3L2
Generated by Claude Code