improve sync durability and add storage bridge specs #118
khaliqgant wants to merge 9 commits into main from
Conversation
…ending

- Lower default ping interval from 30s to 15s, halving the worst-case silent-failure detection window (~90s → ~45s for NAT/LB idle drops)
- Add RelayFileSyncHealthStatus and getHealthStatus() so long-running agents can inspect connection state, degradation reason, last frame time, and reconnect attempts without parsing log output
- Add onDegraded/onRecovered callbacks to RelayFileSyncOptions; fired when the sync involuntarily falls back to polling and when it recovers to a live WebSocket — gives agent runtimes a structured hook to alert operators rather than relying solely on console.warn
- Track stateEnteredAt so callers can measure how long a session has been in a degraded state
- Add flushPending() to AutoSyncHandle: cancels pending debounce timers and runs a full reconcile, ensuring recent writes are synced before a checkpoint or process exit
- Add watchersHealthy() to AutoSyncHandle: exposes whether both filesystem watchers are subscribed and active, so callers know whether change detection is watcher-driven or reconcile-only

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24
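A rough sketch of what the health surface above could look like to a caller; field names beyond those the description lists are assumptions, not the actual RelayFileSyncHealthStatus definition:

```typescript
// Hypothetical sketch only: fields beyond those the PR description names
// (connection state, degradation reason, last frame time, reconnect
// attempts, stateEnteredAt) are assumptions.
type SyncState = "connecting" | "open" | "reconnecting" | "closed";

interface RelayFileSyncHealthStatus {
  state: SyncState;
  degraded: boolean;
  degradedReason?: string;
  lastFrameAt?: number;    // epoch ms of the last WebSocket frame
  reconnectAttempts: number;
  stateEnteredAt: number;  // epoch ms when the current state was entered
}

// One use of stateEnteredAt: measuring how long a session has been degraded.
function msInCurrentState(h: RelayFileSyncHealthStatus, now: number): number {
  return now - h.stateEnteredAt;
}

const snapshot: RelayFileSyncHealthStatus = {
  state: "reconnecting",
  degraded: true,
  degradedReason: "ping timeout",
  reconnectAttempts: 3,
  stateEnteredAt: 1_000,
};
console.log(msInCurrentState(snapshot, 47_000)); // 46000
```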
Note: Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior is configurable.
📝 Walkthrough

Exposes local watcher health and adds flushPending to auto-sync; instruments WebSocket sync with health status, degraded/recovery callbacks and timestamps; reduces the ping interval to 15s; re-exports the health type; adds Storage Bridge spec, priority, integration-priority, and pricing docs.

Changes:
- Sync Health Monitoring
- Storage Bridge & Docs
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/local-mount/src/auto-sync.ts (1)
170-180: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Track per-subscription watcher health and update on runtime errors.

`_watchersHealthy` is only set during initial subscribe success and `stop()`. When the watcher callback receives an error at runtime, it calls `onError(err)` but does not update health state, so `watchersHealthy()` can return a stale `true` after a subscription degrades. Per `@parcel/watcher` semantics, a callback `err` indicates the subscription is no longer reliable and should be treated as degraded.

Introduce per-subscription health flags (`_mountWatcherHealthy`, `_projectWatcherHealthy`) and update them in the callback when `err` occurs, recomputing the overall health as both-healthy. Pass a callback to `subscribeTo()` to mark the subscription unhealthy on error:

Suggested approach
```diff
 let mountSub: AsyncSubscription | undefined;
 let projectSub: AsyncSubscription | undefined;
 let _watchersHealthy = false;
+let _mountWatcherHealthy = false;
+let _projectWatcherHealthy = false;
+const recomputeWatchersHealthy = (): void => {
+  _watchersHealthy = _mountWatcherHealthy && _projectWatcherHealthy;
+};

-const subscribeTo = (root: string): Promise<AsyncSubscription> =>
+const subscribeTo = (
+  root: string,
+  onRuntimeError: () => void
+): Promise<AsyncSubscription> =>
   watcher.subscribe(
     root,
     (err, events) => {
-      if (err) { onError(err); return; }
+      if (err) {
+        onRuntimeError();
+        recomputeWatchersHealthy();
+        onError(err);
+        return;
+      }
       for (const ev of events) {
         schedulePathSync(root, ev.path);
       }
     },
     { ignore: buildIgnoreGlobs(ctx, root) }
   );

 const watchersReady = (async () => {
   const [mountResult, projectResult] = await Promise.allSettled([
-    subscribeTo(ctx.realMountDir),
-    subscribeTo(ctx.realProjectDir),
+    subscribeTo(ctx.realMountDir, () => { _mountWatcherHealthy = false; }),
+    subscribeTo(ctx.realProjectDir, () => { _projectWatcherHealthy = false; }),
   ]);
   if (mountResult.status === 'fulfilled') mountSub = mountResult.value;
   if (projectResult.status === 'fulfilled') projectSub = projectResult.value;
   if (mountResult.status === 'fulfilled' && projectResult.status === 'fulfilled') {
-    _watchersHealthy = true;
+    _mountWatcherHealthy = true;
+    _projectWatcherHealthy = true;
+    recomputeWatchersHealthy();
     return;
```

Also applies to: lines 184–197, 261
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/local-mount/src/auto-sync.ts` around lines 170 - 180, The subscription callback treats runtime errors via onError(err) but never updates the watcher-health state, so watchersHealthy() can stay true after a subscription degrades; update health per-subscription: add boolean flags _mountWatcherHealthy and _projectWatcherHealthy, set them to true on successful subscribe in subscribeTo() and set the appropriate flag to false inside the watcher.subscribe callback when err is non-null (then recompute _watchersHealthy as _mountWatcherHealthy && _projectWatcherHealthy), and ensure stop() still clears flags; modify subscribeTo(root, markUnhealthyCallback) or accept a callback parameter so the callback invoked on err flips the correct per-subscription flag, then keep calling onError(err) and schedulePathSync(root, ev.path) unchanged for normal events.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/sdk/typescript/src/sync.ts`:
- Around line 548-557: The recovery callback runs before the state is set to
"open", so move the state transition to occur before invoking onRecovered:
ensure this.degraded and this.degradedReason are cleared, call
this.setState("open") next, then invoke this.onRecovered() inside the existing
try/catch (preserving debugLog on error) so any handler that calls
getHealthStatus() will observe the "open" state.
- Around line 389-397: Constructor sets this.state = "closed" when
options.signal is already aborted but doesn't update stateEnteredAt because it
skips calling setState; update the constructor to either call setState("closed")
instead of assigning this.state directly or explicitly set this.stateEnteredAt =
Date.now() (or use the same timestamp logic as setState) when handling
options.signal.aborted so getHealthStatus() returns an accurate stateEnteredAt;
references: constructor, options.signal, this.state, setState(), stateEnteredAt,
getHealthStatus().
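A minimal illustration of the second fix, with a stand-in class (only `state`, `stateEnteredAt`, and `setState` come from the comment above; everything else is assumed):

```typescript
// Stand-in class, not the real client: only state, stateEnteredAt, and
// setState come from the review comment; the rest is assumed.
type State = "connecting" | "open" | "reconnecting" | "closed";

class SyncSketch {
  state: State = "connecting";
  stateEnteredAt = Date.now();

  constructor(signal?: AbortSignal) {
    if (signal?.aborted) {
      // Before the fix this was `this.state = "closed"`, which left
      // stateEnteredAt stale; routing through setState stamps it.
      this.setState("closed");
    }
  }

  setState(next: State): void {
    this.state = next;
    this.stateEnteredAt = Date.now();
  }
}

const aborted = new SyncSketch(AbortSignal.abort());
console.log(aborted.state); // "closed"
```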
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 18b59ddf-34f0-4f40-b2ae-1b308acd86df
⛔ Files ignored due to path filters (1)
`package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (3)
- packages/local-mount/src/auto-sync.ts
- packages/sdk/typescript/src/index.ts
- packages/sdk/typescript/src/sync.ts
```ts
if (this.degraded) {
  this.degraded = false;
  this.degradedReason = undefined;
  try {
    this.onRecovered?.();
  } catch (error) {
    debugLog("onRecovered handler threw", error);
  }
}
```
🟡 onRecovered fires before setState("open"), exposing inconsistent health status
The onRecovered callback at line 552 fires before this.setState("open") at line 557. At that point this.degraded has already been cleared to false (line 549) but this.state is still "reconnecting" (or "connecting"). If the onRecovered handler calls getHealthStatus() (packages/sdk/typescript/src/sync.ts:389-398) or getState(), it observes a contradictory snapshot: { state: "reconnecting", degraded: false, stateEnteredAt: <old-timestamp>, ... }. The contract documented in the JSDoc says recovery means "back to a live WebSocket" — the caller would reasonably expect state: "open". The recovery block (lines 548-556) should be moved to after this.setState("open") so the state is consistent when the callback fires.
Prompt for agents
In the WebSocket `open` event handler inside `openWebSocketWithToken` (packages/sdk/typescript/src/sync.ts, around line 537-561), the recovery block (if this.degraded ... onRecovered) fires before this.setState("open"). This means any onRecovered callback that calls getHealthStatus() or getState() will see state: "reconnecting" with degraded: false, which is contradictory.
To fix: move the entire `if (this.degraded) { ... }` block (currently lines 548-556) to after the `this.setState("open")` call at line 557. This ensures the state machine is in the "open" state before the recovery callback fires, giving callers a consistent view. Consider placing it after `this.emit("open", event)` at line 560 so that all internal state transitions and event emissions complete before the user callback.
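The suggested ordering can be sketched with a stand-in class (only the member names come from the review; this is not the real sync client):

```typescript
// Stand-in class, not the real sync client; only the member names come from
// the review. The point: setState("open") runs before onRecovered, so a
// handler reading state sees a consistent snapshot.
type State = "connecting" | "open" | "reconnecting" | "closed";

class RecoverySketch {
  state: State = "reconnecting";
  degraded = true;
  degradedReason?: string = "ping timeout";
  onRecovered?: () => void;

  private setState(next: State): void {
    this.state = next;
  }

  handleOpen(): void {
    const wasDegraded = this.degraded;
    this.degraded = false;
    this.degradedReason = undefined;
    this.setState("open");       // transition first...
    if (wasDegraded) {
      try {
        this.onRecovered?.();    // ...then notify the user callback
      } catch (error) {
        console.error("onRecovered handler threw", error);
      }
    }
  }
}

const sync = new RecoverySketch();
let observed: State | undefined;
sync.onRecovered = () => { observed = sync.state; };
sync.handleOpen();
console.log(observed); // "open"
```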
Adds two documents covering the storage bridge architecture:

- storage-bridge-spec.md — comprehensive technical spec for bridging infrastructure storage systems (S3, Postgres, Redis, GCS, Azure Blob) into relayfile via a common StorageBridgeEvent envelope, pub/sub layer, and storage adapter worker. Covers per-system bridge implementations, Nango scheduled sync fallback, writeback, error handling, security, observability, and a phased implementation plan.
- storage-bridge-priority.md — prioritizes integrations by native push notification capability. Tier 1 (Google Drive, SharePoint/OneDrive, Dropbox, Gmail, GCS, Azure Blob) delivers true real-time awareness and is our strongest differentiator vs MCP and Mirage. Includes a 4-sprint implementation roadmap ordered by addressable market and shared infrastructure.

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24
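The envelope idea can be sketched as a type; every field here beyond the source-system list is an assumption, not the spec's actual StorageBridgeEvent definition, and the topic pattern is the one quoted elsewhere in this review:

```typescript
// Illustrative only: the real StorageBridgeEvent is defined in
// storage-bridge-spec.md; these fields are assumptions. The metadata keys
// and topic pattern are quoted in the review comments on this PR.
interface StorageBridgeEvent {
  eventId: string;
  source: "s3" | "postgres" | "redis" | "gcs" | "azure_blob";
  workspaceId: string;
  resourceId: string;
  op: "create" | "update" | "delete";
  occurredAt: string;                 // ISO 8601
  metadata: Record<string, string>;   // e.g. "postgres.row_json", "redis.value"
}

const topicFor = (workspaceId: string): string =>
  `relayfile.storage.events.${workspaceId}`;

console.log(topicFor("ws_123")); // relayfile.storage.events.ws_123
```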
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/storage-bridge-spec.md (1)
1-1260: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: PR objectives mismatch with file contents.
The PR description states this PR improves sync durability with health status API, faster heartbeat intervals, flush pending functionality, and WebSocket sync improvements. However, the files included are comprehensive storage bridge specification documents that appear completely unrelated to the described changes.
Either:
- The wrong files were included in this PR, or
- The PR description is incorrect and should describe the storage bridge specification work
Please clarify which is correct and update accordingly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/storage-bridge-spec.md` around lines 1 - 1260, The PR description claims changes to sync durability (health status API, heartbeat intervals, flush pending, WebSocket sync) but the patch only contains the storage bridge spec (e.g., types and functions like StorageBridgeEvent, mapS3Event, runS3Bridge, runPostgresListenBridge, runRedisBridge, fetchContent, ingestEvent), so fix the mismatch by either: (A) replacing the included files with the actual implementation changes for sync durability (ensure commits include the health API, heartbeat code, flush pending logic, WebSocket sync changes and update PR title/body to match), or (B) updating the PR description, title, and changelog to accurately describe that this PR adds the storage bridge spec document and list the new spec artifacts (StorageBridgeEvent, per-system mappings, adapter worker behavior, Nango integration, etc.); also verify the commit list contains only intended files and update the reviewer checklist accordingly.
🧹 Nitpick comments (7)
docs/storage-bridge-priority.md (3)
30-47: 💤 Low value

Add language specifiers to flow diagrams.

The flow diagrams would benefit from a language hint like `text` or `mermaid` if they can be converted to proper Mermaid diagrams for better rendering. As per static analysis hints.

Also applies to: 69-80, 101-112, 135-149
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/storage-bridge-priority.md` around lines 30 - 47, The flow diagram code fences lack language specifiers; update each fenced block (the shown diagram and the other similar blocks at the other locations) to include an appropriate language hint such as text or mermaid so the renderer can apply correct formatting (e.g., replace ``` with ```text or ```mermaid for the diagrams in the docs/local flow blocks); ensure you update all instances referenced in the review (the block in the comment and the blocks around the other ranges) so every diagram fence has a language specifier.
290-297: 💤 Low value

User count estimates may need citation or verification.
The table claims "~3B (Google Workspace)" and "+1.5B (Microsoft 365)" addressable users. These are rough market size estimates that should either be cited or marked as approximate/illustrative to avoid appearing as precise metrics.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/storage-bridge-priority.md` around lines 290 - 297, The "Total addressable users" row contains bold numeric claims ("~3B (Google Workspace)", "+1.5B (Microsoft 365)", "+700M (Dropbox + Gmail personal)") that need validation or qualification—either add citations for each estimate or change the wording to clearly mark them as illustrative/approximate (e.g., "approx." or "illustrative estimate") in the table; update the table cells for "Total addressable users" accordingly and, if adding citations, reference authoritative sources immediately after each figure (or add a footnote) so the values in the "Total addressable users" row are verifiable.
235-248: ⚡ Quick win

Sprint 1 scope may be overcommitted for 4 weeks.
Sprint 1 includes:
- Google Drive Watch API subscription manager (create, renew, delete)
- Drive Changes API delta fetcher
- GCS Pub/Sub notification bridge
- Storage adapter worker with pub/sub backend
- File path conventions for both systems
- Writeback for Drive (create, update, delete)
- Nango scheduled sync fallback for both
This represents 7 major deliverables, including a complete adapter worker implementation. For a 4-week sprint, this may be optimistic unless the team is large or has prior experience with similar systems. Consider whether this should be split or if the estimate should be extended.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/storage-bridge-priority.md` around lines 235 - 248, Sprint 1 is overcommitted with seven major deliverables; split into a smaller MVP and follow-up work by moving lower-priority or higher-effort items to a subsequent sprint. Keep the core MVP in Sprint 1: Google Drive Watch API subscription manager, Drive Changes API delta fetcher, and GCS Pub/Sub notification bridge; defer the full Storage adapter worker implementation, Drive writeback (create/update/delete), and Nango scheduled sync fallback to Sprint 2 (or split the worker into a lightweight pub/sub-only adapter in Sprint 1 and full writeback in Sprint 2). Update the plan to reflect the new split and adjust timelines/estimates accordingly.

docs/storage-bridge-spec.md (4)
1247: 🏗️ Heavy lift

Content size threshold decision impacts costs significantly.

Open question #1 raises a critical tradeoff: including content in pub/sub messages under 256 KB saves latency but increases message costs. For high-volume sources (e.g., S3 buckets with millions of small files), this could mean 10-100x higher pub/sub costs. Recommend establishing a cost model before deciding, and making this threshold configurable per source.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/storage-bridge-spec.md` at line 1247, The decision about whether the pub/sub envelope includes object content under a size threshold (e.g., 256 KB) affects cost and latency; update the spec to require (1) a documented cost model to quantify message cost vs latency for high-volume sources and (2) a configurable per-source "pubsub_content_threshold" parameter (defaulting to a safe value like 0 or 256KB) that the adapter honors when deciding to embed content in the envelope versus fetching it on delivery; reference the terms "pub/sub envelope", "adapter", and "pubsub_content_threshold" in the spec so implementers can apply the threshold per source and opt into the behavior.
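A sketch of what the per-source threshold could look like (the name "pubsub_content_threshold" comes from the prompt above; the config shape and 256 KB default are assumptions for illustration):

```typescript
// Sketch of the per-source threshold the prompt names
// "pubsub_content_threshold"; the config shape and 256 KB default are
// assumptions, not the spec.
interface SourceConfig {
  pubsubContentThresholdBytes: number; // 0 disables inline content entirely
}

function shouldEmbedContent(sizeBytes: number, cfg: SourceConfig): boolean {
  return (
    cfg.pubsubContentThresholdBytes > 0 &&
    sizeBytes <= cfg.pubsubContentThresholdBytes
  );
}

const s3Source: SourceConfig = { pubsubContentThresholdBytes: 256 * 1024 };
console.log(shouldEmbedContent(100 * 1024, s3Source));  // true  (embed inline)
console.log(shouldEmbedContent(1024 * 1024, s3Source)); // false (fetch on delivery)
```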
294: ⚡ Quick win

SQL trigger assumes 'id' column exists.

The comment `-- assumes id column; configurable` acknowledges this assumption, but the actual SQL hardcodes `row_to_json(NEW).id`. For a production-ready spec, consider showing the configurable version or noting this is a simplified example.

Suggested clarification

```diff
-    'pk', row_to_json(NEW).id  -- assumes id column; configurable
+    'pk', (row_to_json(NEW) ->> '${pk_column_name}')::text  -- configurable via trigger parameter
```

Or add a note that this is a simplified example and production triggers would be generated per-table with the actual PK column name.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/storage-bridge-spec.md` at line 294, The SQL trigger currently hardcodes row_to_json(NEW).id for the 'pk' value, which assumes every table uses an id column; update the spec to either show a configurable version that uses a placeholder or trigger argument (e.g., NEW.<primary_key_column> or TG_ARGV[0]) and describe that production triggers should be generated per-table with the actual PK column name, or explicitly state this is a simplified example; reference the expression row_to_json(NEW).id and the 'pk' mapping so reviewers know exactly where to replace or parameterize the PK access.
29-67: 💤 Low value

Add language specifiers to fenced code blocks.

Multiple fenced code blocks throughout the document lack language specifiers. While the ASCII diagrams (architecture, flow diagrams) should remain unlabeled, code blocks showing configuration, topic names, or structured examples should specify the format for better rendering and syntax highlighting.

As per static analysis hints, add language specifiers where appropriate (e.g., `yaml`, `text`, `bash`).

Examples of blocks that could benefit from language hints:
- Lines 191-197 (S3 setup structure): could use `text` or `yaml`
- Lines 415-421 (Redis config): should be `ini` or `conf`
- Lines 888-891 (Topic naming): could use `text`
- Lines 1047-1051 (Dead-letter topic): could use `text`

Also applies to: 191-256, 415-502, 657-671, 888-909, 1047-1051
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/storage-bridge-spec.md` around lines 29 - 67, Several fenced code blocks lack language specifiers; leave ASCII diagrams (e.g., the "Storage Systems" / "Per-System Bridge Processes" diagram) unlabeled but add appropriate language tags to configuration and example blocks (e.g., the S3 setup section around the "S3 setup structure" block, Redis config blocks near "Redis config", topic naming examples containing "relayfile.storage.events.{workspace_id}", and the dead-letter topic example). Edit the fenced blocks that show configuration, topic names, or CLI examples to include language hints like yaml, text, bash, or ini (e.g., add ```yaml, ```text, ```bash, or ```ini after the opening ```), ensuring blocks that are pure ASCII diagrams remain without a language tag; use the strings "relayfile.storage.events.{workspace_id}", "POST /v1/workspaces/{id}/webhooks/ingest", and the S3/Redis section headings to locate the exact blocks to update.
717-738: 💤 Low value

Function signature shows inconsistent parameter naming.

The function `mapNangoSyncRecord` expects `record.key ?? record.id` (line 721), but the example webhook payload only shows `key` as a field. If different Nango sync models use different identifier fields, this should be documented or the fallback logic clarified.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/storage-bridge-spec.md` around lines 717 - 738, The mapNangoSyncRecord function is using record.key ?? record.id for the eventId but the webhook example only guarantees key, causing inconsistent parameter expectations; update mapNangoSyncRecord to normalize the record identifier early (e.g., compute a single recordId = record.key || record.id || record.record_id) and then use that normalized recordId everywhere (eventId formation, computeResourceId call, and any other places that reference id/key), and add a brief docstring above mapNangoSyncRecord describing which fields are accepted as canonical ids; reference the existing symbols mapNangoSyncRecord, eventId construction, computeResourceId, computeRelayfilePath and mapNangoProviderToSource to locate where to apply the normalization and doc update.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/storage-bridge-priority.md`:
- Around line 242-243: The roadmap entry mentions "GCS Pub/Sub notification
bridge (from `storage-bridge-spec.md` Phase 1)" but that cross-reference is
incorrect; locate the phrase "GCS Pub/Sub notification bridge" in
docs/storage-bridge-priority.md and update the reference to point to the correct
phase in storage-bridge-spec.md (replace "Phase 1" with "Phase 2" or remove the
phase pointer), or adjust the roadmap item to match the actual intent (e.g.,
change to S3/Nango webhook if Phase 1 is intended); ensure the text and any
parenthetical reference to `storage-bridge-spec.md` consistently reflect the
same phase.
- Around line 224-229: The table currently lists "AWS S3" as Tier 3 with Nango
as 1–5 min fallback, which contradicts the spec that treats S3+SQS as the
real-time Tier 1 path; update the "AWS S3" row so that S3 with SQS Event
Notifications is shown as Tier 1 (primary real-time) and only show "Nango poll"
as the Tier 3 fallback option, and replace the existing footnote text with a
concise note that "S3 with SQS Event Notifications is Tier 1; polling via Nango
or other pollers is a Tier 3 fallback" so the table and note (references: "AWS
S3", "SQS events", "Nango poll") consistently reflect the spec.
In `@docs/storage-bridge-spec.md`:
- Around line 506-509: The doc currently claims "No Nango fallback for Redis —
Nango does not have a Redis integration" which conflicts with the architecture's
statement that Nango provides near-real-time fallbacks; update the
storage-bridge spec to be consistent by either (A) explicitly marking Redis as
"real-time-only (no Nango fallback)" in the latency tier table referenced around
the "latency tier table" / lines 77-83, and clarifying the statement "No Nango
fallback for Redis" in the Redis notes section (the paragraph containing "No
Nango fallback for Redis") or (B) if Redis can use Nango, remove or reword that
sentence and describe how Nango would be used; ensure both the Redis notes (the
bullets including "Keyspace notifications..." and "No Nango fallback for Redis")
and the latency tier table reference the same behavior for Redis so the doc is
consistent.
- Around line 340-343: The client error handler currently sleeps 5s and calls
runPostgresListenBridge(config) unconditionally, creating an infinite tight
retry loop; change the error handling in the client.on("error") callback to use
an exponential backoff strategy with a capped delay and a max-retry limit (e.g.,
double delay each retry up to a ceiling, track retries in a retry counter) and
log each attempt; when max retries are exceeded, stop restarting and surface the
error (or emit a terminal failure event) so persistent failures (like bad
credentials) don’t loop forever. Ensure you update references in the same scope
where client, sleep, and runPostgresListenBridge(config) are used so the retry
counter and backoff delay persist across restarts or are stored externally.
- Around line 982-990: The INSERT/ON CONFLICT SQL is vulnerable because schema
and table are interpolated directly; validate or safely quote those identifiers
before building the query (do not rely on parameter placeholders, which cannot
be used for identifiers). Update the call that constructs the SQL (the block
that references schema, table, pkColumn, tableConfig, and config.client.query)
to either (a) validate schema and table against a strict whitelist/regex (e.g.
allowed identifier pattern) OR (b) use a safe identifier-quoting helper such as
pg-format's format.ident (or an equivalent function) to escape
schema/table/pkColumn, and continue to pass the row payload as a parameterized
value. Ensure pkColumn is also validated/quoted and only non-identifier content
is left interpolated after this sanitization.
- Around line 809-810: The Postgres and Redis bridge implementations don't add
the actual content to metadata, so code reading
event.metadata["postgres.row_json"] and event.metadata["redis.value"] gets
"null"; update the Postgres bridge where fetchRow() is used (the function that
retrieves the row) to add metadata["postgres.row_json"] = JSON.stringify(row ??
null) (or an appropriate stringified representation), and update the Redis
bridge where the value is retrieved to add metadata["redis.value"] = value ==
null ? "null" : value.toString() (or JSON.stringify for structured values);
ensure the metadata keys exactly match "postgres.row_json" and "redis.value" so
the adapter's Buffer.from(...) calls receive the actual content.
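The backoff policy requested for the Postgres bridge can be sketched generically (`runBridge` stands in for `runPostgresListenBridge(config)`; the specific delays and retry ceiling are assumptions):

```typescript
// Generic sketch of the requested retry policy: exponential backoff with a
// delay cap and a retry ceiling. runBridge stands in for
// runPostgresListenBridge(config); the defaults are assumptions.
async function runWithBackoff(
  runBridge: () => Promise<void>,
  { baseMs = 1_000, capMs = 60_000, maxRetries = 8 } = {}
): Promise<void> {
  let attempt = 0;
  for (;;) {
    try {
      await runBridge();
      return; // bridge exited cleanly
    } catch (error) {
      attempt += 1;
      if (attempt > maxRetries) {
        // Persistent failures (e.g. bad credentials) surface as a terminal
        // error instead of looping forever.
        throw new Error(`bridge failed after ${maxRetries} retries: ${error}`);
      }
      const delayMs = Math.min(capMs, baseMs * 2 ** (attempt - 1));
      console.error(`bridge error (attempt ${attempt}); retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```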
---
Outside diff comments:
In `@docs/storage-bridge-spec.md`:
- Around line 1-1260: The PR description claims changes to sync durability
(health status API, heartbeat intervals, flush pending, WebSocket sync) but the
patch only contains the storage bridge spec (e.g., types and functions like
StorageBridgeEvent, mapS3Event, runS3Bridge, runPostgresListenBridge,
runRedisBridge, fetchContent, ingestEvent), so fix the mismatch by either: (A)
replacing the included files with the actual implementation changes for sync
durability (ensure commits include the health API, heartbeat code, flush pending
logic, WebSocket sync changes and update PR title/body to match), or (B)
updating the PR description, title, and changelog to accurately describe that
this PR adds the storage bridge spec document and list the new spec artifacts
(StorageBridgeEvent, per-system mappings, adapter worker behavior, Nango
integration, etc.); also verify the commit list contains only intended files and
update the reviewer checklist accordingly.
---
Nitpick comments:
In `@docs/storage-bridge-priority.md`:
- Around line 30-47: The flow diagram code fences lack language specifiers;
update each fenced block (the shown diagram and the other similar blocks at the
other locations) to include an appropriate language hint such as text or mermaid
so the renderer can apply correct formatting (e.g., replace ``` with ```text or
```mermaid for the diagrams in the docs/local flow blocks); ensure you update
all instances referenced in the review (the block in the comment and the blocks
around the other ranges) so every diagram fence has a language specifier.
- Around line 290-297: The "Total addressable users" row contains bold numeric
claims ("~3B (Google Workspace)", "+1.5B (Microsoft 365)", "+700M (Dropbox +
Gmail personal)") that need validation or qualification—either add citations for
each estimate or change the wording to clearly mark them as
illustrative/approximate (e.g., "approx." or "illustrative estimate") in the
table; update the table cells for "Total addressable users" accordingly and, if
adding citations, reference authoritative sources immediately after each figure
(or add a footnote) so the values in the "Total addressable users" row are
verifiable.
- Around line 235-248: Sprint 1 is overcommitted with seven major deliverables;
split into a smaller MVP and follow-up work by moving lower-priority or
higher-effort items to a subsequent sprint. Keep the core MVP in Sprint 1:
Google Drive Watch API subscription manager, Drive Changes API delta fetcher,
and GCS Pub/Sub notification bridge; defer the full Storage adapter worker
implementation, Drive writeback (create/update/delete), and Nango scheduled sync
fallback to Sprint 2 (or split the worker into a lightweight pub/sub-only
adapter in Sprint 1 and full writeback in Sprint 2). Update the plan to reflect
the new split and adjust timelines/estimates accordingly.
In `@docs/storage-bridge-spec.md`:
- Line 1247: The decision about whether the pub/sub envelope includes object
content under a size threshold (e.g., 256 KB) affects cost and latency; update
the spec to require (1) a documented cost model to quantify message cost vs
latency for high-volume sources and (2) a configurable per-source
"pubsub_content_threshold" parameter (defaulting to a safe value like 0 or
256KB) that the adapter honors when deciding to embed content in the envelope
versus fetching it on delivery; reference the terms "pub/sub envelope",
"adapter", and "pubsub_content_threshold" in the spec so implementers can apply
the threshold per source and opt into the behavior.
- Line 294: The SQL trigger currently hardcodes row_to_json(NEW).id for the 'pk'
value, which assumes every table uses an id column; update the spec to either
show a configurable version that uses a placeholder or trigger argument (e.g.,
NEW.<primary_key_column> or TG_ARGV[0]) and describe that production triggers
should be generated per-table with the actual PK column name, or explicitly
state this is a simplified example; reference the expression row_to_json(NEW).id
and the 'pk' mapping so reviewers know exactly where to replace or parameterize
the PK access.
- Around line 29-67: Several fenced code blocks lack language specifiers; leave
ASCII diagrams (e.g., the "Storage Systems" / "Per-System Bridge Processes"
diagram) unlabeled but add appropriate language tags to configuration and
example blocks (e.g., the S3 setup section around the "S3 setup structure"
block, Redis config blocks near "Redis config", topic naming examples containing
"relayfile.storage.events.{workspace_id}", and the dead-letter topic example).
Edit the fenced blocks that show configuration, topic names, or CLI examples to
include language hints like yaml, text, bash, or ini (e.g., add ```yaml,
```text, ```bash, or ```ini after the opening ```), ensuring blocks that are
pure ASCII diagrams remain without a language tag; use the strings
"relayfile.storage.events.{workspace_id}", "POST
/v1/workspaces/{id}/webhooks/ingest", and the S3/Redis section headings to
locate the exact blocks to update.
- Around line 717-738: The mapNangoSyncRecord function is using record.key ??
record.id for the eventId but the webhook example only guarantees key, causing
inconsistent parameter expectations; update mapNangoSyncRecord to normalize the
record identifier early (e.g., compute a single recordId = record.key ||
record.id || record.record_id) and then use that normalized recordId everywhere
(eventId formation, computeResourceId call, and any other places that reference
id/key), and add a brief docstring above mapNangoSyncRecord describing which
fields are accepted as canonical ids; reference the existing symbols
mapNangoSyncRecord, eventId construction, computeResourceId,
computeRelayfilePath and mapNangoProviderToSource to locate where to apply the
normalization and doc update.
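The normalization this comment describes might be sketched as follows; `mapNangoSyncRecord` and the `key`/`id`/`record_id` fields come from the comment itself, while the simplified record and event shapes here are assumptions:

```typescript
// Sketch of the early id normalization asked for above. The real
// mapNangoSyncRecord computes resource ids and relayfile paths; this
// illustration only shows where the normalized id would flow.

interface NangoSyncRecord {
  key?: string;
  id?: string;
  record_id?: string;
  [field: string]: unknown;
}

// Accepts `key` (canonical, guaranteed by the webhook example), then
// `id`, then `record_id` as the record identifier.
function normalizeRecordId(record: NangoSyncRecord): string {
  const recordId = record.key || record.id || record.record_id;
  if (!recordId) {
    throw new Error("Nango record has no key/id/record_id");
  }
  return recordId;
}

function mapNangoSyncRecord(provider: string, record: NangoSyncRecord) {
  // Normalize once, then use the same id everywhere: eventId formation,
  // resource id computation, path computation.
  const recordId = normalizeRecordId(record);
  return {
    eventId: `nango:${provider}:${recordId}`,
    resourceId: recordId,
  };
}
```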
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: af46163f-5b51-4e3f-a6c8-15adefdf9ad1
📒 Files selected for processing (2)
docs/storage-bridge-priority.md
docs/storage-bridge-spec.md
…onvex, HF and Nango catalog

Extends the storage bridge priority document with findings from research into six additional systems and Nango's integration catalog:

- Supabase Storage: Tier 1 self-hosted (full webhook system in codebase); Tier 2 cloud (soft beta); no Nango fallback available
- MongoDB Atlas: Tier 1 via Atlas Triggers → Function → HTTP POST; near-real-time, change-stream based; at-least-once delivery
- Cloudflare R2: Tier 1 with a CF Worker bridge (~20 lines); Tier 2 with HTTP pull polling; no direct HTTP push to arbitrary endpoints
- Telegram: Tier 1 technically (< 1 s Bot webhook delivery); niche use case for teams sharing files via Telegram as a workflow channel
- Convex: Tier 3; no native outbound webhooks; users must wire scheduler + Action + fetch() themselves; file storage has no events at all
- Hugging Face Buckets: Tier 3; manual sync only; no notifications

Adds a Nango integration catalog section showing which systems get scheduled-sync fallback for free (13 integrations, including Drive, OneDrive, SharePoint, Dropbox, Box, Notion, Confluence, Google Docs, Sheets, Smartsheet, Nextcloud, Airtable, Databricks).

Adds an updated full priority table covering all systems with push-support status, Nango fallback availability, priority, and sprint assignment.

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/storage-bridge-priority.md`:
- Line 30: The Markdown file has four fenced diagram blocks (the ones containing
"User edits file in Google Drive … Agent sees change via WebSocket", "User
uploads file to SharePoint document library … StorageBridgeEvent(s) published
for each changed item", "File added/modified/deleted in Dropbox …
StorageBridgeEvent published for each changed entry", and "Email received or
label changed … StorageBridgeEvent published per changed thread") missing
language identifiers which triggers markdownlint MD040; update each opening
triple-backtick fence to include a language (use "text") so they read ```text to
satisfy the linter and CI.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 7f880a8e-6066-46c6-ab5a-5b20328a76ce
📒 Files selected for processing (1)
docs/storage-bridge-priority.md
Prioritizes all integrations by where relayfile's value-add is highest: systems where humans and agents write the same records concurrently, where relations between records matter, and where writeback is consequential enough to require conflict detection and ACLs.

Four tiers:

- Tier 1 (Collaboration): Linear, Jira, GitHub, Notion, Asana, ClickUp already built; gaps are Confluence, Google Docs, SharePoint, Monday, Coda, Figma, Miro
- Tier 2 (Action): Salesforce, HubSpot, Zendesk, Intercom, Pipedrive already built; gaps are Freshdesk, ServiceNow, PagerDuty
- Tier 3 (Awareness): Slack, Teams already built; gaps are Discord, Gmail, Outlook, Telegram
- Tier 4 (Data): read-heavy tools where MCP competes adequately

Ranks the 14 highest-priority missing integrations in order, and explains what not to build yet (pure storage, analytics-only, and transactional tools) and why.

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24
Defines four standard tiers (Free, Starter $79, Growth $499, Enterprise) and three platform plans co-marketed with Nango ($299), Composio ($199), and Pipedream ($149).

Platform plans are priced to complement what the customer already pays: the Nango plan assumes the customer's Nango subscription handles provider OAuth and sync; the Composio plan pairs with tool-call routing; the Pipedream plan adds shared state to event-driven workflows.

Growth bumped to $499 to cover Nango infrastructure costs at margin. Starter bumped to $79 to position above Composio's $29 Starter while reflecting the coordination value relayfile adds over tool-call routing.

Events definition: webhook ingestions and file writes only — fan-out to agent WebSocket connections does not count, to avoid penalizing multi-agent usage.

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24
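The events definition in this commit could be expressed as a small classifier; this is a hedged sketch: the `ActivityKind` and function names are hypothetical, and only the billable/free split follows the commit message.

```typescript
// Hypothetical sketch of the billing rule described above: webhook
// ingestions and file writes count as events; WebSocket fan-out to
// agents never does, so multi-agent usage is not penalized.

type ActivityKind = "webhook_ingest" | "file_write" | "ws_fanout";

function isBillable(kind: ActivityKind): boolean {
  return kind === "webhook_ingest" || kind === "file_write";
}

// A billing period's event count is just the billable subset.
function countBillable(activities: ActivityKind[]): number {
  return activities.filter(isBillable).length;
}
```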
🧹 Nitpick comments (2)
docs/pricing.md (2)
68-68: ⚡ Quick win: Clarify Nango pricing references to avoid reader confusion.
This line mentions both "~$200/month for the integration infrastructure" and "Below Nango's own Growth plan ($500)" which conflates Relayfile's infrastructure costs with Nango's customer-facing price. Consider rephrasing to clearly separate these concepts.
📝 Suggested clarification
-**Why $499:** Covers Nango Growth costs (~$200/month for the integration infrastructure) at reasonable margin. Below Nango's own Growth plan ($500) only because relayfile is the coordination layer on top, not a replacement for Nango. Matches what a Composio Professional ($229) + Nango Starter ($50) customer would pay combined — and relayfile gives them both in one.
+**Why $499:** Our infrastructure costs for Nango integration (~$200/month) are covered at reasonable margin. Priced just below Nango's own Growth plan ($500) because Relayfile is the coordination layer on top, not a replacement for Nango. Matches what a Composio Professional ($229) + Nango Starter ($50) customer would pay combined — and Relayfile gives them both in one.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/pricing.md` at line 68, The sentence conflates Relayfile's internal infrastructure cost with Nango's customer-facing price; rewrite the line so it clearly separates the two ideas: state that Relayfile's integration infrastructure costs are approximately $200/month (an internal cost component) and that Nango's published Growth plan is $500 as a customer-facing price, clarifying that Relayfile's $499 offering sits below Nango's $500 plan while covering our infrastructure and margin—update the sentence text in the pricing copy where the current phrase starting "Why $499:" appears to reflect this separation.
47-47: 💤 Low value: Consider adding version/date context for external pricing references.
The document references specific competitor prices (Composio Starter at $29, Nango Growth at $500, etc.) without indicating when these prices were current. As competitor pricing evolves, these references will become stale and potentially misleading.
Consider either:
- Adding a "Pricing competitive analysis as of [date]" note near the top of the document
- Adding an HTML comment noting when competitor prices were last verified
- Establishing a quarterly review process for this documentation
Also applies to: 68-68, 137-137, 161-161, 200-200
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/pricing.md` at line 47, Add explicit date/version context for external competitor prices by inserting a "Pricing competitive analysis as of [YYYY-MM-DD]" note near the top of docs/pricing.md and add HTML comments immediately after each external price mention in the "Why $79" section and the other occurrences (the lines referencing Composio Starter $29, Nango Growth $500, etc.) that state the date the price was last verified; also add a short "Review cadence: quarterly" sentence in the doc footer so reviewers know to re-verify prices on a quarterly schedule.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@docs/pricing.md`:
- Line 68: The sentence conflates Relayfile's internal infrastructure cost with
Nango's customer-facing price; rewrite the line so it clearly separates the two
ideas: state that Relayfile's integration infrastructure costs are approximately
$200/month (an internal cost component) and that Nango's published Growth plan
is $500 as a customer-facing price, clarifying that Relayfile's $499 offering
sits below Nango's $500 plan while covering our infrastructure and margin—update
the sentence text in the pricing copy where the current phrase starting "Why
$499:" appears to reflect this separation.
- Line 47: Add explicit date/version context for external competitor prices by
inserting a "Pricing competitive analysis as of [YYYY-MM-DD]" note near the top
of docs/pricing.md and add HTML comments immediately after each external price
mention in the "Why $79" section and the other occurrences (the lines
referencing Composio Starter $29, Nango Growth $500, etc.) that state the date
the price was last verified; also add a short "Review cadence: quarterly"
sentence in the doc footer so reviewers know to re-verify prices on a quarterly
schedule.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 34656beb-efd8-47d5-9038-8d4d3f1c775b
📒 Files selected for processing (1)
docs/pricing.md
Rewrites pricing doc with full decision rationale inline for every number rather than brief "why" notes:

- Billing unit rationale: why events over seats/agents; why fan-out is always free (penalizing multi-agent usage contradicts the value prop)
- Starter $79: covers Nango infra costs at ~40% margin, positioned above Composio's $29 tool-call routing with a clear coordination premium
- Growth $499: the Nango cost problem forced it up from $299 — ~$150-200/month Nango infrastructure at Growth makes $299 margin-negative; $499 yields ~60% margin and accidentally matches Nango Growth ($500) as an anchor
- Enterprise: build-vs-buy framing ($24k-48k to DIY, plus ongoing maintenance)
- Platform plans rationale: combined spend analysis for each — Nango ($349-799 total), Composio (~$428 total), Pipedream (~$200 total)
- Gross margin model table: shows the infra cost split per plan
- Platform plan margins are higher because the customer's existing platform subscription covers the expensive provider API layer

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24
Completes the trail trajectory for pricing strategy, including decisions on event-based billing, tier price points, the Nango cost problem resolution, and platform-specific plans. https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24
Review follow-up in this branch also clarifies storage bridge roadmap tiers, markdown fences, Postgres/Redis metadata examples, Nango record id normalization, SQL identifier safety, and pub/sub content threshold cost controls.
https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24