
improve sync durability and add storage bridge specs#118

Open
khaliqgant wants to merge 9 commits into main from
claude/mirage-file-sync-XkcSR

Conversation

Member

@khaliqgant khaliqgant commented May 9, 2026

  • Lower default ping interval from 30s to 15s, halving the worst-case silent-failure detection window (~90s → ~45s for NAT/LB idle drops)
  • Add RelayFileSyncHealthStatus and getHealthStatus() so long-running agents can inspect connection state, degradation reason, last frame time, and reconnect attempts without parsing log output
  • Add onDegraded/onRecovered callbacks to RelayFileSyncOptions; fired when sync involuntarily falls back to polling and when it recovers to a live WebSocket
  • Track stateEnteredAt so callers can measure how long a session has been in the current state
  • Add flushPending() to AutoSyncHandle so recent writes can be synced before checkpoint or process exit
  • Add watchersHealthy() to AutoSyncHandle and track per-watcher degradation when runtime watcher errors occur
  • Add storage bridge specification docs covering StorageBridgeEvent, per-system mappings, adapter worker behavior, Nango fallback, writeback, reliability, observability, rollout phases, and integration prioritization

Review follow-up in this branch also clarifies storage bridge roadmap tiers, markdown fences, Postgres/Redis metadata examples, Nango record id normalization, SQL identifier safety, and pub/sub content threshold cost controls.
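The new surface lends itself to a small liveness check in agent runtimes. The sketch below is illustrative only: the exact field names of `RelayFileSyncHealthStatus` (`state`, `degraded`, `stateEnteredAt`, `lastFrameAt`, `reconnectAttempts`) are assumptions inferred from this description, not the SDK's published type.

```typescript
// Hypothetical shape inferred from the PR description; the real
// RelayFileSyncHealthStatus in packages/sdk/typescript/src/sync.ts may differ.
type SyncState = "connecting" | "open" | "reconnecting" | "closed";

interface RelayFileSyncHealthStatus {
  state: SyncState;
  degraded: boolean;
  degradedReason?: string;  // why sync fell back to polling
  stateEnteredAt: number;   // epoch ms when the current state began
  lastFrameAt?: number;     // epoch ms of the last WebSocket frame
  reconnectAttempts: number;
}

// How long the session has been in its current state.
function timeInState(status: RelayFileSyncHealthStatus, now = Date.now()): number {
  return Math.max(0, now - status.stateEnteredAt);
}

// True when the connection claims to be open but has been silent longer than
// the worst-case detection window (~45s with the new 15s ping interval).
function looksSilentlyDead(
  status: RelayFileSyncHealthStatus,
  now = Date.now(),
  maxSilenceMs = 45_000
): boolean {
  if (status.state !== "open" || status.lastFrameAt === undefined) return false;
  return now - status.lastFrameAt > maxSilenceMs;
}
```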

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24

claude added 2 commits May 9, 2026 07:19
…ending

- Lower default ping interval from 30s to 15s, halving the worst-case
  silent-failure detection window (~90s → ~45s for NAT/LB idle drops)
- Add RelayFileSyncHealthStatus and getHealthStatus() so long-running
  agents can inspect connection state, degradation reason, last frame
  time, and reconnect attempts without parsing log output
- Add onDegraded/onRecovered callbacks to RelayFileSyncOptions; fired
  when the sync involuntarily falls back to polling and when it recovers
  to a live WebSocket — gives agent runtimes a structured hook to alert
  operators rather than relying solely on console.warn
- Track stateEnteredAt so callers can measure how long a session has
  been in degraded state
- Add flushPending() to AutoSyncHandle: cancels pending debounce timers
  and runs a full reconcile, ensuring recent writes are synced before a
  checkpoint or process exit
- Add watchersHealthy() to AutoSyncHandle: exposes whether both
  filesystem watchers are subscribed and active so callers know whether
  change detection is watcher-driven or reconcile-only
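A sketch of how `flushPending()` might be used at shutdown. `AutoSyncHandleLike` is an assumed minimal interface, not the real `AutoSyncHandle` from packages/local-mount/src/auto-sync.ts, and `shutdownSync` is a hypothetical helper.

```typescript
// Minimal stand-in for the AutoSyncHandle surface this commit adds; only
// the two methods used here are assumed, and their real signatures may differ.
interface AutoSyncHandleLike {
  flushPending(): Promise<void>; // cancel debounce timers, run a full reconcile
  stop(): Promise<void>;
}

// Drain recent writes before a checkpoint or process exit, but never let
// a hung reconcile block shutdown for longer than `timeoutMs`.
async function shutdownSync(handle: AutoSyncHandleLike, timeoutMs = 10_000): Promise<void> {
  const timeout = new Promise<void>((resolve) => setTimeout(resolve, timeoutMs));
  await Promise.race([handle.flushPending(), timeout]);
  await handle.stop();
}
```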

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24

coderabbitai Bot commented May 9, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Exposes local watcher health and adds flushPending to auto-sync; instruments WebSocket sync with health status, degraded/recovery callbacks, and timestamps; reduces the ping interval to 15s; re-exports the health type; and adds Storage Bridge spec, priority, integration-priority, and pricing docs.

Changes

Sync Health Monitoring

| Layer | File(s) | Summary |
| --- | --- | --- |
| Health Status Contract | packages/sdk/typescript/src/sync.ts | New `RelayFileSyncHealthStatus` interface exposes sync state, degraded flag/reason, timing metadata, and reconnect attempts. |
| Sync Options Extension | packages/sdk/typescript/src/sync.ts | `RelayFileSyncOptions` adds `onDegraded` and `onRecovered` callbacks. |
| WebSocket Health Implementation | packages/sdk/typescript/src/sync.ts | `RelayFileSync` stores handlers, initializes degraded-health state, exposes `getHealthStatus()`, updates `stateEnteredAt` on transitions, and reduces the ping interval to 15s. |
| Degraded & Recovered Transitions | packages/sdk/typescript/src/sync.ts | On polling fallback, marks degraded (with reason) and invokes `onDegraded` once; on WebSocket open, clears degraded state and invokes `onRecovered` (handler errors are debug-logged). |
| Local Mount Watcher Health | packages/local-mount/src/auto-sync.ts | `AutoSyncHandle` adds `flushPending()` and `watchersHealthy()`; internal `_watchersHealthy` is set when both watchers subscribe and cleared on `stop()`; `flushPending()` clears debounce timers then runs a reconcile. |
| SDK Public API | packages/sdk/typescript/src/index.ts | Re-exports `RelayFileSyncHealthStatus` from the SDK entrypoint. |

Storage Bridge & Docs

| Layer | File(s) | Summary |
| --- | --- | --- |
| Storage Bridge Specification | docs/storage-bridge-spec.md | Adds the storage bridge spec: StorageBridgeEvent envelope, per-backend bridge designs (S3, Postgres, Redis, GCS, Azure), adapter worker flow, writeback semantics, reliability, config schema, observability, phases, and open questions. |
| Storage Bridge Priority | docs/storage-bridge-priority.md | Adds prioritization tiers, per-system notes, a 4-sprint roadmap, an additional-systems catalog, a Nango catalog, and an updated priority table. |
| Integration Priority Doc | docs/integration-priority.md | Adds an integration prioritization framework, tier definitions, already-built vs. missing adapters, and recommended next integrations. |
| Pricing Doc | docs/pricing.md | Adds the pricing model, plan limits, overage rates, platform plans, event definitions, and competitive anchoring. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 I thumped the log and cleared the heap,

Watchers listen while the tunnels sleep.
When sockets droop I mark the fall,
Then cheer when live restores the call.
Flush the queues — the burrow's all.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: sync durability improvements (health status API, faster heartbeat, flush pending) and storage bridge specifications. |
| Description check | ✅ Passed | The description comprehensively details all changes, including the ping interval reduction, health status APIs, callbacks, watcher health tracking, flush pending functionality, and storage bridge specification documentation. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/local-mount/src/auto-sync.ts (1)

170-180: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Track per-subscription watcher health and update on runtime errors.

_watchersHealthy is only set during initial subscribe success and stop(). When the watcher callback receives an error at runtime, it calls onError(err) but does not update health state, so watchersHealthy() can keep returning true after a subscription has degraded. Per @parcel/watcher semantics, a callback err means the subscription is no longer reliable and should be treated as degraded.

Introduce per-subscription health flags (_mountWatcherHealthy, _projectWatcherHealthy) and update them in the callback when err occurs, recomputing the overall health as both-healthy. Pass a callback to subscribeTo() to mark the subscription unhealthy on error:

Suggested approach
```diff
 let mountSub: AsyncSubscription | undefined;
 let projectSub: AsyncSubscription | undefined;
 let _watchersHealthy = false;
+let _mountWatcherHealthy = false;
+let _projectWatcherHealthy = false;
+const recomputeWatchersHealthy = (): void => {
+  _watchersHealthy = _mountWatcherHealthy && _projectWatcherHealthy;
+};

-const subscribeTo = (root: string): Promise<AsyncSubscription> =>
+const subscribeTo = (
+  root: string,
+  onRuntimeError: () => void
+): Promise<AsyncSubscription> =>
   watcher.subscribe(
     root,
     (err, events) => {
-      if (err) { onError(err); return; }
+      if (err) {
+        onRuntimeError();
+        recomputeWatchersHealthy();
+        onError(err);
+        return;
+      }
       for (const ev of events) {
         schedulePathSync(root, ev.path);
       }
     },
     { ignore: buildIgnoreGlobs(ctx, root) }
   );

 const watchersReady = (async () => {
   const [mountResult, projectResult] = await Promise.allSettled([
-    subscribeTo(ctx.realMountDir),
-    subscribeTo(ctx.realProjectDir),
+    subscribeTo(ctx.realMountDir, () => { _mountWatcherHealthy = false; }),
+    subscribeTo(ctx.realProjectDir, () => { _projectWatcherHealthy = false; }),
   ]);
   if (mountResult.status === 'fulfilled') mountSub = mountResult.value;
   if (projectResult.status === 'fulfilled') projectSub = projectResult.value;
   if (mountResult.status === 'fulfilled' && projectResult.status === 'fulfilled') {
-    _watchersHealthy = true;
+    _mountWatcherHealthy = true;
+    _projectWatcherHealthy = true;
+    recomputeWatchersHealthy();
     return;
```
Also applies to: lines 184–197, 261

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/local-mount/src/auto-sync.ts` around lines 170 - 180, The
subscription callback treats runtime errors via onError(err) but never updates
the watcher-health state, so watchersHealthy() can stay true after a
subscription degrades; update health per-subscription: add boolean flags
_mountWatcherHealthy and _projectWatcherHealthy, set them to true on successful
subscribe in subscribeTo() and set the appropriate flag to false inside the
watcher.subscribe callback when err is non-null (then recompute _watchersHealthy
as _mountWatcherHealthy && _projectWatcherHealthy), and ensure stop() still
clears flags; modify subscribeTo(root, markUnhealthyCallback) or accept a
callback parameter so the callback invoked on err flips the correct
per-subscription flag, then keep calling onError(err) and schedulePathSync(root,
ev.path) unchanged for normal events.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/sdk/typescript/src/sync.ts`:
- Around line 548-557: The recovery callback runs before the state is set to
"open", so move the state transition to occur before invoking onRecovered:
ensure this.degraded and this.degradedReason are cleared, call
this.setState("open") next, then invoke this.onRecovered() inside the existing
try/catch (preserving debugLog on error) so any handler that calls
getHealthStatus() will observe the "open" state.
- Around line 389-397: Constructor sets this.state = "closed" when
options.signal is already aborted but doesn't update stateEnteredAt because it
skips calling setState; update the constructor to either call setState("closed")
instead of assigning this.state directly or explicitly set this.stateEnteredAt =
Date.now() (or use the same timestamp logic as setState) when handling
options.signal.aborted so getHealthStatus() returns an accurate stateEnteredAt;
references: constructor, options.signal, this.state, setState(), stateEnteredAt,
getHealthStatus().

---

Outside diff comments:
In `@packages/local-mount/src/auto-sync.ts`:
- Around line 170-180: The subscription callback treats runtime errors via
onError(err) but never updates the watcher-health state, so watchersHealthy()
can stay true after a subscription degrades; update health per-subscription: add
boolean flags _mountWatcherHealthy and _projectWatcherHealthy, set them to true
on successful subscribe in subscribeTo() and set the appropriate flag to false
inside the watcher.subscribe callback when err is non-null (then recompute
_watchersHealthy as _mountWatcherHealthy && _projectWatcherHealthy), and ensure
stop() still clears flags; modify subscribeTo(root, markUnhealthyCallback) or
accept a callback parameter so the callback invoked on err flips the correct
per-subscription flag, then keep calling onError(err) and schedulePathSync(root,
ev.path) unchanged for normal events.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 18b59ddf-34f0-4f40-b2ae-1b308acd86df

📥 Commits

Reviewing files that changed from the base of the PR and between c8ef4d5 and b2e0115.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (3)
  • packages/local-mount/src/auto-sync.ts
  • packages/sdk/typescript/src/index.ts
  • packages/sdk/typescript/src/sync.ts

Comment thread packages/sdk/typescript/src/sync.ts
Comment thread packages/sdk/typescript/src/sync.ts Outdated
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.


Comment on lines +548 to +556
```ts
if (this.degraded) {
  this.degraded = false;
  this.degradedReason = undefined;
  try {
    this.onRecovered?.();
  } catch (error) {
    debugLog("onRecovered handler threw", error);
  }
}
```
Contributor


🟡 onRecovered fires before setState("open"), exposing inconsistent health status

The onRecovered callback at line 552 fires before this.setState("open") at line 557. At that point this.degraded has already been cleared to false (line 549) but this.state is still "reconnecting" (or "connecting"). If the onRecovered handler calls getHealthStatus() (packages/sdk/typescript/src/sync.ts:389-398) or getState(), it observes a contradictory snapshot: { state: "reconnecting", degraded: false, stateEnteredAt: <old-timestamp>, ... }. The contract documented in the JSDoc says recovery means "back to a live WebSocket" — the caller would reasonably expect state: "open". The recovery block (lines 548-556) should be moved to after this.setState("open") so the state is consistent when the callback fires.

Prompt for agents
In the WebSocket `open` event handler inside `openWebSocketWithToken` (packages/sdk/typescript/src/sync.ts, around line 537-561), the recovery block (if this.degraded ... onRecovered) fires before this.setState("open"). This means any onRecovered callback that calls getHealthStatus() or getState() will see state: "reconnecting" with degraded: false, which is contradictory.

To fix: move the entire `if (this.degraded) { ... }` block (currently lines 548-556) to after the `this.setState("open")` call at line 557. This ensures the state machine is in the "open" state before the recovery callback fires, giving callers a consistent view. Consider placing it after `this.emit("open", event)` at line 560 so that all internal state transitions and event emissions complete before the user callback.
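The suggested reordering can be shown with a toy model. Nothing below is SDK source; the method and field names simply mirror the review comment, and the point is that the state transition happens before the recovery callback fires.

```typescript
// Toy model of the relevant slice of RelayFileSync; illustrative only.
class ToySync {
  state: "reconnecting" | "open" = "reconnecting";
  degraded = true;
  degradedReason?: string = "websocket-unavailable";
  stateEnteredAt = 0;

  constructor(private onRecovered?: () => void) {}

  getHealthStatus() {
    return { state: this.state, degraded: this.degraded, stateEnteredAt: this.stateEnteredAt };
  }

  private setState(next: "reconnecting" | "open"): void {
    this.state = next;
    this.stateEnteredAt = Date.now();
  }

  // Corrected ordering: clear degraded flags and transition to "open"
  // BEFORE invoking the recovery callback, so getHealthStatus() inside
  // the handler never reports { state: "reconnecting", degraded: false }.
  handleOpen(): void {
    const wasDegraded = this.degraded;
    this.degraded = false;
    this.degradedReason = undefined;
    this.setState("open");
    if (wasDegraded) {
      try {
        this.onRecovered?.();
      } catch {
        // swallow handler errors, as the SDK debug-logs them
      }
    }
  }
}
```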


Adds two documents covering the storage bridge architecture:

storage-bridge-spec.md — comprehensive technical spec for bridging
infrastructure storage systems (S3, Postgres, Redis, GCS, Azure Blob)
into relayfile via a common StorageBridgeEvent envelope, pub/sub layer,
and storage adapter worker. Covers per-system bridge implementations,
Nango scheduled sync fallback, writeback, error handling, security,
observability, and a phased implementation plan.
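The common envelope described above can be sketched as a type. The exact field set of StorageBridgeEvent is not quoted in this thread, so every field below is an assumption drawn from names that appear in the review (event ids, resource ids, metadata keys such as `postgres.row_json`), and the example values are invented.

```typescript
// Hypothetical shape of the StorageBridgeEvent envelope; not canonical.
type BridgeSource = "s3" | "gcs" | "azure-blob" | "postgres" | "redis";

interface StorageBridgeEvent {
  eventId: string;                  // unique per change event
  source: BridgeSource;
  workspaceId: string;
  resourceId: string;               // stable id within the source system
  path: string;                     // mapped relayfile path
  operation: "create" | "update" | "delete";
  occurredAt: string;               // ISO-8601 timestamp
  metadata: Record<string, string>; // per-backend extras, e.g. "postgres.row_json"
}

// Example envelope for a Postgres row change (illustrative values only).
const example: StorageBridgeEvent = {
  eventId: "pg:public.users:42:1715240000",
  source: "postgres",
  workspaceId: "ws_123",
  resourceId: "public.users/42",
  path: "/postgres/public/users/42.json",
  operation: "update",
  occurredAt: "2026-05-09T07:19:00Z",
  metadata: { "postgres.row_json": JSON.stringify({ id: 42, name: "Ada" }) },
};
```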

storage-bridge-priority.md — prioritizes integrations by native push
notification capability. Tier 1 (Google Drive, SharePoint/OneDrive,
Dropbox, Gmail, GCS, Azure Blob) delivers true real-time awareness and
is our strongest differentiator vs MCP and Mirage. Includes a 4-sprint
implementation roadmap ordered by addressable market and shared
infrastructure.

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/storage-bridge-spec.md (1)

1-1260: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: PR objectives mismatch with file contents.

The PR description states this PR improves sync durability with health status API, faster heartbeat intervals, flush pending functionality, and WebSocket sync improvements. However, the files included are comprehensive storage bridge specification documents that appear completely unrelated to the described changes.

Either:

  1. The wrong files were included in this PR, or
  2. The PR description is incorrect and should describe the storage bridge specification work

Please clarify which is correct and update accordingly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/storage-bridge-spec.md` around lines 1 - 1260, The PR description claims
changes to sync durability (health status API, heartbeat intervals, flush
pending, WebSocket sync) but the patch only contains the storage bridge spec
(e.g., types and functions like StorageBridgeEvent, mapS3Event, runS3Bridge,
runPostgresListenBridge, runRedisBridge, fetchContent, ingestEvent), so fix the
mismatch by either: (A) replacing the included files with the actual
implementation changes for sync durability (ensure commits include the health
API, heartbeat code, flush pending logic, WebSocket sync changes and update PR
title/body to match), or (B) updating the PR description, title, and changelog
to accurately describe that this PR adds the storage bridge spec document and
list the new spec artifacts (StorageBridgeEvent, per-system mappings, adapter
worker behavior, Nango integration, etc.); also verify the commit list contains
only intended files and update the reviewer checklist accordingly.
🧹 Nitpick comments (7)
docs/storage-bridge-priority.md (3)

30-47: 💤 Low value

Add language specifiers to flow diagrams.

The flow diagrams would benefit from a language hint such as text, or mermaid if they can be converted to proper Mermaid diagrams for better rendering. As per static analysis hints.

Also applies to: 69-80, 101-112, 135-149

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/storage-bridge-priority.md` around lines 30 - 47, The flow diagram code
fences lack language specifiers; update each fenced block (the shown diagram and
the other similar blocks at the other locations) to include an appropriate
language hint such as text or mermaid so the renderer can apply correct
formatting (e.g., replace ``` with ```text or ```mermaid for the diagrams in the
docs/local flow blocks); ensure you update all instances referenced in the
review (the block in the comment and the blocks around the other ranges) so
every diagram fence has a language specifier.

290-297: 💤 Low value

User count estimates may need citation or verification.

The table claims "~3B (Google Workspace)" and "+1.5B (Microsoft 365)" addressable users. These are rough market size estimates that should either be cited or marked as approximate/illustrative to avoid appearing as precise metrics.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/storage-bridge-priority.md` around lines 290 - 297, The "Total
addressable users" row contains bold numeric claims ("~3B (Google Workspace)",
"+1.5B (Microsoft 365)", "+700M (Dropbox + Gmail personal)") that need
validation or qualification—either add citations for each estimate or change the
wording to clearly mark them as illustrative/approximate (e.g., "approx." or
"illustrative estimate") in the table; update the table cells for "Total
addressable users" accordingly and, if adding citations, reference authoritative
sources immediately after each figure (or add a footnote) so the values in the
"Total addressable users" row are verifiable.

235-248: ⚡ Quick win

Sprint 1 scope may be overcommitted for 4 weeks.

Sprint 1 includes:

  • Google Drive Watch API subscription manager (create, renew, delete)
  • Drive Changes API delta fetcher
  • GCS Pub/Sub notification bridge
  • Storage adapter worker with pub/sub backend
  • File path conventions for both systems
  • Writeback for Drive (create, update, delete)
  • Nango scheduled sync fallback for both

This represents 7 major deliverables, including a complete adapter worker implementation. For a 4-week sprint, this may be optimistic unless the team is large or has prior experience with similar systems. Consider whether this should be split or if the estimate should be extended.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/storage-bridge-priority.md` around lines 235 - 248, Sprint 1 is
overcommitted with seven major deliverables; split into a smaller MVP and
follow-up work by moving lower-priority or higher-effort items to a subsequent
sprint. Keep the core MVP in Sprint 1: Google Drive Watch API subscription
manager, Drive Changes API delta fetcher, and GCS Pub/Sub notification bridge;
defer the full Storage adapter worker implementation, Drive writeback
(create/update/delete), and Nango scheduled sync fallback to Sprint 2 (or split
the worker into a lightweight pub/sub-only adapter in Sprint 1 and full
writeback in Sprint 2). Update the plan to reflect the new split and adjust
timelines/estimates accordingly.
docs/storage-bridge-spec.md (4)

1247-1247: 🏗️ Heavy lift

Content size threshold decision impacts costs significantly.

Open question #1 raises a critical tradeoff: including content in pub/sub messages under 256 KB saves latency but increases message costs. For high-volume sources (e.g., S3 buckets with millions of small files), this could mean 10-100x higher pub/sub costs. Recommend establishing a cost model before deciding, and making this threshold configurable per source.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/storage-bridge-spec.md` at line 1247, The decision about whether the
pub/sub envelope includes object content under a size threshold (e.g., 256 KB)
affects cost and latency; update the spec to require (1) a documented cost model
to quantify message cost vs latency for high-volume sources and (2) a
configurable per-source "pubsub_content_threshold" parameter (defaulting to a
safe value like 0 or 256KB) that the adapter honors when deciding to embed
content in the envelope versus fetching it on delivery; reference the terms
"pub/sub envelope", "adapter", and "pubsub_content_threshold" in the spec so
implementers can apply the threshold per source and opt into the behavior.
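One way to make that threshold concrete. The `pubsubContentThreshold` knob mirrors the per-source parameter suggested in the prompt; the envelope fields and the 256 KiB default are assumptions for illustration.

```typescript
// Sketch of the embed-vs-fetch decision the open question raises.
interface SourceConfig {
  pubsubContentThreshold: number; // bytes; 0 disables embedding entirely
}

interface PublishedEnvelope {
  path: string;
  sizeBytes: number;
  content?: string; // present only when embedded
}

function buildEnvelope(
  path: string,
  content: string,
  cfg: SourceConfig = { pubsubContentThreshold: 256 * 1024 }
): PublishedEnvelope {
  const sizeBytes = new TextEncoder().encode(content).length;
  // Embed small payloads to save a fetch round-trip on delivery; large
  // ones are fetched later, keeping per-message pub/sub costs bounded.
  if (cfg.pubsubContentThreshold > 0 && sizeBytes <= cfg.pubsubContentThreshold) {
    return { path, sizeBytes, content };
  }
  return { path, sizeBytes };
}
```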

294-294: ⚡ Quick win

SQL trigger assumes 'id' column exists.

The comment -- assumes id column; configurable acknowledges this assumption, but the actual SQL hardcodes row_to_json(NEW).id. For a production-ready spec, consider showing the configurable version or noting this is a simplified example.

Suggested clarification
```diff
-    'pk',     row_to_json(NEW).id  -- assumes id column; configurable
+    'pk',     (row_to_json(NEW) ->> '${pk_column_name}')::text  -- configurable via trigger parameter
```

Or add a note that this is a simplified example and production triggers would be generated per-table with the actual PK column name.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/storage-bridge-spec.md` at line 294, The SQL trigger currently hardcodes
row_to_json(NEW).id for the 'pk' value, which assumes every table uses an id
column; update the spec to either show a configurable version that uses a
placeholder or trigger argument (e.g., NEW.<primary_key_column> or TG_ARGV[0])
and describe that production triggers should be generated per-table with the
actual PK column name, or explicitly state this is a simplified example;
reference the expression row_to_json(NEW).id and the 'pk' mapping so reviewers
know exactly where to replace or parameterize the PK access.

29-67: 💤 Low value

Add language specifiers to fenced code blocks.

Multiple fenced code blocks throughout the document lack language specifiers. While the ASCII diagrams (architecture, flow diagrams) should remain unlabeled, code blocks showing configuration, topic names, or structured examples should specify the format for better rendering and syntax highlighting.

As per static analysis hints, add language specifiers where appropriate (e.g., yaml, text, bash).

Examples of blocks that could benefit from language hints

Line 191-197 (S3 setup structure): Could use text or yaml
Line 415-421 (Redis config): Should be ini or conf
Line 888-891 (Topic naming): Could use text
Line 1047-1051 (Dead-letter topic): Could use text

Also applies to: 191-256, 415-502, 657-671, 888-909, 1047-1051

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/storage-bridge-spec.md` around lines 29 - 67, Several fenced code blocks
lack language specifiers; leave ASCII diagrams (e.g., the "Storage Systems" /
"Per-System Bridge Processes" diagram) unlabeled but add appropriate language
tags to configuration and example blocks (e.g., the S3 setup section around the
"S3 setup structure" block, Redis config blocks near "Redis config", topic
naming examples containing "relayfile.storage.events.{workspace_id}", and the
dead-letter topic example). Edit the fenced blocks that show configuration,
topic names, or CLI examples to include language hints like yaml, text, bash, or
ini (e.g., add ```yaml, ```text, ```bash, or ```ini after the opening ```),
ensuring blocks that are pure ASCII diagrams remain without a language tag; use
the strings "relayfile.storage.events.{workspace_id}", "POST
/v1/workspaces/{id}/webhooks/ingest", and the S3/Redis section headings to
locate the exact blocks to update.

717-738: 💤 Low value

Function signature shows inconsistent parameter naming.

The function mapNangoSyncRecord expects record.key ?? record.id (line 721) but the example webhook payload only shows key as a field. If different Nango sync models use different identifier fields, this should be documented or the fallback logic clarified.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/storage-bridge-spec.md` around lines 717 - 738, The mapNangoSyncRecord
function is using record.key ?? record.id for the eventId but the webhook
example only guarantees key, causing inconsistent parameter expectations; update
mapNangoSyncRecord to normalize the record identifier early (e.g., compute a
single recordId = record.key || record.id || record.record_id) and then use that
normalized recordId everywhere (eventId formation, computeResourceId call, and
any other places that reference id/key), and add a brief docstring above
mapNangoSyncRecord describing which fields are accepted as canonical ids;
reference the existing symbols mapNangoSyncRecord, eventId construction,
computeResourceId, computeRelayfilePath and mapNangoProviderToSource to locate
where to apply the normalization and doc update.
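The normalization step might look like the sketch below. Which identifier fields real Nango sync models carry is an assumption here, taken only from the field names mentioned in the review text (`key`, `id`, `record_id`).

```typescript
// Assumed record shape; real Nango sync records may differ per model.
interface NangoSyncRecordLike {
  key?: string;
  id?: string;
  record_id?: string;
}

// Accept whichever identifier field a given Nango model provides, in a
// fixed precedence order, and fail loudly when none is present.
function normalizeNangoRecordId(record: NangoSyncRecordLike): string {
  const recordId = record.key ?? record.id ?? record.record_id;
  if (recordId === undefined || recordId === "") {
    throw new Error("Nango record has no key, id, or record_id field");
  }
  return recordId;
}
```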
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/storage-bridge-priority.md`:
- Around line 242-243: The roadmap entry mentions "GCS Pub/Sub notification
bridge (from `storage-bridge-spec.md` Phase 1)" but that cross-reference is
incorrect; locate the phrase "GCS Pub/Sub notification bridge" in
docs/storage-bridge-priority.md and update the reference to point to the correct
phase in storage-bridge-spec.md (replace "Phase 1" with "Phase 2" or remove the
phase pointer), or adjust the roadmap item to match the actual intent (e.g.,
change to S3/Nango webhook if Phase 1 is intended); ensure the text and any
parenthetical reference to `storage-bridge-spec.md` consistently reflect the
same phase.
- Around line 224-229: The table currently lists "AWS S3" as Tier 3 with Nango
as 1–5 min fallback, which contradicts the spec that treats S3+SQS as the
real-time Tier 1 path; update the "AWS S3" row so that S3 with SQS Event
Notifications is shown as Tier 1 (primary real-time) and only show "Nango poll"
as the Tier 3 fallback option, and replace the existing footnote text with a
concise note that "S3 with SQS Event Notifications is Tier 1; polling via Nango
or other pollers is a Tier 3 fallback" so the table and note (references: "AWS
S3", "SQS events", "Nango poll") consistently reflect the spec.

In `@docs/storage-bridge-spec.md`:
- Around line 506-509: The doc currently claims "No Nango fallback for Redis —
Nango does not have a Redis integration" which conflicts with the architecture's
statement that Nango provides near-real-time fallbacks; update the
storage-bridge spec to be consistent by either (A) explicitly marking Redis as
"real-time-only (no Nango fallback)" in the latency tier table referenced around
the "latency tier table" / lines 77-83, and clarifying the statement "No Nango
fallback for Redis" in the Redis notes section (the paragraph containing "No
Nango fallback for Redis") or (B) if Redis can use Nango, remove or reword that
sentence and describe how Nango would be used; ensure both the Redis notes (the
bullets including "Keyspace notifications..." and "No Nango fallback for Redis")
and the latency tier table reference the same behavior for Redis so the doc is
consistent.
- Around line 340-343: The client error handler currently sleeps 5s and calls
runPostgresListenBridge(config) unconditionally, creating an infinite tight
retry loop; change the error handling in the client.on("error") callback to use
an exponential backoff strategy with a capped delay and a max-retry limit (e.g.,
double delay each retry up to a ceiling, track retries in a retry counter) and
log each attempt; when max retries are exceeded, stop restarting and surface the
error (or emit a terminal failure event) so persistent failures (like bad
credentials) don’t loop forever. Ensure you update references in the same scope
where client, sleep, and runPostgresListenBridge(config) are used so the retry
counter and backoff delay persist across restarts or are stored externally.
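The backoff this comment asks for can be sketched as follows. This is a minimal illustration, not the spec's implementation: `runPostgresListenBridge` and `config` come from the spec, while `BASE_DELAY_MS`, `MAX_DELAY_MS`, and `MAX_RETRIES` are assumed example values.

```typescript
// Hypothetical backoff helper for the LISTEN bridge reconnect loop.
interface BackoffState {
  attempt: number; // consecutive failed reconnects so far
}

const BASE_DELAY_MS = 1_000; // first retry after 1s (assumption)
const MAX_DELAY_MS = 60_000; // never wait longer than 60s (assumption)
const MAX_RETRIES = 10;      // after this, surface a terminal failure

// Returns the delay before the next retry, or null once retries are
// exhausted — the caller should then stop restarting and emit the error.
function nextBackoffDelay(state: BackoffState): number | null {
  if (state.attempt >= MAX_RETRIES) return null;
  const delay = Math.min(BASE_DELAY_MS * 2 ** state.attempt, MAX_DELAY_MS);
  state.attempt += 1;
  return delay;
}
```

The `client.on("error")` handler would hold one `BackoffState` per connection and reset `attempt` to zero after a successful reconnect, so only consecutive failures escalate the delay.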
- Around line 982-990: The INSERT/ON CONFLICT SQL is vulnerable because schema
and table are interpolated directly; validate or safely quote those identifiers
before building the query (do not rely on parameter placeholders, which cannot
be used for identifiers). Update the call that constructs the SQL (the block
that references schema, table, pkColumn, tableConfig, and config.client.query)
to either (a) validate schema and table against a strict whitelist/regex (e.g.
allowed identifier pattern) OR (b) use a safe identifier-quoting helper such as
pg-format's format.ident (or an equivalent function) to escape
schema/table/pkColumn, and continue to pass the row payload as a parameterized
value. Ensure pkColumn is also validated/quoted and only non-identifier content
is left interpolated after this sanitization.
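A sketch of the identifier sanitization being requested. `quoteIdent` here is a hand-rolled stand-in for pg-format's `format.ident`, and the strict lowercase whitelist is an assumption that can be loosened; the `jsonb_populate_record` upsert shape is illustrative, not the spec's exact statement.

```typescript
// Allowed identifier pattern (assumption): lowercase snake_case only.
const IDENT_RE = /^[a-z_][a-z0-9_]*$/;

// Reject anything outside the whitelist, then double-quote the identifier.
function quoteIdent(name: string): string {
  if (!IDENT_RE.test(name)) {
    throw new Error(`unsafe SQL identifier: ${name}`);
  }
  return `"${name}"`;
}

// schema, table, and pkColumn are validated and quoted; the row payload
// stays a parameterized value ($1), never string-interpolated.
function upsertSql(schema: string, table: string, pkColumn: string): string {
  const target = `${quoteIdent(schema)}.${quoteIdent(table)}`;
  return (
    `INSERT INTO ${target} ` +
    `SELECT * FROM jsonb_populate_record(NULL::${target}, $1) ` +
    `ON CONFLICT (${quoteIdent(pkColumn)}) DO NOTHING`
  );
}
```

Parameter placeholders cannot carry identifiers in Postgres, which is why the quoting has to happen at query-build time while the row itself remains a bound parameter.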
- Around line 809-810: The Postgres and Redis bridge implementations don't add
the actual content to metadata, so code reading
event.metadata["postgres.row_json"] and event.metadata["redis.value"] gets
"null"; update the Postgres bridge where fetchRow() is used (the function that
retrieves the row) to add metadata["postgres.row_json"] = JSON.stringify(row ??
null) (or an appropriate stringified representation), and update the Redis
bridge where the value is retrieved to add metadata["redis.value"] = value ==
null ? "null" : value.toString() (or JSON.stringify for structured values);
ensure the metadata keys exactly match "postgres.row_json" and "redis.value" so
the adapter's Buffer.from(...) calls receive the actual content.
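The fix for both bridges can be shown in a few lines. Only the key names `postgres.row_json` and `redis.value` are normative; the helper names and the `Metadata` shape are placeholders for the bridges' internals.

```typescript
type Metadata = Record<string, string>;

// Postgres bridge: serialize the row fetched via fetchRow() so the
// adapter's Buffer.from(metadata["postgres.row_json"]) gets real content.
function attachPostgresRow(metadata: Metadata, row: unknown): Metadata {
  metadata["postgres.row_json"] = JSON.stringify(row ?? null);
  return metadata;
}

// Redis bridge: stringify the retrieved value, with a literal "null"
// when the key does not exist.
function attachRedisValue(
  metadata: Metadata,
  value: { toString(): string } | null,
): Metadata {
  metadata["redis.value"] = value == null ? "null" : value.toString();
  return metadata;
}
```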

---

Outside diff comments:
In `@docs/storage-bridge-spec.md`:
- Around line 1-1260: The PR description claims changes to sync durability
(health status API, heartbeat intervals, flush pending, WebSocket sync) but the
patch only contains the storage bridge spec (e.g., types and functions like
StorageBridgeEvent, mapS3Event, runS3Bridge, runPostgresListenBridge,
runRedisBridge, fetchContent, ingestEvent), so fix the mismatch by either: (A)
replacing the included files with the actual implementation changes for sync
durability (ensure commits include the health API, heartbeat code, flush pending
logic, WebSocket sync changes and update PR title/body to match), or (B)
updating the PR description, title, and changelog to accurately describe that
this PR adds the storage bridge spec document and list the new spec artifacts
(StorageBridgeEvent, per-system mappings, adapter worker behavior, Nango
integration, etc.); also verify the commit list contains only intended files and
update the reviewer checklist accordingly.

---

Nitpick comments:
In `@docs/storage-bridge-priority.md`:
- Around line 30-47: The flow diagram code fences lack language specifiers;
update each fenced block (the shown diagram and the other similar blocks at the
other locations) to include an appropriate language hint such as text or mermaid
so the renderer can apply correct formatting (e.g., replace ``` with ```text or
```mermaid for the diagrams in the docs/local flow blocks); ensure you update
all instances referenced in the review (the block in the comment and the blocks
around the other ranges) so every diagram fence has a language specifier.
- Around line 290-297: The "Total addressable users" row contains bold numeric
claims ("~3B (Google Workspace)", "+1.5B (Microsoft 365)", "+700M (Dropbox +
Gmail personal)") that need validation or qualification—either add citations for
each estimate or change the wording to clearly mark them as
illustrative/approximate (e.g., "approx." or "illustrative estimate") in the
table; update the table cells for "Total addressable users" accordingly and, if
adding citations, reference authoritative sources immediately after each figure
(or add a footnote) so the values in the "Total addressable users" row are
verifiable.
- Around line 235-248: Sprint 1 is overcommitted with seven major deliverables;
split into a smaller MVP and follow-up work by moving lower-priority or
higher-effort items to a subsequent sprint. Keep the core MVP in Sprint 1:
Google Drive Watch API subscription manager, Drive Changes API delta fetcher,
and GCS Pub/Sub notification bridge; defer the full Storage adapter worker
implementation, Drive writeback (create/update/delete), and Nango scheduled sync
fallback to Sprint 2 (or split the worker into a lightweight pub/sub-only
adapter in Sprint 1 and full writeback in Sprint 2). Update the plan to reflect
the new split and adjust timelines/estimates accordingly.

In `@docs/storage-bridge-spec.md`:
- Line 1247: The decision about whether the pub/sub envelope includes object
content under a size threshold (e.g., 256 KB) affects cost and latency; update
the spec to require (1) a documented cost model to quantify message cost vs
latency for high-volume sources and (2) a configurable per-source
"pubsub_content_threshold" parameter (defaulting to a safe value like 0 or
256KB) that the adapter honors when deciding to embed content in the envelope
versus fetching it on delivery; reference the terms "pub/sub envelope",
"adapter", and "pubsub_content_threshold" in the spec so implementers can apply
the threshold per source and opt into the behavior.
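The per-source decision the comment describes might look like this. The `pubsub_content_threshold` name is taken from the comment itself; the `SourceConfig` shape and the 0-disables-embedding convention are assumptions.

```typescript
interface SourceConfig {
  pubsub_content_threshold: number; // bytes; 0 disables inline content
}

// Decide whether the pub/sub envelope embeds object content inline or
// carries only a reference for the adapter to fetch on delivery.
function shouldEmbedContent(source: SourceConfig, objectSizeBytes: number): boolean {
  return (
    source.pubsub_content_threshold > 0 &&
    objectSizeBytes <= source.pubsub_content_threshold
  );
}
```

Defaulting the threshold to 0 keeps the cost-safe behavior (always fetch on delivery) until a source explicitly opts into embedding.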
- Line 294: The SQL trigger currently hardcodes row_to_json(NEW).id for the 'pk'
value, which assumes every table uses an id column; update the spec to either
show a configurable version that uses a placeholder or trigger argument (e.g.,
NEW.<primary_key_column> or TG_ARGV[0]) and describe that production triggers
should be generated per-table with the actual PK column name, or explicitly
state this is a simplified example; reference the expression row_to_json(NEW).id
and the 'pk' mapping so reviewers know exactly where to replace or parameterize
the PK access.
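One way to show the parameterized variant is a per-table trigger generator that passes the PK column as a trigger argument. The `relayfile_notify` function name is an assumption; inside the plpgsql function body the dynamic PK would then be read as `to_jsonb(NEW) ->> TG_ARGV[0]` rather than the hardcoded `row_to_json(NEW).id`.

```typescript
// Hypothetical generator: identifiers are assumed to be validated
// upstream before interpolation into the DDL statement.
function buildNotifyTriggerSql(table: string, pkColumn: string): string {
  return (
    `CREATE TRIGGER relayfile_notify_trigger\n` +
    `AFTER INSERT OR UPDATE OR DELETE ON ${table}\n` +
    `FOR EACH ROW EXECUTE FUNCTION relayfile_notify('${pkColumn}');`
  );
}
```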
- Around line 29-67: Several fenced code blocks lack language specifiers; leave
ASCII diagrams (e.g., the "Storage Systems" / "Per-System Bridge Processes"
diagram) unlabeled but add appropriate language tags to configuration and
example blocks (e.g., the S3 setup section around the "S3 setup structure"
block, Redis config blocks near "Redis config", topic naming examples containing
"relayfile.storage.events.{workspace_id}", and the dead-letter topic example).
Edit the fenced blocks that show configuration, topic names, or CLI examples to
include language hints like yaml, text, bash, or ini (e.g., add ```yaml,
```text, ```bash, or ```ini after the opening ```), ensuring blocks that are
pure ASCII diagrams remain without a language tag; use the strings
"relayfile.storage.events.{workspace_id}", "POST
/v1/workspaces/{id}/webhooks/ingest", and the S3/Redis section headings to
locate the exact blocks to update.
- Around line 717-738: The mapNangoSyncRecord function uses record.key ??
record.id for the eventId, but the webhook example only guarantees key, causing
inconsistent expectations about the record shape; update mapNangoSyncRecord to normalize the
record identifier early (e.g., compute a single recordId = record.key ||
record.id || record.record_id) and then use that normalized recordId everywhere
(eventId formation, computeResourceId call, and any other places that reference
id/key), and add a brief docstring above mapNangoSyncRecord describing which
fields are accepted as canonical ids; reference the existing symbols
mapNangoSyncRecord, eventId construction, computeResourceId,
computeRelayfilePath and mapNangoProviderToSource to locate where to apply the
normalization and doc update.
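The normalization step might be sketched like this. `key` and `id` come from the spec's webhook example; the `record_id` alias is an assumption about payload variants, and the throw-on-missing behavior is one possible policy.

```typescript
interface NangoRecord {
  key?: string;
  id?: string;
  record_id?: string;
}

/**
 * Canonical record identifier for mapNangoSyncRecord: `key` is preferred
 * (the webhook example guarantees it), then `id`, then `record_id`.
 * Throws rather than letting an undefined id leak into eventId
 * construction or computeResourceId.
 */
function normalizeRecordId(record: NangoRecord): string {
  const recordId = record.key || record.id || record.record_id;
  if (!recordId) {
    throw new Error("Nango record has no key, id, or record_id");
  }
  return recordId;
}
```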

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: af46163f-5b51-4e3f-a6c8-15adefdf9ad1

📥 Commits

Reviewing files that changed from the base of the PR and between b2e0115 and 37362d1.

📒 Files selected for processing (2)
  • docs/storage-bridge-priority.md
  • docs/storage-bridge-spec.md

…onvex, HF and Nango catalog

Extends the storage bridge priority document with findings from research
into six additional systems and Nango's integration catalog:

- Supabase Storage: Tier 1 self-hosted (full webhook system in codebase),
  Tier 2 cloud (soft beta); no Nango fallback available
- MongoDB Atlas: Tier 1 via Atlas Triggers → Function → HTTP POST;
  near-real-time change stream based; at-least-once delivery
- Cloudflare R2: Tier 1 with a CF Worker bridge (20 lines); Tier 2 with
  HTTP pull polling; not a direct HTTP push to arbitrary endpoints
- Telegram: Tier 1 technically (< 1s Bot webhook delivery); niche use
  case for teams sharing files via Telegram as a workflow channel
- Convex: Tier 3; no native outbound webhooks; user must wire scheduler
  + Action + fetch() themselves; file storage has no events at all
- Hugging Face Buckets: Tier 3; manual sync only; no notifications

Adds Nango integration catalog section showing which systems get
scheduled sync fallback for free (13 integrations including Drive,
OneDrive, SharePoint, Dropbox, Box, Notion, Confluence, Google Docs,
Sheets, Smartsheet, Nextcloud, Airtable, Databricks).

Adds updated full priority table covering all systems with push support
status, Nango fallback availability, priority, and sprint assignment.

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24

@coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/storage-bridge-priority.md`:
- Line 30: The Markdown file has four fenced diagram blocks (the ones containing
"User edits file in Google Drive … Agent sees change via WebSocket", "User
uploads file to SharePoint document library … StorageBridgeEvent(s) published
for each changed item", "File added/modified/deleted in Dropbox …
StorageBridgeEvent published for each changed entry", and "Email received or
label changed … StorageBridgeEvent published per changed thread") missing
language identifiers which triggers markdownlint MD040; update each opening
triple-backtick fence to include a language (use "text") so they read ```text to
satisfy the linter and CI.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7f880a8e-6066-46c6-ab5a-5b20328a76ce

📥 Commits

Reviewing files that changed from the base of the PR and between 37362d1 and a9f4823.

📒 Files selected for processing (1)
  • docs/storage-bridge-priority.md

claude added 2 commits May 9, 2026 08:10
Prioritizes all integrations by where relayfile's value-add is highest:
systems where humans and agents write the same records concurrently,
where relations between records matter, and where writeback is
consequential enough to require conflict detection and ACLs.

Four tiers:
- Tier 1 (Collaboration): Linear, Jira, GitHub, Notion, Asana, ClickUp
  already built; gaps are Confluence, Google Docs, SharePoint, Monday,
  Coda, Figma, Miro
- Tier 2 (Action): Salesforce, HubSpot, Zendesk, Intercom, Pipedrive
  already built; gaps are Freshdesk, ServiceNow, PagerDuty
- Tier 3 (Awareness): Slack, Teams already built; gaps are Discord,
  Gmail, Outlook, Telegram
- Tier 4 (Data): read-heavy tools where MCP competes adequately

Ranks the 14 highest-priority missing integrations in order, and
explains what not to build yet (pure storage, analytics-only, and
transactional tools) and why.

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24
Defines four standard tiers (Free, Starter $79, Growth $499, Enterprise)
and three platform plans co-marketed with Nango ($299), Composio ($199),
and Pipedream ($149).

Platform plans are priced to complement what the customer already pays:
Nango plan assumes the customer's Nango subscription handles provider
OAuth and sync; Composio plan pairs with tool call routing; Pipedream
plan adds shared state to event-driven workflows.

Growth bumped to $499 to cover Nango infrastructure costs at margin.
Starter bumped to $79 to position above Composio's $29 Starter while
reflecting the coordination value relayfile adds over tool call routing.

Events definition: webhook ingestions and file writes only — fan-out to
agent WebSocket connections does not count, to avoid penalizing
multi-agent usage.

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24
@khaliqgant khaliqgant changed the title improve sync durability: health status API, faster heartbeat, flush pending improve sync durability and add storage bridge specs May 9, 2026

@coderabbitai Bot left a comment


🧹 Nitpick comments (2)
docs/pricing.md (2)

68-68: ⚡ Quick win

Clarify Nango pricing references to avoid reader confusion.

This line mentions both "~$200/month for the integration infrastructure" and "Below Nango's own Growth plan ($500)" which conflates Relayfile's infrastructure costs with Nango's customer-facing price. Consider rephrasing to clearly separate these concepts.

📝 Suggested clarification
-**Why $499:** Covers Nango Growth costs (~$200/month for the integration infrastructure) at reasonable margin. Below Nango's own Growth plan ($500) only because relayfile is the coordination layer on top, not a replacement for Nango. Matches what a Composio Professional ($229) + Nango Starter ($50) customer would pay combined — and relayfile gives them both in one.
+**Why $499:** Our infrastructure costs for Nango integration (~$200/month) are covered at reasonable margin. Priced just below Nango's own Growth plan ($500) because Relayfile is the coordination layer on top, not a replacement for Nango. Matches what a Composio Professional ($229) + Nango Starter ($50) customer would pay combined — and Relayfile gives them both in one.
🤖 Prompt for AI Agents

In `@docs/pricing.md` at line 68, The sentence conflates Relayfile's internal
infrastructure cost with Nango's customer-facing price; rewrite the line so it
clearly separates the two ideas: state that Relayfile's integration
infrastructure costs are approximately $200/month (an internal cost component)
and that Nango's published Growth plan is $500 as a customer-facing price,
clarifying that Relayfile's $499 offering sits below Nango's $500 plan while
covering our infrastructure and margin—update the sentence text in the pricing
copy where the current phrase starting "Why $499:" appears to reflect this
separation.

47-47: 💤 Low value

Consider adding version/date context for external pricing references.

The document references specific competitor prices (Composio Starter at $29, Nango Growth at $500, etc.) without indicating when these prices were current. As competitor pricing evolves, these references will become stale and potentially misleading.

Consider either:

  • Adding a "Pricing competitive analysis as of [date]" note near the top of the document
  • Adding an HTML comment noting when competitor prices were last verified
  • Establishing a quarterly review process for this documentation

Also applies to: 68-68, 137-137, 161-161, 200-200

🤖 Prompt for AI Agents

In `@docs/pricing.md` at line 47, Add explicit date/version context for external
competitor prices by inserting a "Pricing competitive analysis as of
[YYYY-MM-DD]" note near the top of docs/pricing.md and add HTML comments
immediately after each external price mention in the "Why $79" section and the
other occurrences (the lines referencing Composio Starter $29, Nango Growth
$500, etc.) that state the date the price was last verified; also add a short
"Review cadence: quarterly" sentence in the doc footer so reviewers know to
re-verify prices on a quarterly schedule.
🤖 Prompt for all review comments with AI agents

Nitpick comments:
In `@docs/pricing.md`:
- Line 68: The sentence conflates Relayfile's internal infrastructure cost with
Nango's customer-facing price; rewrite the line so it clearly separates the two
ideas: state that Relayfile's integration infrastructure costs are approximately
$200/month (an internal cost component) and that Nango's published Growth plan
is $500 as a customer-facing price, clarifying that Relayfile's $499 offering
sits below Nango's $500 plan while covering our infrastructure and margin—update
the sentence text in the pricing copy where the current phrase starting "Why
$499:" appears to reflect this separation.
- Line 47: Add explicit date/version context for external competitor prices by
inserting a "Pricing competitive analysis as of [YYYY-MM-DD]" note near the top
of docs/pricing.md and add HTML comments immediately after each external price
mention in the "Why $79" section and the other occurrences (the lines
referencing Composio Starter $29, Nango Growth $500, etc.) that state the date
the price was last verified; also add a short "Review cadence: quarterly"
sentence in the doc footer so reviewers know to re-verify prices on a quarterly
schedule.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 34656beb-efd8-47d5-9038-8d4d3f1c775b

📥 Commits

Reviewing files that changed from the base of the PR and between beb513c and e5da171.

📒 Files selected for processing (1)
  • docs/pricing.md

claude added 2 commits May 9, 2026 08:58
Rewrites pricing doc with full decision rationale inline for every
number rather than brief "why" notes:

- Billing unit rationale: why events over seats/agents, why fan-out is
  always free (penalizing multi-agent usage contradicts value prop)
- Starter $79: covers Nango infra costs at ~40% margin, positioned above
  Composio's $29 tool-call routing with clear coordination premium
- Growth $499: Nango cost problem forced up from $299 — ~$150-200/month
  Nango infrastructure at Growth makes $299 margin-negative; $499 yields
  ~60% margin and accidentally matches Nango Growth ($500) as anchor
- Enterprise: build-vs-buy framing ($24k-48k to DIY, ongoing maintenance)
- Platform plans rationale: combined spend analysis for each — Nango
  ($349-799 total), Composio (~$428 total), Pipedream (~$200 total)
- Gross margin model table: shows infra cost split per plan
- Platform plan margins are higher because customer's existing platform
  subscription covers the expensive provider API layer

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24
Completes the trail trajectory for pricing strategy including decisions
on event-based billing, tier price points, Nango cost problem resolution,
and platform-specific plans.

https://claude.ai/code/session_01Lbmw1Cj23tw8LovrfFpv24