
feat: opt-in anonymous telemetry client#171

Merged
TerrifiedBug merged 16 commits into main from feat/telemetry-client
Apr 25, 2026

@TerrifiedBug
Owner

Summary

Adds an opt-in anonymous telemetry client. When enabled, each VectorFlow instance sends one daily heartbeat to the centralised pulse.terrifiedbug.com receiver containing aggregate, non-PII fields (instance ID, version, agent count, pipeline count, auth method, deployment mode). It is off by default and never sends without explicit consent.

Implements design at docs/superpowers/specs/2026-04-25-vf-telemetry-client-design.md (gitignored, in main worktree).

What's included

  • 3 new fields on SystemSettings: telemetryEnabled, telemetryInstanceId (ULID), telemetryEnabledAt
  • buildHeartbeatPayload pure function and sendTelemetryHeartbeat orchestrator (src/server/services/telemetry-payload.ts + telemetry-sender.ts)
  • Daily node-cron scheduler hooked into instrumentation.ts singleton startup (leader-elected, multi-replica safe)
  • tRPC telemetry router with get and update procedures, both requireSuperAdmin-gated; update is audit-logged
  • Setup wizard Step 3 with required-choice prompt (no default, can't be skipped)
  • Settings → Telemetry single-toggle page
  • Public docs at docs/public/operations/telemetry.md
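
As a rough illustration of the pure payload builder listed above, the following is a minimal sketch rather than the code in this PR. Only `schema_version` and `instance_id` appear in the PR text (in the sequence diagram); every other field name, the input type, the `authMethod` values, and the function name `buildHeartbeatPayloadSketch` are assumptions made for the example.

```typescript
// Hypothetical input shape: the caller gathers these values from settings and DB counts.
interface HeartbeatInput {
  instanceId: string;
  version: string;
  agentCount: number;
  pipelineCount: number;
  authMethod: "local" | "oidc"; // assumed set of values
  deploymentMode: string; // assumed free-form label
}

// Wire format. Only schema_version and instance_id are confirmed by the PR text;
// the remaining keys are assumed snake_case equivalents of the described fields.
interface HeartbeatPayloadSketch {
  schema_version: 1;
  instance_id: string;
  version: string;
  agent_count: number;
  pipeline_count: number;
  auth_method: string;
  deployment_mode: string;
}

// Pure function: no I/O, no clock, no randomness, so it is trivial to unit test.
function buildHeartbeatPayloadSketch(input: HeartbeatInput): HeartbeatPayloadSketch {
  return {
    schema_version: 1,
    instance_id: input.instanceId,
    version: input.version,
    agent_count: input.agentCount,
    pipeline_count: input.pipelineCount,
    auth_method: input.authMethod,
    deployment_mode: input.deploymentMode,
  };
}
```

Keeping the builder free of side effects means the sender can own all database and network access, while the payload shape is tested in isolation.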

What's NOT collected

Hostnames, IP addresses, pipeline names/configs/VRL, user identifiers, source/sink endpoints, or any data flowing through pipelines. Receiver derives country server-side from request IP and never stores the IP itself.

Behaviour notes

  • First-time enable (wizard "Yes" or Settings toggle off→on with no existing instanceId): generates ULID, sets enabledAt, fires immediate fire-and-forget heartbeat so the instance shows up on Pulse without waiting up to 24 hours.
  • Re-enable (toggle off→on with existing instanceId): preserves both instanceId and enabledAt so Pulse sees the same anonymous instance.
  • Disable: just sets enabled=false. Doesn't touch instanceId or enabledAt.
  • Failures (network, non-2xx, timeout): log + Sentry capture, no retry queue, next day's tick tries again.
  • 503 + Retry-After: honored once for the next call, then forgotten on process restart.
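
The enable, re-enable, and disable rules above can be sketched as a single pure state transition. This is an illustrative sketch, not the router code from this PR: `applyTelemetryToggle`, the `newId` generator parameter, and the result shape are hypothetical. The PR states that the first enable fires an immediate heartbeat; it does not say whether a re-enable does, so the sketch assumes it does not.

```typescript
interface TelemetryState {
  telemetryEnabled: boolean;
  telemetryInstanceId: string | null;
  telemetryEnabledAt: Date | null;
}

interface ToggleResult {
  next: TelemetryState;
  sendImmediateHeartbeat: boolean;
}

// newId is an injected ID generator (a ULID library in practice); it is only
// called on the first-enable path.
function applyTelemetryToggle(
  current: TelemetryState,
  enabled: boolean,
  newId: () => string,
  now: Date,
): ToggleResult {
  if (!enabled) {
    // Disable: flip the flag only, leave instanceId and enabledAt untouched.
    return { next: { ...current, telemetryEnabled: false }, sendImmediateHeartbeat: false };
  }
  const isFirstEnable = current.telemetryInstanceId === null;
  if (isFirstEnable) {
    // First-time enable: mint an ID, stamp enabledAt, request an immediate heartbeat.
    return {
      next: { telemetryEnabled: true, telemetryInstanceId: newId(), telemetryEnabledAt: now },
      sendImmediateHeartbeat: true,
    };
  }
  // Re-enable: keep the existing ID and timestamp so the receiver sees the same instance.
  // Assumption: no immediate heartbeat here; the next daily tick reports it.
  return { next: { ...current, telemetryEnabled: true }, sendImmediateHeartbeat: false };
}
```

Modelling the toggle as a function from old state to new state makes the "off→on→off→on preserves the ULID" check in the test plan a plain unit test.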

Schema migration

Adds three nullable/defaulted columns to SystemSettings. No destructive operations. Pre-existing schema drift (AlertMetric enum on WebhookDelivery/WebhookEndpoint, FK changes on AuditLog/Environment) was deliberately excluded from this migration — it predates this branch and should be addressed separately.

Test plan

  • CI green (pnpm test:run, pnpm build, pnpm lint)
  • Migration applies cleanly to a fresh dev DB: npx prisma migrate reset --force then npx prisma migrate deploy
  • Fresh-install setup wizard: navigate to /setup, complete Step 1 + 2, verify Step 3 has both buttons, "Complete setup" disabled until a button is clicked
  • Wizard "Yes" path: SystemSettings row has telemetryEnabled=true, 26-char telemetryInstanceId, telemetryEnabledAt set; immediate heartbeat arrives at Pulse within seconds
  • Wizard "No" path: SystemSettings row has telemetryEnabled=false, both other fields null; no heartbeat fires
  • Settings → Telemetry: toggle reflects current state; the first off→on (no existing instanceId) generates a new ULID and fires a heartbeat; a later off→on→off→on cycle preserves the ULID
  • Daily cron registered: confirmed via instrumentation logs on server start (node-cron schedule registered, fires 42 3 * * *)
  • /admin/audit shows a telemetry.update entry on toggle changes
  • Public docs page renders correctly via GitBook sync (verify /docs/operations/telemetry resolves)
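
For the "26-char telemetryInstanceId" check in the wizard "Yes" path, a format assertion can be stricter than a length check. The helper below is a hypothetical sketch (`looksLikeUlid` is not part of this PR). It relies on the canonical ULID text form: 26 characters of Crockford Base32 (no I, L, O, or U), with a first character no greater than 7 because the timestamp is 48 bits.

```typescript
// Canonical upper-case ULID: first char 0-7, then 25 Crockford Base32 chars.
const ULID_PATTERN = /^[0-7][0-9A-HJKMNP-TV-Z]{25}$/;

// Returns true when the value has the canonical ULID text shape.
// Lower-case input is rejected here on purpose; relax the pattern if the
// generator in use emits lower-case IDs.
function looksLikeUlid(value: string): boolean {
  return ULID_PATTERN.test(value);
}
```

A manual verification step could then compare the stored `telemetryInstanceId` against this pattern instead of only counting characters.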

Out of scope

  • Playwright extension covering Step 3 (deferred — existing setup E2E will need updating in a follow-up)
  • Retroactive opt-in nudge for existing instances upgrading
  • "Test connection" button in Settings
  • Telemetry payload V2 with feature usage / install-source tracking

Pre-existing CI failures (not caused by this branch)

  • dlp-vrl-integration.test.ts — 4 tests fail because the locally installed Vector binary doesn't define repeat() in VRL
  • agent-token.test.ts — 1 test times out at 5000ms in a bcrypt-heavy negative path

These predate this branch and should be tracked separately. They will need to be resolved before this PR can merge.

github-actions bot added the documentation, dependencies, and feature labels on Apr 25, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR adds an opt-in anonymous telemetry client to VectorFlow. It introduces three new SystemSettings fields (ULID instance ID, enabled flag, timestamp), a pure payload builder, an outbound sender with timeout and 503-backoff handling, a daily node-cron scheduler, a requireSuperAdmin-gated tRPC router with audit logging, a setup wizard Step 3, and a settings toggle page. The implementation follows all established project patterns and the payload is correctly limited to aggregate, non-PII counts.

Confidence Score: 4/5

Safe to merge — no bugs or security issues found; implementation is correct and well-tested.

No P0 or P1 findings. The feature is complete, follows all tRPC/audit/RBAC patterns, payloads are non-PII, and every code path has unit tests. Score is 4 rather than 5 because several pre-existing CI failures are called out in the PR description as blockers that must be resolved before merge.

No files require special attention — all changed files are clean.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/server/services/telemetry-sender.ts | Core outbound sender: reads settings, gathers aggregate DB counts, POSTs to hardcoded Pulse URL with 10 s AbortController timeout; handles 503 Retry-After suppression and Sentry capture correctly. |
| src/server/routers/telemetry.ts | tRPC router with get (query) and update (mutation); both gated with requireSuperAdmin(); update correctly uses withAudit and generates ULID only on first-enable path. |
| src/server/services/setup.ts | Adds telemetryChoice parameter to completeSetup and wires buildTelemetryFields into the SystemSettings upsert inside the existing transaction. |
| src/app/api/setup/route.ts | Validates telemetryChoice presence and allowed values before calling completeSetup; fires an immediate fire-and-forget heartbeat on 'yes' after the transaction commits. |
| src/app/(auth)/setup/page.tsx | Adds Step 3 telemetry choice UI; submit button correctly disabled until a choice is made; handleSubmit made optional-event to support direct button invocation. |
| src/server/services/telemetry-scheduler.ts | Idempotent daily cron at 03:42 UTC using node-cron; errors caught so cron task survives failures; test-only teardown helper exported. |
| src/server/services/telemetry-payload.ts | Pure builder function for the V1 HeartbeatPayload; well-typed with no side effects. |
| src/app/(dashboard)/settings/telemetry/page.tsx | Toggle page using standard useQuery/useMutation pattern with cache invalidation on success and error toast on failure. |
| prisma/schema.prisma | Adds telemetryEnabled (Boolean, default false), telemetryInstanceId (String?), and telemetryEnabledAt (DateTime?) to SystemSettings; non-destructive, nullable columns. |
| src/instrumentation.ts | Registers the telemetry cron scheduler in the existing singleton startup block; wrapped in try/catch consistent with other scheduler registrations. |
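
To make the sender's timeout and 503 handling concrete, here is a minimal sketch under stated assumptions; it is not the contents of telemetry-sender.ts. The names `createHeartbeatSender`, `retryAfterDeadline`, `FetchLike`, and `SendOutcome` are hypothetical, the fetch function and clock are injected for testability, and logging and Sentry capture are omitted. The sketch takes one plausible reading of "honored once": after a 503, sends are skipped until the Retry-After deadline passes, and the deadline lives only in memory.

```typescript
interface MinimalResponse {
  status: number;
  ok: boolean;
  headers: { get(name: string): string | null };
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<MinimalResponse>;
type SendOutcome = "sent" | "suppressed" | "failed";

// Retry-After is either delta-seconds or an HTTP-date; returns an epoch-ms deadline.
function retryAfterDeadline(header: string | null, nowMs: number): number | null {
  if (header === null) return null;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return nowMs + Number(trimmed) * 1000;
  const parsed = Date.parse(trimmed);
  return Number.isNaN(parsed) ? null : parsed;
}

function createHeartbeatSender(fetchImpl: FetchLike, nowMs: () => number, timeoutMs = 10000) {
  // In-memory only, so the suppression is forgotten on process restart.
  let suppressedUntilMs: number | null = null;

  return async function send(url: string, payload: unknown): Promise<SendOutcome> {
    if (suppressedUntilMs !== null) {
      if (nowMs() < suppressedUntilMs) return "suppressed";
      suppressedUntilMs = null;
    }
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetchImpl(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(payload),
        signal: controller.signal,
      });
      if (res.status === 503) {
        suppressedUntilMs = retryAfterDeadline(res.headers.get("retry-after"), nowMs());
        return "failed";
      }
      return res.ok ? "sent" : "failed";
    } catch {
      // Network error or timeout abort: no retry queue, the next tick tries again.
      return "failed";
    } finally {
      clearTimeout(timer);
    }
  };
}
```

On Node 18 or later the global `fetch` can be passed as `fetchImpl`, with `Date.now` as the clock. Injecting both keeps the timeout and suppression logic testable without network access.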

Sequence Diagram

sequenceDiagram
    participant U as User (Browser)
    participant SW as Setup Wizard / Settings UI
    participant API as /api/setup POST
    participant TRPC as tRPC telemetry.update
    participant DB as PostgreSQL (SystemSettings)
    participant Cron as node-cron (03:42 UTC)
    participant Pulse as pulse.terrifiedbug.com

    Note over U,Pulse: First-time enable (Setup Wizard "Yes")
    U->>SW: Click "Yes, share anonymous stats" + Complete Setup
    SW->>API: POST /api/setup {telemetryChoice: "yes", ...}
    API->>DB: completeSetup() upsert — telemetryEnabled=true, ULID, enabledAt
    API-->>SW: {success: true}
    API--)Pulse: sendTelemetryHeartbeat() fire-and-forget

    Note over U,Pulse: Toggle in Settings → Telemetry
    U->>SW: Toggle on
    SW->>TRPC: telemetry.update {enabled: true}
    TRPC->>DB: findUnique SystemSettings
    DB-->>TRPC: {telemetryEnabled: false, instanceId: null, ...}
    Note right of TRPC: isFirstEnable=true → generate ULID
    TRPC->>DB: update {telemetryEnabled: true, instanceId: ULID, enabledAt: now}
    TRPC-->>SW: {ok: true}
    TRPC--)Pulse: sendTelemetryHeartbeat() fire-and-forget

    Note over U,Pulse: Daily cron tick
    Cron->>DB: findUnique SystemSettings
    DB-->>Cron: {telemetryEnabled: true, instanceId: ..., oidcIssuer: ...}
    Cron->>DB: pipeline.count (draft/active/paused) + vectorNode.count
    DB-->>Cron: counts
    Cron->>Pulse: POST /api/v1/ping {schema_version:1, instance_id, ...}
    Pulse-->>Cron: 204 No Content
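
The daily tick in the diagram depends on a scheduler that registers once and survives task failures. The sketch below illustrates that idea only; it is not telemetry-scheduler.ts. `createTelemetryScheduler` and `ScheduleFn` are hypothetical names, and `schedule` is an injected function modelled loosely on node-cron's `schedule(expression, callback)`, so the example runs without the library. Timezone wiring is omitted.

```typescript
// Injected scheduling function; returns a handle that can stop the task.
type ScheduleFn = (expression: string, task: () => void) => { stop(): void };

// minute 42, hour 3, every day (03:42, as stated in the PR).
const DAILY_AT_0342 = "42 3 * * *";

function createTelemetryScheduler(
  schedule: ScheduleFn,
  tick: () => Promise<void>,
  onError: (error: unknown) => void,
) {
  let handle: { stop(): void } | null = null;
  return {
    // Idempotent: a second call is a no-op and returns false.
    start(): boolean {
      if (handle !== null) return false;
      handle = schedule(DAILY_AT_0342, () => {
        // Swallow rejections so one failed heartbeat never kills the cron task.
        tick().catch(onError);
      });
      return true;
    },
    // Teardown helper, mainly useful in tests.
    stop(): void {
      handle?.stop();
      handle = null;
    },
  };
}
```

With node-cron installed, its `schedule` function could be passed in as the first argument, while tests supply a fake that records registrations.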

Reviews (1): Last reviewed commit: "fix(telemetry): immediate heartbeat from..."

Skip credit-card VRL fixtures when the installed Vector binary lacks
repeat() (added after 0.54). Uses a runtime capability probe so the
tests self-enable once CI is updated to a newer Vector release.

Bump bcrypt "rejects incorrect token" tests to 15 s timeout — the full
bcrypt comparison is intentionally slow and was flaking at 5 s on
loaded CI runners.
@TerrifiedBug TerrifiedBug merged commit 849f549 into main Apr 25, 2026
12 checks passed
@TerrifiedBug TerrifiedBug deleted the feat/telemetry-client branch April 25, 2026 20:46