feat: expose Prometheus /metrics endpoint for usage dashboards #102

Draft
dobby-coder[bot] wants to merge 2 commits into main from feat/prometheus-metrics-endpoint

@dobby-coder dobby-coder Bot commented Apr 21, 2026

Summary

Implements the server-side half of #101: Cryptify now exposes a Prometheus text-format GET /metrics endpoint that Grafana on Scaleway can scrape to render the usage dashboards the issue asks for.

Metrics:

| Metric | Type | Description |
| --- | --- | --- |
| cryptify_uploads_total{channel} | counter | Finalized uploads per source channel |
| cryptify_upload_bytes_total{channel} | counter | Total bytes uploaded per channel |
| cryptify_storage_bytes | gauge | Current disk usage of data_dir |
| cryptify_active_files | gauge | Current file count in data_dir |
| cryptify_expired_files_total | counter | Uploads purged before finalization |
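
As a rough illustration of what the endpoint might return, the sketch below renders these five metrics in the Prometheus text exposition format. The `Metrics` struct and its helper functions are hypothetical names chosen for this example and are not taken from the PR's metrics module. It also assumes label values have already been sanitized, so no escaping is applied.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical in-memory registry; the real metrics module may be structured differently.
#[derive(Default)]
pub struct Metrics {
    /// cryptify_uploads_total, keyed by channel label.
    pub uploads_total: BTreeMap<String, u64>,
    /// cryptify_upload_bytes_total, keyed by channel label.
    pub upload_bytes_total: BTreeMap<String, u64>,
    pub storage_bytes: u64,
    pub active_files: u64,
    pub expired_files_total: u64,
}

impl Metrics {
    /// Record one finalized upload of `bytes` bytes for `channel`.
    pub fn record_upload(&mut self, channel: &str, bytes: u64) {
        *self.uploads_total.entry(channel.to_string()).or_insert(0) += 1;
        *self.upload_bytes_total.entry(channel.to_string()).or_insert(0) += bytes;
    }

    /// Render every metric as Prometheus text-format lines.
    pub fn render(&self) -> String {
        let mut out = String::new();
        write_counter_by_channel(&mut out, "cryptify_uploads_total",
            "Finalized uploads per source channel", &self.uploads_total);
        write_counter_by_channel(&mut out, "cryptify_upload_bytes_total",
            "Total bytes uploaded per channel", &self.upload_bytes_total);
        write_scalar(&mut out, "cryptify_storage_bytes", "gauge",
            "Current disk usage of data_dir", self.storage_bytes);
        write_scalar(&mut out, "cryptify_active_files", "gauge",
            "Current file count in data_dir", self.active_files);
        write_scalar(&mut out, "cryptify_expired_files_total", "counter",
            "Uploads purged before finalization", self.expired_files_total);
        out
    }
}

/// Writes HELP/TYPE headers plus one sample per channel label.
/// Label values are assumed to be pre-sanitized, so they are not escaped here.
fn write_counter_by_channel(out: &mut String, name: &str, help: &str, values: &BTreeMap<String, u64>) {
    // Writing into a String cannot fail, so the results are ignored.
    let _ = writeln!(out, "# HELP {name} {help}");
    let _ = writeln!(out, "# TYPE {name} counter");
    for (channel, value) in values {
        let _ = writeln!(out, "{name}{{channel=\"{channel}\"}} {value}");
    }
}

/// Writes HELP/TYPE headers plus a single unlabeled sample.
fn write_scalar(out: &mut String, name: &str, kind: &str, help: &str, value: u64) {
    let _ = writeln!(out, "# HELP {name} {help}");
    let _ = writeln!(out, "# TYPE {name} {kind}");
    let _ = writeln!(out, "{name} {value}");
}
```

Calling `record_upload("website", 100)` and then `render()` on a default `Metrics` would yield a `cryptify_uploads_total{channel="website"} 1` sample alongside the unlabeled gauges.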

Channel detection

channel is derived from request headers in this priority order:

  1. X-Cryptify-Source (explicit header, sanitized)
  2. Authorization: Bearer … / X-Api-Key → api
  3. Origin → staging-website / website
  4. User-Agent substring → outlook / thunderbird
  5. fallback unknown

All label values are lower-cased, restricted to [a-z0-9_-], and capped at 32 chars to prevent label-injection or cardinality explosions.
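
The priority order and sanitization rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the `Headers` struct, `sanitize_label`, and `detect_channel` are hypothetical names, disallowed characters are assumed to be dropped rather than replaced, and the Origin rule (a value containing "staging" maps to staging-website, anything else to website) is a guess at how the two website labels are told apart.

```rust
/// Sanitize a label value: lower-case it, keep only [a-z0-9_-], cap at 32 chars.
/// Assumption: disallowed characters are dropped (the PR may replace them instead).
pub fn sanitize_label(raw: &str) -> String {
    raw.chars()
        .map(|c| c.to_ascii_lowercase())
        .filter(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || *c == '_' || *c == '-')
        .take(32)
        .collect()
}

/// Hypothetical view of the request headers that matter for channel detection.
#[derive(Default)]
pub struct Headers<'a> {
    pub cryptify_source: Option<&'a str>, // X-Cryptify-Source
    pub authorization: Option<&'a str>,   // Authorization
    pub api_key: Option<&'a str>,         // X-Api-Key
    pub origin: Option<&'a str>,          // Origin
    pub user_agent: Option<&'a str>,      // User-Agent
}

/// Derive the `channel` label using the documented priority order.
pub fn detect_channel(h: &Headers<'_>) -> String {
    // 1. An explicit header wins, as long as something survives sanitization.
    if let Some(src) = h.cryptify_source {
        let label = sanitize_label(src);
        if !label.is_empty() {
            return label;
        }
    }
    // 2. A bearer token or an API key marks the request as API traffic.
    let has_bearer = h.authorization.map_or(false, |v| v.starts_with("Bearer "));
    if has_bearer || h.api_key.is_some() {
        return "api".to_string();
    }
    // 3. Origin header. Assumed rule: "staging" in the value selects staging-website.
    if let Some(origin) = h.origin {
        let label = if origin.contains("staging") { "staging-website" } else { "website" };
        return label.to_string();
    }
    // 4. Approximate match on a User-Agent substring.
    if let Some(ua) = h.user_agent {
        let ua = ua.to_ascii_lowercase();
        if ua.contains("outlook") {
            return "outlook".to_string();
        }
        if ua.contains("thunderbird") {
            return "thunderbird".to_string();
        }
    }
    // 5. Nothing matched.
    "unknown".to_string()
}
```

Because every returned value is either a fixed string or the output of `sanitize_label`, the set of possible label values stays bounded by the 32-character alphabet limit regardless of what clients send.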

Storage gauges

A background task walks data_dir every metrics_scan_interval_secs (new config option, default 60) and updates the two gauges. This avoids touching the upload hot path.
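
A minimal sketch of such a scanner is shown below, using only the standard library. The names `StorageGauges`, `scan_dir`, and `spawn_scanner` are hypothetical, and a plain OS thread stands in for whatever background-task mechanism the server actually uses. The recursive walk is also an assumption; data_dir may well be flat.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical shared state read by the /metrics handler.
#[derive(Default)]
pub struct StorageGauges {
    pub storage_bytes: AtomicU64,
    pub active_files: AtomicU64,
}

/// Walk `dir` recursively and return (total bytes, file count).
pub fn scan_dir(dir: &Path) -> io::Result<(u64, u64)> {
    let mut bytes = 0;
    let mut files = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // DirEntry::metadata does not follow symlinks, so links are skipped.
        let meta = entry.metadata()?;
        if meta.is_dir() {
            let (b, f) = scan_dir(&entry.path())?;
            bytes += b;
            files += f;
        } else if meta.is_file() {
            bytes += meta.len();
            files += 1;
        }
    }
    Ok((bytes, files))
}

/// Refresh the gauges every `interval_secs` seconds on a dedicated thread,
/// so upload handlers never pay for the directory walk.
pub fn spawn_scanner(
    dir: PathBuf,
    gauges: Arc<StorageGauges>,
    interval_secs: u64,
) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        // On a scan error the previous gauge values are kept.
        if let Ok((bytes, files)) = scan_dir(&dir) {
            gauges.storage_bytes.store(bytes, Ordering::Relaxed);
            gauges.active_files.store(files, Ordering::Relaxed);
        }
        thread::sleep(Duration::from_secs(interval_secs));
    })
}
```

With this shape the gauges can lag reality by up to one interval, which is the price of keeping the scan off the request path.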

Dashboard

docs/grafana/postguard-usage.json is a ready-to-import dashboard covering:

  • Messages sent per channel (rate + totals)
  • Bytes uploaded per channel
  • Storage usage (staging vs. production via Prometheus environment label)
  • Active file count (staging vs. production)
  • Expired uploads

docs/grafana/README.md contains a reference Prometheus scrape config.

Why draft

  1. /metrics is unauthenticated. The README and Grafana docs call out that access must be restricted at the firewall / reverse proxy. Confirm that matches the Scaleway / Procolix network policy before merging.
  2. The Outlook and Thunderbird addons don't currently send X-Cryptify-Source. Until follow-up PRs land in those repos they fall back to the User-Agent rule, which is approximate. Filing follow-ups as separate issues.
  3. I could not end-to-end test a full upload locally — the cryptify binary requires a reachable PKG server at startup (documented gotcha). Unit tests cover the metrics module comprehensively (13 tests, all green) but an integration run on a real staging deploy is worth doing before promoting out of draft.
  4. Coordinates with #100 (Enforce server-side upload limits: 5 GB rolling per email (14 days)), which is being implemented in parallel. The two deliberately do not overlap: this PR stays out of sender-email tracking.

Test plan

  • cargo check — clean
  • cargo test — 13 new tests in metrics::tests, all pass
  • cargo clippy --all-targets — no new warnings (one pre-existing warning in src/email.rs:225, left alone)
  • Dashboard JSON validates as JSON
  • Deploy to staging, scrape /metrics with Prometheus, import docs/grafana/postguard-usage.json, confirm counters increment when uploading via postguard.eu / staging / addons
  • Confirm network policy restricts /metrics to the Prometheus segment

Follow-ups (not in this PR)

  • Addon PRs to set X-Cryptify-Source header (outlook / thunderbird)
  • Consider whether cryptify_upload_bytes_total should be emitted from upload_chunk instead of finalize (trade-off: finer-grained vs. includes abandoned uploads)

Refs #101

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds a Prometheus text-format `GET /metrics` endpoint covering the metrics
requested in #101:

  * cryptify_uploads_total{channel}
  * cryptify_upload_bytes_total{channel}
  * cryptify_storage_bytes (gauge, sampled periodically from data_dir)
  * cryptify_active_files (gauge, same source)
  * cryptify_expired_files_total (counter, purged-before-finalized)

The channel label is derived from request headers:
  1. X-Cryptify-Source explicit header
  2. Authorization: Bearer / X-Api-Key -> "api"
  3. Origin -> "staging-website" / "website"
  4. User-Agent substring -> "outlook" / "thunderbird"
  5. fallback "unknown"

Values are sanitized (lower-case [a-z0-9_-], max 32 chars) to prevent
label-injection and cardinality blowup.

Storage gauges are sampled by a background task that walks data_dir every
`metrics_scan_interval_secs` (default 60, configurable). Dashboard JSON
ready for import into the Scaleway Grafana instance is shipped under
`docs/grafana/`, alongside a Prometheus scrape-config example.

No authentication on /metrics; restrict via firewall / proxy allow-list
(documented in README and docs/grafana/README.md).

Refs #101
@rubenhensen

/dobby fix conflicts

dobby-coder Bot commented Apr 25, 2026

Dobby has seen the request! Dobby will fix the conflicts right away, sir! Dobby is very good at resolving merge conflicts — Dobby will have this PR clean and ready in no time! 🧹

Integrate main-branch additions (chunk_size config, per-upload/rolling
rate limits, API key detection, /usage endpoint) alongside the metrics
feature from this branch. Both feature sets are preserved and work
together.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dobby-coder Bot commented Apr 25, 2026

Conflicts resolved and pushed. Here's what happened:

Conflicting files: src/config.rs, src/main.rs, src/store.rs

Root cause: Main merged chunk-size limits, per-upload/rolling rate limiting, API key detection, and a /usage endpoint (from PR #100) while this branch added Prometheus metrics. Both sides touched the same structs and functions.

Resolution: Kept both feature sets — all metrics fields/logic alongside the new rate-limiting, chunk-size, and API key features. Specifically:

  • config.rs: Added both metrics_scan_interval_secs and chunk_size fields
  • store.rs: FileState now has both source_channel (metrics) and is_api_key (rate limits); kept UploadRecord and usage tracking from main
  • main.rs: Merged imports, kept both /metrics endpoint and ApiKeyPresent guard, upload_init stores both source_channel and is_api_key, upload_finalize records both metrics and usage, routes include both metrics_endpoint and usage

Verification:

  • cargo check — clean
  • cargo test — all 43 tests pass (13 metrics + 5 store/usage + 25 existing)
  • cargo clippy — no new warnings (one pre-existing in email.rs:253)
