feat: Add missing-key metrics for online feature lookups #359
vanitabhagwat wants to merge 8 commits into master
Conversation
Emit DogStatsD counters (feature_lookup_not_found, feature_lookup_null_or_expired) when online store reads return NOT_FOUND, NULL_VALUE, or OUTSIDE_MAX_AGE statuses. Gated behind ENABLE_MISSING_KEY_METRICS env var; pluggable StatsD client in Python via set_metrics_client(). Covers Go HTTP/gRPC servers and Python OnlineStore.get_online_features. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
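The commit above mentions a pluggable StatsD client registered via set_metrics_client(). A minimal sketch of that pattern is below; the real module layout and client interface in this PR may differ, and RecordingClient is a hypothetical stand-in used here instead of a real DogStatsD client.

```python
# Sketch of a pluggable-metrics-client registry (hypothetical API shape;
# only the function name set_metrics_client comes from this PR).
_metrics_client = None

def set_metrics_client(client):
    """Register a StatsD-compatible client used for missing-key metrics."""
    global _metrics_client
    _metrics_client = client

def get_metrics_client():
    """Return the registered client, or None when metrics are disabled."""
    return _metrics_client

class RecordingClient:
    """Stand-in for a DogStatsD client; records calls instead of sending UDP."""
    def __init__(self):
        self.calls = []
    def increment(self, metric, value=1, tags=None):
        self.calls.append((metric, value, tuple(tags or ())))

client = RecordingClient()
set_metrics_client(client)
get_metrics_client().increment("feature_lookup_not_found", tags=["project:demo"])
```

Keeping the registry behind getter/setter functions lets tests swap in a recording client without touching any server wiring.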
Remove unused Optional import and pytest import, consolidate protobuf imports to fix ruff linting errors. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Format code to match project style guidelines using ruff format. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Extract and add feature_view tag alongside feature tag for better
aggregation and filtering in Datadog dashboards.
Changes:
- Add _extract_feature_view() function in Python and extractFeatureView() in Go
- Parse feature_view from feature name (format: feature_view__feature_name)
- Add feature_view tag to both not_found and null_or_expired metrics
- Add comprehensive tests for extraction function and tag presence
- No cardinality increase - feature_view is derived from existing feature tag
Tags now include:
- project, online_store_type, service, env (existing)
- feature (existing, high cardinality)
- feature_view (new, low cardinality ~10-50 per project)
Enables Datadog queries like:
sum:feast.feature_server.feature_lookup_not_found by {feature_view}
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
def _emit_missing_key_metrics(config, project, response_proto):
    from feast._missing_key_metrics import LookupMetricsAggregator
    from feast.metrics_client import get_metrics_client
The lazy imports of LookupMetricsAggregator and get_metrics_client inside this function run on every request when metrics are enabled. I think they could be moved to module-level imports to avoid the repeated importlib overhead.
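To illustrate the point using a stdlib module in place of the feast imports (which are not importable here): a function-local import does not re-execute the module, but it does repeat a sys.modules lookup on every call, whereas a hoisted import resolves the name once.

```python
import sys
import json  # hoisted: resolved once when this module loads

def hoisted_lookup():
    # References the module-level binding directly; no per-call import.
    return json.dumps({"ok": True})

def lazy_lookup():
    # A function-local import still consults sys.modules on every call;
    # the module body is not re-run, but the lookup is repeated work.
    import json
    return json.dumps({"ok": True})

assert "json" in sys.modules        # cached after the first import either way
print(hoisted_lookup() == lazy_lookup())  # True
```

Lazy imports are still useful for breaking import cycles or deferring heavy startup costs; the suggestion here is only about a hot request path where neither applies.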
}

if s.metricsClient != nil {
    agg := metrics.NewLookupMetricsAggregator(
Consider reusing a single aggregator instance on the server struct instead of allocating one per request; that avoids a small per-request latency increase.
Are you suggesting we reuse the aggregator struct itself, or are you talking about rebuilding the metadata on every request?
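One way to read the suggestion is to split the two concerns: build the static tag metadata once per server, and only reset the per-request counts. A Python sketch of that split (the real code is Go, where a shared instance would additionally need a mutex or per-request count maps because the server handles requests concurrently — this sketch ignores that):

```python
class RecordingClient:
    """Stand-in client that records calls instead of sending UDP."""
    def __init__(self):
        self.calls = []
    def increment(self, metric, value=1, tags=None):
        self.calls.append((metric, value, tuple(tags or ())))

class ReusableAggregator:
    """Hypothetical reusable aggregator: metadata built once, counts reset per request."""
    def __init__(self, project, online_store, client):
        # Static tags are constructed a single time, at server startup...
        self.base_tags = (f"project:{project}", f"online_store_type:{online_store}")
        self.client = client
        self.not_found = {}

    def record_not_found(self, feature):
        self.not_found[feature] = self.not_found.get(feature, 0) + 1

    def flush(self):
        # ...while per-request counts are emitted and cleared, so one
        # instance on the server struct can serve every request.
        for feature, count in self.not_found.items():
            self.client.increment(
                "feature_lookup_not_found",
                value=count,
                tags=list(self.base_tags) + [f"feature:{feature}"],
            )
        self.not_found.clear()

client = RecordingClient()
agg = ReusableAggregator("demo", "redis", client)
agg.record_not_found("view__conv_rate")
agg.flush()
agg.flush()  # second "request": nothing re-emitted
print(len(client.calls))  # 1
```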
for feat, cnt in self.not_found.items():
    if cnt:
        self.metrics_client.increment(
Each call to increment() sends a UDP packet, and that overhead can be too great for performance-intensive code paths such as online store reads. For requests with many distinct missing features, consider using a sample_rate parameter (e.g., sample_rate=0.5) to reduce the volume of StatsD packets. DogStatsD automatically corrects COUNT metrics by 1/sample_rate, so accuracy is preserved. See the metric submission options.
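The 1/sample_rate correction can be demonstrated with a toy client. Real DogStatsD clients send the sample rate on the wire and the agent scales the count; here the correction is applied locally just to show that the expected value is preserved while far fewer "packets" are sent:

```python
import random

class SampledStatsd:
    """Toy StatsD client applying client-side sampling to increments."""
    def __init__(self, rng=None):
        self.total = 0.0    # corrected count, as the agent would report it
        self.packets = 0    # each hit would be one UDP packet in a real client
        self.rng = rng or random.Random(0)  # seeded for a deterministic demo

    def increment(self, metric, value=1, sample_rate=1.0):
        if self.rng.random() < sample_rate:
            self.packets += 1
            self.total += value / sample_rate  # 1/sample_rate correction

client = SampledStatsd()
for _ in range(1000):
    client.increment("feature_lookup_not_found", sample_rate=0.5)
print(client.packets < 1000)  # True: roughly half the packets are sent
```

The corrected total stays an unbiased estimate of the true count (client.total equals client.packets * 2 here), at the cost of per-request variance, which is usually acceptable for dashboard-level counters.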
var metricsClient metrics.StatsdClient
if metrics.IsMissingKeyMetricsEnabled() {
    if statsdHost, ok := os.LookupEnv("DD_AGENT_HOST"); ok {
        client, clientErr := statsd.New(fmt.Sprintf("%s:8125", statsdHost))
Is 8125 always the StatsD port, or should it be configurable via an env var (e.g., DD_DOGSTATSD_PORT)?
I was following the existing pattern, but I think we can make it configurable for flexibility.
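A sketch of the configurable-port fallback in Python (the code under review is Go, but the shape is the same): 8125 is DogStatsD's default port, and DD_DOGSTATSD_PORT is Datadog's standard env var for overriding it.

```python
import os

def dogstatsd_address(default_port=8125):
    """Build the agent address, honoring DD_DOGSTATSD_PORT when set."""
    host = os.environ.get("DD_AGENT_HOST", "localhost")
    port = int(os.environ.get("DD_DOGSTATSD_PORT", default_port))
    return f"{host}:{port}"

# Demo: default port, then an override.
os.environ["DD_AGENT_HOST"] = "10.0.0.5"
os.environ.pop("DD_DOGSTATSD_PORT", None)
print(dogstatsd_address())  # 10.0.0.5:8125
os.environ["DD_DOGSTATSD_PORT"] = "9125"
print(dogstatsd_address())  # 10.0.0.5:9125
```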
    project, onlineStore, service, env string,
    client StatsdClient,
) *LookupMetricsAggregator {
    if client == nil {
This returns nil when client is nil, but callers still nil-check the client with if s.metricsClient != nil (grpc_server.go and http_server.go). Do we need both?
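One way to remove the duplicated guard is a no-op client: instead of the constructor returning nil (forcing every caller to check), it falls back to a client whose methods do nothing. A Python sketch of that null-object pattern (names are illustrative, not the PR's API):

```python
class NoopClient:
    """No-op StatsD client: metrics calls are safe without any guard."""
    def increment(self, metric, value=1, tags=None):
        pass

class LookupAggregator:
    def __init__(self, client=None):
        # Fall back to a no-op client instead of returning None,
        # so callers never need their own nil/None check.
        self.client = client if client is not None else NoopClient()

    def emit_not_found(self, feature):
        self.client.increment("feature_lookup_not_found",
                              tags=[f"feature:{feature}"])

LookupAggregator(None).emit_not_found("conv_rate")  # safe with no guard
```

The trade-off is a tiny no-op call on the disabled path; keeping a single guard (either in the constructor or at the call sites, but not both) is the other reasonable resolution.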
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Let the DD agent's global tags (DD_ENV, DD_SERVICE) handle env and service dimensions, consistent with the existing heartbeat metric. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
What this PR does / why we need it:
This PR introduces feature-level count metrics at the feature server level.
For each feature, we emit the following metrics:
Aggregation scope = feature
Emission frequency = per request
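Per the description at the top of this PR, NOT_FOUND statuses feed the feature_lookup_not_found counter, while NULL_VALUE and OUTSIDE_MAX_AGE feed feature_lookup_null_or_expired. A simplified sketch of that per-request classification (status constants are stand-ins for Feast's field-status enum):

```python
# Hypothetical stand-ins for Feast's field-status enum values.
NOT_FOUND, NULL_VALUE, OUTSIDE_MAX_AGE, PRESENT = range(4)

def classify(statuses):
    """Split a request's per-feature statuses into the two counter buckets."""
    not_found, null_or_expired = [], []
    for feature, status in statuses.items():
        if status == NOT_FOUND:
            not_found.append(feature)
        elif status in (NULL_VALUE, OUTSIDE_MAX_AGE):
            null_or_expired.append(feature)
    return not_found, null_or_expired

nf, ne = classify({"view__a": NOT_FOUND,
                   "view__b": OUTSIDE_MAX_AGE,
                   "view__c": PRESENT})
print(nf, ne)  # ['view__a'] ['view__b']
```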
Which issue(s) this PR fixes:
Misc