enhancement(datadog encoder): support for metrics v3 protocol#1175
Binary Size Analysis (Agent Data Plane)
Target: bb1575e (baseline) vs a9f5109 (comparison)
| Module | File Size | Symbols |
|---|---|---|
| saluki_components::encoders::datadog | +69.19 KiB | 325 |
| core | +38.49 KiB | 8803 |
| [sections] | +10.27 KiB | 9 |
| anyhow | +9.12 KiB | 1256 |
| hashbrown | +7.45 KiB | 347 |
| saluki_common::task::instrument | +7.28 KiB | 76 |
| protobuf | +4.74 KiB | 12 |
| saluki_components::common::datadog | -4.20 KiB | 325 |
| http | +3.85 KiB | 323 |
| agent_data_plane::cli::run | +2.73 KiB | 70 |
| saluki_io::compression::Compressor<W> | +2.49 KiB | 3 |
| saluki_io::net::util | +2.24 KiB | 130 |
| serde_core | +2.10 KiB | 339 |
| [Unmapped] | +2.08 KiB | 1 |
| uuid | +1.85 KiB | 4 |
| agent_data_plane::components::apm_onboarding | -1.36 KiB | 34 |
| saluki_components::transforms::dogstatsd_mapper | -992 B | 18 |
| saluki_components::forwarders::datadog | +846 B | 22 |
| tokio | +715 B | 2137 |
| tracing_core | +582 B | 438 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +1.79Mi [NEW] +1.79Mi std::thread::local::LocalKey<T>::with::hc561bb4887a44f4d
+1.2% +155Ki +1.3% +132Ki [21844 Others]
[NEW] +117Ki [NEW] +117Ki agent_data_plane::cli::run::create_topology::_{{closure}}::h0c5e4b739c354b02
[NEW] +84.6Ki [NEW] +84.5Ki agent_data_plane::internal::control_plane::spawn_control_plane::_{{closure}}::h267c0964a8265cd5
[NEW] +67.3Ki [NEW] +67.1Ki saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h80e57416e92ba565
[NEW] +57.8Ki [NEW] +57.6Ki agent_data_plane::cli::run::handle_run_command::_{{closure}}::hf85552faa28dc230
[NEW] +49.5Ki [NEW] +49.4Ki saluki_app::bootstrap::AppBootstrapper::bootstrap::_{{closure}}::hf7945539f2e082c3
[NEW] +47.5Ki [NEW] +47.3Ki moka::sync::base_cache::Inner<K,V,S>::do_run_pending_tasks::h3d28faf021a45e28
[NEW] +46.4Ki [NEW] +46.3Ki h2::proto::connection::Connection<T,P,B>::poll::hf6f3caab9d5e2ca4
[NEW] +46.0Ki [NEW] +45.8Ki _<saluki_components::destinations::prometheus::Prometheus as saluki_core::components::destinations::Destination>::run::_{{closure}}::h68ed13bab47c1030
[NEW] +45.7Ki [NEW] +45.5Ki _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::h8d4dd2e684a12115
[DEL] -46.0Ki [DEL] -45.8Ki _<saluki_components::destinations::prometheus::Prometheus as saluki_core::components::destinations::Destination>::run::_{{closure}}::hea5618e9afd08b63
[DEL] -46.1Ki [DEL] -45.9Ki _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::hb0c627893c271e2c
[DEL] -46.4Ki [DEL] -46.3Ki h2::proto::connection::Connection<T,P,B>::poll::h100071ec98c95c21
[DEL] -47.5Ki [DEL] -47.3Ki moka::sync::base_cache::Inner<K,V,S>::do_run_pending_tasks::hfd5cba4fa7f4623d
[DEL] -49.5Ki [DEL] -49.4Ki saluki_app::bootstrap::AppBootstrapper::bootstrap::_{{closure}}::h15f926ef5659e580
[DEL] -57.8Ki [DEL] -57.7Ki agent_data_plane::cli::run::handle_run_command::_{{closure}}::hebabd4c0d26708e2
[DEL] -64.2Ki [DEL] -64.0Ki saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h1c7964dcd8ecbead
[DEL] -84.6Ki [DEL] -84.5Ki agent_data_plane::internal::control_plane::spawn_control_plane::_{{closure}}::h78c38079d35f4f77
[DEL] -114Ki [DEL] -114Ki agent_data_plane::cli::run::create_topology::_{{closure}}::h4a1b0b37b7e2d03e
[DEL] -1.79Mi [DEL] -1.79Mi std::thread::local::LocalKey<T>::with::he517c09e5477efe1
+0.6% +160Ki +0.6% +138Ki TOTAL
Regression Detector (Agent Data Plane)
Regression Detector Results
Run ID: aa6bf323-7cb3-429c-97b3-5b385bda7173
Baseline: bb1575e
Optimization Goals: ✅ Improvement(s) detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | +4.51 | [+4.12, +4.91] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.00 | [-0.13, +0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | -0.46 | [-5.66, +4.74] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +7.70 | [-50.11, +65.52] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | +4.51 | [+4.12, +4.91] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | +2.74 | [-2.38, +7.86] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | +1.65 | [-0.67, +3.98] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | +1.16 | [+0.98, +1.34] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | +1.04 | [+0.88, +1.21] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_5mb_memory | memory utilization | +1.03 | [+0.69, +1.37] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | +0.97 | [+0.78, +1.17] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | +0.93 | [+0.76, +1.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +0.92 | [-51.73, +53.58] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | +0.87 | [+0.85, +0.90] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | +0.67 | [+0.48, +0.86] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | +0.65 | [+0.47, +0.84] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | +0.62 | [+0.41, +0.82] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | +0.19 | [+0.05, +0.33] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | +0.03 | [-0.10, +0.15] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | +0.01 | [-0.05, +0.07] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | -0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.00 | [-0.13, +0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.12, +0.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | -0.01 | [-0.14, +0.12] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | -0.30 | [-0.56, -0.04] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | -0.44 | [-1.89, +1.01] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | -0.46 | [-5.66, +4.74] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | -2.11 | [-2.22, -1.99] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_5mb_cpu | % cpu utilization | -2.35 | [-4.90, +0.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | -2.38 | [-10.23, +5.47] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | -3.09 | [-31.96, +25.78] | 1 | (metrics) (profiles) (logs) |
| ✅ | otlp_ingest_metrics_5mb_memory | memory utilization | -6.54 | [-6.75, -6.33] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 111.45MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 33.80MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 53.11MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 161.88MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 21.30MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
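The three criteria above can be sketched as a small decision function. This is an illustrative sketch only, not the regression detector's actual code: the `Experiment` struct, its field names, and the `higher_is_better` flag used to map a significant change onto ✅ or ❌ are all assumptions made for the example.

```rust
/// Outcome shown in the `perf` column (✅, ❌, or ➖).
#[derive(Debug, PartialEq)]
enum Verdict {
    Improvement,         // ✅
    Regression,          // ❌
    NoSignificantChange, // ➖
}

/// Hypothetical summary of one experiment row; names are illustrative.
struct Experiment {
    delta_mean_pct: f64,
    ci_low: f64,
    ci_high: f64,
    erratic: bool,
    /// Assumption: true for goals such as ingress throughput,
    /// false for memory or CPU utilization.
    higher_is_better: bool,
}

fn classify(e: &Experiment, tolerance_pct: f64) -> Verdict {
    // Criterion 1: the effect size meets the tolerance.
    let big_enough = e.delta_mean_pct.abs() >= tolerance_pct;
    // Criterion 2: the confidence interval does not contain zero.
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    // Criterion 3: the experiment is not marked erratic.
    if !big_enough || !ci_excludes_zero || e.erratic {
        return Verdict::NoSignificantChange;
    }
    // Direction decides whether the significant change is good or bad.
    let better = (e.delta_mean_pct > 0.0) == e.higher_is_better;
    if better {
        Verdict::Improvement
    } else {
        Verdict::Regression
    }
}
```

Under these assumptions, the `otlp_ingest_metrics_5mb_memory` row (-6.54, CI [-6.75, -6.33]) classifies as an improvement, while `otlp_ingest_logs_5mb_memory` (+4.51) falls below the 5.00% tolerance and `dsd_uds_512kb_3k_contexts_cpu` (+7.70, CI [-50.11, +65.52]) has an interval that contains zero, so both are reported as no significant change.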
This is temporarily blocked until there is a version of the Datadog Agent with up-to-date v3 metrics support that we can test against in correctness tests. Currently, we're hitting an issue where rate intervals are delta encoded when they shouldn't be. That bug is fixed in DataDog/datadog-agent#45825, but the fix won't be released until 7.77, so it will be roughly two weeks before an RC is available to use. We can potentially do a hacky image build or something similar to keep going in the meantime, then switch back to a proper Agent version once one is available; we'll see.
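For context on why that mismatch matters, here is a generic sketch of delta encoding. It is not taken from the v3 protocol or from either codebase, and the function names are hypothetical; it only shows how a column that is delta encoded on one side but read as raw values on the other produces wrong numbers.

```rust
/// Delta-encode a column: the first value is kept as-is, and every
/// later value is stored as the difference from its predecessor.
fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    values
        .iter()
        .map(|&v| {
            let d = v.wrapping_sub(prev);
            prev = v;
            d
        })
        .collect()
}

/// Reverse the encoding by keeping a running sum of the deltas.
fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut acc = 0i64;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}
```

With three hypothetical intervals of 10 seconds each, `delta_encode(&[10, 10, 10])` yields `[10, 0, 0]`. A reader that expects raw values would interpret the second and third intervals as zero, which is the kind of disagreement a correctness test between two implementations would surface.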
79cdda1 to 59636cd
We've temporarily handled the issue of correctness tests by using a "dev" container image. We can't merge this as-is: we need to wait for at least an RC build of Datadog Agent 7.77 so we can pin to a non-development image. In the meantime, I'm going to work on making sure we've integrated all of the same small fixes and changes that have steadily been made upstream in the Datadog Agent repository for V3 support.
30ee642 to 898021d
be9a81c to a9f5109

Summary
Change Type
How did you test this PR?
References