Skip to content

[Azure Cosmos Benchmark]Remove CodaHale metrics from azure-cosmos-benchmark, use SDK Micromet…#48281

Merged
xinlian12 merged 44 commits intoAzure:mainfrom
xinlian12:removecodahalemetrics
Mar 9, 2026
Merged

[Azure Cosmos Benchmark]Remove CodaHale metrics from azure-cosmos-benchmark, use SDK Micromet…#48281
xinlian12 merged 44 commits intoAzure:mainfrom
xinlian12:removecodahalemetrics

Conversation

@xinlian12
Copy link
Member

@xinlian12 xinlian12 commented Mar 5, 2026

Remove CodaHale Metrics + Modernize Benchmark Architecture


1. SDK Metrics as Single Source of Truth

Removed all app-level metric tracking (MetricRegistry, Meter, Timer, HdrHistogram) from every benchmark class. Benchmark classes are now pure workload drivers. The SDK emits cosmos.client.op.calls, cosmos.client.op.latency, cosmos.client.op.RUs with full tag dimensions (Container, Operation, OperationStatusCode, ClientCorrelationId, RegionName, PartitionKeyRangeId, etc.).

2. Reactor Flux.flatMap Concurrency Control

Replaced hand-rolled Semaphore + BaseSubscriber + AtomicLong count + synchronized wait/notify with Flux.flatMap(concurrency). Subclasses return Mono<T> from performWorkload(long i). Uses Schedulers.parallel() (global shared singleton) — must NOT be disposed by the benchmark.

3. Reporter Architecture — Destination-Based

Reporting destinations configured via metrics.destination in workload JSON. Console logging (LoggingMeterRegistry) is always active alongside any destination for real-time visibility.

All reporters read from Micrometer SimpleMeterRegistry directly — no Dropwizard bridge. This eliminates the per-record() overhead of DropwizardBridgeMeterRegistry (which maintained a parallel set of Dropwizard meters backed by ExponentiallyDecayingReservoir, causing ~30% perf degradation on the hot path).

If multiple destinations are configured, a warning is logged and only the first match is used.

Destination Config Key Implementation
Console (always on) none Micrometer LoggingMeterRegistry (logs all meters via SLF4J)
CSV destination.csv CsvMetricsReporter — iterates SimpleMeterRegistry.getMeters() directly, writes per-metric CSV files (buffered per reporting cycle)
Cosmos DB destination.cosmos CosmosMetricsReporter — reads all Micrometer meters (including JVM/Netty) with explicit tag dimensions
App Insights destination.applicationInsights AzureMonitorMeterRegistry — SDK sends all meters+tags to Log Analytics/Kusto

4. Removed skipWarmUpOperations Support

The warmup mechanism (skipWarmUpOperations config) has been completely removed. It was tightly coupled to the Dropwizard MetricRegistry — it worked by resetting and re-registering Dropwizard meters after N initial operations. With the switch to Micrometer (where meters are managed by the SDK's ClientTelemetryMetrics, not the benchmark app), warmup metrics cannot be cleanly isolated from measurement data since counters are cumulative and the registry is shared.

5. Workload JSON Config

Console (default — no destination needed):

{
  "metrics": {
    "printingInterval": 10,
    "enableJvmStats": true
  },
  "tenantDefaults": { "serviceEndpoint": "https://...", "masterKey": "..." },
  "tenants": [{ "operation": "ReadLatency", "numberOfOperations": 100000 }]
}

CSV destination:

{
  "metrics": {
    "printingInterval": 10,
    "destination": {
      "csv": { "reportingDirectory": "/tmp/benchmark-output" }
    }
  },
  "tenantDefaults": { "serviceEndpoint": "https://...", "masterKey": "..." },
  "tenants": [{ "operation": "ReadLatency", "numberOfOperations": 100000 }]
}

Cosmos DB destination:

{
  "metrics": {
    "printingInterval": 10,
    "destination": {
      "cosmos": {
        "serviceEndpoint": "https://<results-account>.documents.azure.com:443/",
        "masterKey": "<results-account-key>",
        "database": "benchmarkResults",
        "container": "metrics",
        "testVariationName": "baseline-read-1k",
        "branchName": "main",
        "commitId": "abc1234"
      }
    }
  },
  "tenantDefaults": { "serviceEndpoint": "https://...", "masterKey": "..." },
  "tenants": [{ "operation": "ReadLatency", "numberOfOperations": 100000 }]
}

Application Insights destination:

{
  "metrics": {
    "printingInterval": 10,
    "destination": {
      "applicationInsights": {
        "connectionString": "InstrumentationKey=...;IngestionEndpoint=...",
        "stepSeconds": 10,
        "testCategory": "nightly-perf"
      }
    }
  },
  "tenantDefaults": { "serviceEndpoint": "https://...", "masterKey": "..." },
  "tenants": [{ "operation": "ReadLatency", "numberOfOperations": 100000 }]
}

Typed config classes with null validation: CsvReporterConfig, CosmosReporterConfig, AppInsightsReporterConfig.

6. Resource Lifecycle

  • Scheduler: Schedulers.parallel() (global shared, not disposed)
  • Meter registries: compositeRegistry.close(), appInsightsRegistry.close() in finally
  • Console: loggingRegistry.close() in finally
  • JVM binders: JvmGcMetrics etc. stored and .close()d in finally
  • ThreadPrefixGaugeSet: close() method shuts down internal executor
  • Netty metrics: compositeRegistry added to Metrics.globalRegistry when enabled, removed in finally

7. LinkedIn CTL Workload Metrics

Injected shared MeterRegistry into AsyncClientFactory.buildAsyncClient() so LICtlWorkload's Cosmos client participates in SDK telemetry reporting.

8. Files

New:

  • ReportingDestination.java
  • CsvMetricsReporter.java (reads Micrometer SimpleMeterRegistry directly, writes CSV)
  • CosmosMetricsReporter.java (Cosmos DB upload with full Micrometer tag dimensions)
  • CsvReporterConfig.java, CosmosReporterConfig.java, AppInsightsReporterConfig.java
  • ReporterConfigValidationTest.java (9 unit tests for config validation)
  • workload-config-sample.json

Deleted:

  • DropwizardBridgeMeterRegistry.java (no longer needed)
  • ScheduledReporterFactory.java, BenchmarkRequestSubscriber.java, NettyHttpMetricsReporter.java
  • linkedin/impl/metrics/MetricsFactory.java, MetricsImpl.java, Metrics.java

Dependencies:

  • Removed: metrics-core, metrics-jvm, metrics-graphite, hdrhistogram-metrics-reservoir
  • Kept: micrometer-core, micrometer-registry-azure-monitor, micrometer-registry-graphite

9. Benchmark Results — PR vs Main (Multi-Destination)

VM: Standard_D16s_v3 (16 vCPU, 64GB) | Workload: ReadThroughput, 30 tenants, GATEWAY mode, 10-min duration | Commit: PR 00fe591 vs Main 00a35f4

Console Destination (PR vs Main)

Metric PR Main Δ
Throughput 44.4M ops (70.3K ops/s) 43.7M ops (68.6K ops/s) ≡ equivalent
Latency (mean) 6.8 ms 6.9 ms
CPU 94.0% avg 94.3% avg
GC 565 collections, 18ms max 565 collections, 20ms max

CSV Destination (PR vs Main)

PR uses Micrometer CsvMetricsReporter → per-metric CSV files. Main uses CodaHale CsvReporter.

Metric PR (Micrometer) Main (CodaHale) Δ
Throughput 44.1M ops (71.7K ops/s) 41.4M ops (69.0K ops/s) ≡ equivalent
Latency (mean) 6.9 ms
GC 560 collections 554 collections

Cosmos DB Destination (PR only)

Main branch has a known NPE bug in CosmosTotalResultReporter that prevents uploads.

Metric PR
Throughput 50.7M ops (81.0K ops/s)
CPU 46.6% avg, 99.8% max
GC 142 collections, 17ms max

Application Insights Destination (PR vs Main)

Metric PR Main
Throughput 44.6M ops (73.9K ops/s) 41.1M ops (66.3K ops/s)
Data Points Ingested 68,934
Unique Metric Names 32

Cross-Destination Comparison (PR branch, same commit)

Metric Console CSV Cosmos DB App Insights
Total Ops 44.4M 44.1M 50.7M 44.6M
Ops/s 70.3K 71.7K 81.0K 73.9K
Peak Threads 262 277 291 288
Fixed pools identical?
Netty conns/endpoint 20 20 20 20

Conclusion: ✅ No regression detected. All fixed-size thread pools, netty connections, latency, and throughput are equivalent between PR and main across all 4 destinations.

…er metrics

Remove all CodaHale (Dropwizard) metrics instrumentation from the benchmark
project and rely on SDK-emitted Micrometer metrics (cosmos.client.op.*) as
the single source of truth.

Key changes:
- Remove MetricRegistry/Meter/Timer/HdrHistogram from all benchmark classes
- Benchmark classes become pure workload drivers (no app-level metric tracking)
- Create BenchmarkMetricsReporter that reads SDK meters from shared MeterRegistry
- Rewrite CosmosTotalResultReporter to read SDK meters instead of extending
  CodaHale ScheduledReporter
- Rewrite ThreadPrefixGaugeSet as Micrometer MeterBinder
- BenchmarkOrchestrator uses CompositeMeterRegistry (Simple + optional AzureMonitor)
- JVM metrics via Micrometer binders (JvmGcMetrics, JvmMemoryMetrics, JvmThreadMetrics)
- Remove dropwizard metrics-core/jvm/graphite and hdrhistogram-metrics-reservoir deps
- Delete ScheduledReporterFactory, MetricsFactory, MetricsImpl, Metrics interface
- Per-tenant breakdown via SDK ClientCorrelationId tag (set to tenant name/ID)

Net result: 37 files changed, -950/+579 lines (net -371 lines removed)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot added the Cosmos label Mar 5, 2026
Annie Liang and others added 3 commits March 5, 2026 19:16
Replace hand-rolled concurrency control (Semaphore + BaseSubscriber + AtomicLong
count + synchronized wait/notify) with Reactor's native Flux.flatMap(concurrency).

Key changes:
- AsyncBenchmark.run(): Flux.range/generate + flatMap(concurrency).blockLast()
- performWorkload signature: void performWorkload(BaseSubscriber, long) -> Mono<T> performWorkload(long)
- Add dedicated Scheduler (Schedulers.newParallel("cosmos-bench")) to avoid
  contention with global Schedulers.parallel(), mirroring SDK's CosmosSchedulers pattern
- Delete LatencySubscriber inner classes from 6 subclasses (broken after CodaHale removal)
- Delete BenchmarkRequestSubscriber.java (no longer needed)
- Subclasses simplified to just build and return Mono (no subscriber wiring)
- AsyncCtlWorkload and AsyncEncryptionBenchmark refactored with same pattern
- SyncBenchmark unchanged (uses CompletableFuture, not reactive)

Net: 15 files changed, -515 lines

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match CosmosSchedulers pattern: use Schedulers.newBoundedElastic with
DEFAULT_BOUNDED_ELASTIC_SIZE, DEFAULT_BOUNDED_ELASTIC_QUEUESIZE, 60s TTL.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Changed .next() to .last() for query Flux-to-Mono conversion so all pages
are consumed (not just the first page). .last() subscribes to the entire
Flux and returns the final element.

Affected files:
- AsyncCtlWorkload: .next() -> .last() for query operation
- AsyncEncryptionQueryBenchmark: .next() -> .last() for all 5 query types
- AsyncEncryptionQuerySinglePartitionMultiple: .next() -> .last()

Also fixed indentation in AsyncCtlWorkload.run() flatMap chain.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use Micrometer's built-in DropwizardMeterRegistry (in micrometer-core) to bridge
SDK-emitted Micrometer meters to a Dropwizard MetricRegistry, then use the
proven CsvReporter/ConsoleReporter for output.

This replaces ~260 lines of custom reporter code with ~90 lines of bridge config.
The DropwizardMeterRegistry is part of micrometer-core — no extra dependency needed.
metrics-core is re-added solely as the reporting sink (not for app-level instrumentation).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Annie Liang and others added 4 commits March 5, 2026 22:39
- AsyncEncryptionWriteBenchmark: fix generic type to avoid raw cast
- CosmosTotalResultReporter: validate status codes by range [200,300)
  instead of loose startsWith("2")
- ThreadPrefixGaugeSet: make refresh interval configurable, default to
  printing interval from config
- BenchmarkOrchestrator: clarify DropwizardMeterRegistry bridge comment,
  clarify Netty global registry comment
- external_dependencies.txt: remove unused metrics-graphite, metrics-jvm,
  hdrhistogram-metrics-reservoir entries

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When both Netty HTTP metrics and App Insights are enabled, add the
AzureMonitorMeterRegistry to Metrics.globalRegistry so Reactor Netty
connection pool gauges flow to Application Insights in addition to
the CSV reporter.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of creating a separate SimpleMeterRegistry for Netty pool gauges,
add the compositeRegistry directly to Metrics.globalRegistry. This means
Netty connection pool metrics automatically flow through to:
- Dropwizard CSV/Console reporter (via the DropwizardMeterRegistry bridge)
- Application Insights (if AzureMonitorMeterRegistry is configured)
- NettyHttpMetricsReporter CSV (for dedicated netty-pool-metrics.csv)

Removes the unnecessary SimpleMeterRegistry and its import.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Separate concerns:
- DropwizardBridgeMeterRegistry: a MeterRegistry that bridges Micrometer
  meters to a Dropwizard MetricRegistry (added to composite)
- BenchmarkMetricsReporter: a plain reporter that reads from the bridge's
  Dropwizard MetricRegistry and outputs via CsvReporter/ConsoleReporter

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Annie Liang and others added 5 commits March 6, 2026 09:47
1. BenchmarkMetricsReporter: accept reportingDirectory string instead
   of pre-built Path; create metrics/ subdirectory internally
2. NettyHttpMetricsReporter: use dropwizardBridge instead of
   compositeRegistry to avoid tricky percentile behavior
3. CosmosTotalResultReporter: use dropwizardBridge; add comment
   clarifying intentional separation from BenchmarkMetricsReporter

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use Dropwizard's native Meter.getOneMinuteRate() for success/failure rates
and Timer.getSnapshot().getMedian()/get99thPercentile() for latency,
instead of manual delta tracking and rolling sample arrays.

Removed:
- 5 List<Double> sample arrays and percentile75() helper
- Manual counter delta computation and rate calculation
- Micrometer Timer/Counter/HistogramSnapshot imports

Now accepts DropwizardBridgeMeterRegistry directly and reads from its
underlying Dropwizard MetricRegistry for native rate/percentile support.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…o-ready

1. Self-contained: Cosmos client creation and config validation moved
   inside the reporter. No-op if no upload endpoint is configured.
   Orchestrator just calls create() + start() + stop().

2. Uploads ALL metrics (not just op.calls/latency): Timers, Meters,
   Gauges, Counters — including Netty pool metrics when enabled.

3. Kusto-friendly schema: one document per meter per interval with
   MetricName (preserving all tag dimensions in hierarchical name),
   MetricType, Count, rates, percentiles, plus run metadata
   (Operation, TestVariation, Branch, Commit, Concurrency, CPU).

4. Removed builder pattern (unnecessary for single factory method).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New architecture:
- Console summary (ConsoleSummaryReporter): always on, logs success/failure
  rates, p50/p99 latency, CPU every interval
- reportingDestination enum: CONSOLE (default), CSV, COSMOSDB, APPLICATION_INSIGHTS
  (mutually exclusive)
- CSV: per-metric CSV files via Dropwizard CsvReporter
- COSMOSDB: uploads all metrics with full Micrometer tag dimensions
  (Container, Operation, OperationStatusCode, ClientCorrelationId,
  RegionName, PartitionKeyRangeId, etc.) to Cosmos DB
- APPLICATION_INSIGHTS: AzureMonitorMeterRegistry sends all SDK meters
  natively to Log Analytics/Kusto

Config changes:
- New field: metrics.reportingDestination
- New section: metrics.applicationInsights (connectionString, instrumentationKey,
  stepSeconds, testCategory)
- Backward compatible: reportingDirectory without explicit destination defaults to CSV
- App Insights config from JSON, with fallback to system properties/env vars

Files:
- New: ReportingDestination.java, ConsoleSummaryReporter.java
- Simplified: BenchmarkMetricsReporter.java (CSV-only)
- Redesigned: CosmosTotalResultReporter.java (reads Micrometer for explicit tag dims)
- Updated: BenchmarkConfig.java, BenchmarkOrchestrator.java

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- ReportingDestination: removed CONSOLE value (console is always on, not a choice)
- reportingDestination field is now nullable (null = console-only)
- Removed backward compat: no system property/env var fallback for App Insights
- Config is clean: reportingDestination is explicit in JSON or omitted

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 6, 2026 22:02
@xinlian12 xinlian12 requested a review from a team as a code owner March 6, 2026 22:02
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the azure-cosmos-benchmark tool to remove app-level CodaHale/Dropwizard metric tracking from benchmark workloads and instead rely on Micrometer + Cosmos SDK-emitted meters as the primary metrics source, with a new destination-based reporting model and simplified concurrency control.

Changes:

  • Replaced per-benchmark metric tracking and custom subscribers/semaphore concurrency with Flux.flatMap(concurrency) patterns and scheduler injection.
  • Introduced destination-based metrics reporting (console/CSV via Dropwizard bridge, Cosmos DB uploader, Azure Monitor/App Insights registry).
  • Updated benchmark orchestration/config parsing and removed legacy metrics/reporting infrastructure and dependencies.

Reviewed changes

Copilot reviewed 48 out of 48 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
sdk/cosmos/azure-cosmos-benchmark/workload-config-sample.json Adds sample JSON illustrating new metrics.destination configuration.
sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/WorkflowTest.java Updates tests to construct workloads with a Reactor Scheduler instead of MetricRegistry.
sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/ReadMyWritesConsistencyTest.java Updates consistency test to use Scheduler injection instead of MetricRegistry.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/metrics/MetricsImpl.java Deletes legacy LinkedIn CTL metrics implementation (CodaHale-based).
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/metrics/MetricsFactory.java Deletes legacy metrics factory used by LinkedIn CTL code.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/QueryExecutor.java Removes per-operation metrics tracking hooks from query execution path.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/Metrics.java Deletes legacy metrics interface.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/GetExecutor.java Removes per-operation metrics tracking hooks from get execution path.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/CosmosDBDataAccessor.java Updates constructors/signatures after removing MetricsFactory dependency.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/TestRunner.java Removes shared Dropwizard MetricRegistry dependency from CTL runners.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/QueryTestRunner.java Removes MetricRegistry plumbing after metrics removal.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/LICtlWorkload.java Removes MetricRegistry from workload construction and runner creation.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/GetTestRunner.java Removes MetricRegistry from runner construction.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/CompositeReadTestRunner.java Removes MetricRegistry from runner construction.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionWriteBenchmark.java Converts workload execution to return Mono and removes subscriber/semaphore timing logic.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionReadBenchmark.java Converts workload execution to return Mono and removes subscriber/semaphore timing logic.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionQuerySinglePartitionMultiple.java Converts workload execution to return Mono and removes subscriber/semaphore logic.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionQueryBenchmark.java Converts workload execution to return Mono and removes custom latency subscriber logic.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionBenchmark.java Reworks core async runner to use Flux.generate + flatMap(concurrency) with injected scheduler and removes CodaHale meters.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ctl/AsyncCtlWorkload.java Reworks CTL workload runner to return Mono operations and use flatMap(concurrency) with injected scheduler.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ThreadPrefixGaugeSet.java Replaces Dropwizard MetricSet with Micrometer MeterBinder for thread-prefix gauges.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/TenantWorkloadConfig.java Removes legacy meter name constants tied to old Dropwizard metrics.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/SyncWriteBenchmark.java Removes Dropwizard MetricRegistry dependency from sync benchmarks.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/SyncReadBenchmark.java Removes Dropwizard MetricRegistry dependency from sync benchmarks.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/SyncBenchmark.java Removes Dropwizard meters/timers from sync benchmark base.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ScheduledReporterFactory.java Deletes legacy factory that chose ConsoleReporter vs CsvReporter from a MetricRegistry.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ReportingDestination.java Adds enum for metrics destination selection.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ReadMyWriteWorkflow.java Converts workflow execution to return Mono and removes semaphore/subscriber-based dispatch.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/NettyHttpMetricsReporter.java Deletes custom CSV netty pool metrics sampler reporter.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/DropwizardBridgeMeterRegistry.java Adds Micrometer->Dropwizard bridge registry for Dropwizard reporters.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CsvReporterConfig.java Adds typed config for CSV destination.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CsvMetricsReporter.java Adds Dropwizard Console/CSV reporter wrapper reading from the bridge registry.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CosmosTotalResultReporter.java Deletes legacy Dropwizard ScheduledReporter that uploaded aggregate results to Cosmos.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CosmosReporterConfig.java Adds typed config for Cosmos DB destination.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CosmosMetricsReporter.java Adds Micrometer meter uploader to Cosmos DB preserving tag dimensions.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkRequestSubscriber.java Deletes legacy subscriber used for semaphore-based concurrency + latency timers.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java Reworks orchestrator to build composite registries, handle destinations, inject scheduler, and manage lifecycle cleanup.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkConfig.java Reworks config parsing to support destination-specific reporter configs under metrics.destination.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncWriteBenchmark.java Converts async write workload to return Mono and removes custom subscriber-based timing logic.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncReadManyBenchmark.java Converts readMany workload to return Mono and removes subscriber-based timing logic.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncReadBenchmark.java Converts async read workload to return Mono and removes subscriber-based timing logic.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncQuerySinglePartitionMultiple.java Converts query workload to return Mono and removes subscriber-based timing logic.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncQueryBenchmark.java Converts async query workload to return Mono and removes custom latency subscriber logic.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncMixedBenchmark.java Converts mixed workload to return Mono and removes subscriber/semaphore dispatch.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncBenchmark.java Reworks core async runner to Flux.generate + flatMap(concurrency) and injected scheduler; removes Dropwizard tracking.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AppInsightsReporterConfig.java Adds typed config for App Insights/Azure Monitor destination.
sdk/cosmos/azure-cosmos-benchmark/pom.xml Removes legacy Dropwizard metrics modules/reservoir dependency; keeps metrics-core and Micrometer registries.
eng/versioning/external_dependencies.txt Removes dropped third-party dependencies from tracked list.

You can also share your feedback on Copilot code review. Take the survey.

Annie Liang and others added 10 commits March 6, 2026 14:16
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use Schedulers.newBoundedElastic matching CosmosSchedulers pattern.
BoundedElastic has 60s TTL so idle threads are reclaimed automatically,
unlike newParallel which keeps threads alive indefinitely.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
newBoundedElastic has overhead from thread TTL eviction and queuing that
is unnecessary for non-blocking benchmark workload dispatch. newParallel
is a fixed-size, zero-overhead scheduler. Thread cleanup handled by
dispose() which is already called in the finally block.

daemon=true ensures threads don't prevent JVM exit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use ScheduledThreadPoolExecutor with allowCoreThreadTimeOut(true) and
60s keepAliveTime. This gives parallel-like performance (fixed pool,
no queuing overhead) while idle threads are reclaimed after 60s.

Wrapped with Schedulers.fromExecutorService() for Reactor integration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…vior

Use Reactor's built-in global parallel scheduler. Same scheduler the
old benchmark code used via subscribeOn(Schedulers.parallel()). No
custom scheduler creation or disposal needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… bridge hop

Console (default): now uses Micrometer's native LoggingMeterRegistry
which logs all meters directly via SLF4J. No Dropwizard bridge needed.
Eliminates one hop (Micrometer -> Dropwizard -> ConsoleReporter).

CSV: still uses DropwizardBridgeMeterRegistry + CsvReporter (Dropwizard's
CsvReporter has no native Micrometer equivalent for per-metric files).
Bridge is only created when CSV destination is selected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Fix 409 conflict: tag 'id' was overwriting document 'id' (UUID).
   Guard against tag keys that collide with document metadata.
2. Only upload cosmos.client.* metrics — skip jvm.*, system.*, netty.*
   which are diagnostic noise for the result database.
3. Skip zero-value gauges (e.g., jvm.buffer.memory.used = 0).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CompositeMeterRegistry with no children silently drops all meters.
COSMOSDB case needs a backing registry so SDK-emitted meters are
stored and queryable when CosmosMetricsReporter iterates them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The reporter was iterating compositeRegistry.getMeters() which only
returns meters registered directly on that instance. The SDK's
ClientTelemetryMetrics registers meters on its own internal composite
and propagates data down into child registries, so the reporter must
read from the SimpleMeterRegistry where meter values actually land.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CsvMetricsReporter now reads directly from Micrometer SimpleMeterRegistry
instead of bridging through DropwizardBridgeMeterRegistry. This eliminates
per-record() overhead on the hot path (ExponentiallyDecayingReservoir ops)
that caused ~30% perf degradation in benchmarks.

- Rewrite CsvMetricsReporter to iterate MeterRegistry.getMeters() directly
- Use SimpleMeterRegistry in BenchmarkOrchestrator CSV case
- Delete DropwizardBridgeMeterRegistry (no longer needed)
- Remove io.dropwizard.metrics:metrics-core dependency

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 xinlian12 force-pushed the removecodahalemetrics branch from fd77f66 to 14f037e Compare March 8, 2026 07:03
Annie Liang and others added 2 commits March 7, 2026 23:27
The warmup mechanism (skipWarmUpOperations) was tightly coupled to the
Dropwizard MetricRegistry — it reset and re-registered Dropwizard meters
after N warmup operations. With the switch to Micrometer (where meters
are managed by the SDK's ClientTelemetryMetrics), there is no equivalent
remove-and-re-register mechanism, so the warmup logic no longer functions.

Remove the remaining dead config field, getter, parser case, and sparsity
skip logic from TenantWorkloadConfig, AsyncBenchmark, and
AsyncEncryptionBenchmark.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Allow JVM, Netty, and all other metrics to be uploaded to Cosmos DB,
not just cosmos.client.* metrics. This makes JVM stats and Netty HTTP
pool metrics visible when using the Cosmos DB reporting destination.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Member Author

@xinlian12 xinlian12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Deep Review — Remove CodaHale Metrics + Modernize Benchmark Architecture

Overall: Excellent modernization — net -1015 lines, dramatically simplified benchmark subclasses, and idiomatic Reactor concurrency control. The direction is right. Below are 13 findings (2 blocking, 6 recommendations, 4 suggestions, 1 observation) focused on correctness gaps, resource leaks, and cross-SDK insights from the .NET benchmark tool.

Cross-SDK: The .NET Cosmos benchmark (Azure/azure-cosmos-dotnet-v3) retains warmup support, supports coexistent reporting destinations (console + CSV/Cosmos simultaneously), and maintains app-level metrics alongside SDK metrics — all areas where this PR diverges.


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

Annie Liang and others added 4 commits March 9, 2026 10:43
- Fix misleading scheduler comment (not dedicated, uses Schedulers.parallel())
- Close appInsightsRegistry in finally block (resource leak)
- Fix DistributionSummary field names: remove incorrect 'Ms' suffix
- Inject shared MeterRegistry into LICtlWorkload via AsyncClientFactory
- Warn when multiple reporting destinations are configured
- Add null validation for CsvReporterConfig and CosmosReporterConfig
- Restore skipWarmUpOperations with new approach: run warmup ops before
  measured run via Flux.take().blockLast() (no Dropwizard dependency)
- Buffer CSV file writes: collect all rows per report() cycle, flush once
  per file instead of opening/closing per row
- Fix ReportingDestination CSV javadoc (was referencing Dropwizard)
- Remove unused AtomicReference import from ThreadPrefixGaugeSet

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests for CsvReporterConfig and CosmosReporterConfig null/empty field
validation, verifying IllegalArgumentException is thrown with clear
messages for missing required fields and optional fields default correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
LoggingMeterRegistry is now always added to the CompositeMeterRegistry,
providing real-time console visibility regardless of whether CSV, Cosmos
DB, or Application Insights destination is configured.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Warmup was tightly coupled to Dropwizard MetricRegistry and the new
Micrometer-based approach cannot cleanly isolate warmup metrics from
measurement data. Removed all warmup config, logic, and Benchmark.warmup()
method.

Also always enables LoggingMeterRegistry alongside any reporting
destination for real-time console visibility.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it! LGTM - Thx!

@xinlian12 xinlian12 merged commit e5eece2 into Azure:main Mar 9, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants