[Azure Cosmos Benchmark]Remove CodaHale metrics from azure-cosmos-benchmark, use SDK Micromet…#48281
Conversation
…er metrics Remove all CodaHale (Dropwizard) metrics instrumentation from the benchmark project and rely on SDK-emitted Micrometer metrics (cosmos.client.op.*) as the single source of truth. Key changes: - Remove MetricRegistry/Meter/Timer/HdrHistogram from all benchmark classes - Benchmark classes become pure workload drivers (no app-level metric tracking) - Create BenchmarkMetricsReporter that reads SDK meters from shared MeterRegistry - Rewrite CosmosTotalResultReporter to read SDK meters instead of extending CodaHale ScheduledReporter - Rewrite ThreadPrefixGaugeSet as Micrometer MeterBinder - BenchmarkOrchestrator uses CompositeMeterRegistry (Simple + optional AzureMonitor) - JVM metrics via Micrometer binders (JvmGcMetrics, JvmMemoryMetrics, JvmThreadMetrics) - Remove dropwizard metrics-core/jvm/graphite and hdrhistogram-metrics-reservoir deps - Delete ScheduledReporterFactory, MetricsFactory, MetricsImpl, Metrics interface - Per-tenant breakdown via SDK ClientCorrelationId tag (set to tenant name/ID) Net result: 37 files changed, -950/+579 lines (net -371 lines removed) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace hand-rolled concurrency control (Semaphore + BaseSubscriber + AtomicLong
count + synchronized wait/notify) with Reactor's native Flux.flatMap(concurrency).
Key changes:
- AsyncBenchmark.run(): Flux.range/generate + flatMap(concurrency).blockLast()
- performWorkload signature: void performWorkload(BaseSubscriber, long) -> Mono<T> performWorkload(long)
- Add dedicated Scheduler (Schedulers.newParallel("cosmos-bench")) to avoid
contention with global Schedulers.parallel(), mirroring SDK's CosmosSchedulers pattern
- Delete LatencySubscriber inner classes from 6 subclasses (broken after CodaHale removal)
- Delete BenchmarkRequestSubscriber.java (no longer needed)
- Subclasses simplified to just build and return Mono (no subscriber wiring)
- AsyncCtlWorkload and AsyncEncryptionBenchmark refactored with same pattern
- SyncBenchmark unchanged (uses CompletableFuture, not reactive)
Net: 15 files changed, -515 lines
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match CosmosSchedulers pattern: use Schedulers.newBoundedElastic with DEFAULT_BOUNDED_ELASTIC_SIZE, DEFAULT_BOUNDED_ELASTIC_QUEUESIZE, 60s TTL. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
...os/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ctl/AsyncCtlWorkload.java
Outdated
Show resolved
Hide resolved
...os/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ctl/AsyncCtlWorkload.java
Outdated
Show resolved
Hide resolved
...hmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionQueryBenchmark.java
Outdated
Show resolved
Hide resolved
Changed .next() to .last() for query Flux-to-Mono conversion so all pages are consumed (not just the first page). .last() subscribes to the entire Flux and returns the final element. Affected files: - AsyncCtlWorkload: .next() -> .last() for query operation - AsyncEncryptionQueryBenchmark: .next() -> .last() for all 5 query types - AsyncEncryptionQuerySinglePartitionMultiple: .next() -> .last() Also fixed indentation in AsyncCtlWorkload.run() flatMap chain. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
...hmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionWriteBenchmark.java
Outdated
Show resolved
Hide resolved
...zure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkMetricsReporter.java
Outdated
Show resolved
Hide resolved
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
Outdated
Show resolved
Hide resolved
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
Outdated
Show resolved
Hide resolved
...ure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CosmosTotalResultReporter.java
Outdated
Show resolved
Hide resolved
...os/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ThreadPrefixGaugeSet.java
Outdated
Show resolved
Hide resolved
Use Micrometer's built-in DropwizardMeterRegistry (in micrometer-core) to bridge SDK-emitted Micrometer meters to a Dropwizard MetricRegistry, then use the proven CsvReporter/ConsoleReporter for output. This replaces ~260 lines of custom reporter code with ~90 lines of bridge config. The DropwizardMeterRegistry is part of micrometer-core — no extra dependency needed. metrics-core is re-added solely as the reporting sink (not for app-level instrumentation). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
Outdated
Show resolved
Hide resolved
- AsyncEncryptionWriteBenchmark: fix generic type to avoid raw cast
- CosmosTotalResultReporter: validate status codes by range [200,300)
instead of loose startsWith("2")
- ThreadPrefixGaugeSet: make refresh interval configurable, default to
printing interval from config
- BenchmarkOrchestrator: clarify DropwizardMeterRegistry bridge comment,
clarify Netty global registry comment
- external_dependencies.txt: remove unused metrics-graphite, metrics-jvm,
hdrhistogram-metrics-reservoir entries
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When both Netty HTTP metrics and App Insights are enabled, add the AzureMonitorMeterRegistry to Metrics.globalRegistry so Reactor Netty connection pool gauges flow to Application Insights in addition to the CSV reporter. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of creating a separate SimpleMeterRegistry for Netty pool gauges, add the compositeRegistry directly to Metrics.globalRegistry. This means Netty connection pool metrics automatically flow through to: - Dropwizard CSV/Console reporter (via the DropwizardMeterRegistry bridge) - Application Insights (if AzureMonitorMeterRegistry is configured) - NettyHttpMetricsReporter CSV (for dedicated netty-pool-metrics.csv) Removes the unnecessary SimpleMeterRegistry and its import. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Separate concerns: - DropwizardBridgeMeterRegistry: a MeterRegistry that bridges Micrometer meters to a Dropwizard MetricRegistry (added to composite) - BenchmarkMetricsReporter: a plain reporter that reads from the bridge's Dropwizard MetricRegistry and outputs via CsvReporter/ConsoleReporter Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
Outdated
Show resolved
Hide resolved
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
Outdated
Show resolved
Hide resolved
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
Outdated
Show resolved
Hide resolved
1. BenchmarkMetricsReporter: accept reportingDirectory string instead of pre-built Path; create metrics/ subdirectory internally 2. NettyHttpMetricsReporter: use dropwizardBridge instead of compositeRegistry to avoid tricky percentile behavior 3. CosmosTotalResultReporter: use dropwizardBridge; add comment clarifying intentional separation from BenchmarkMetricsReporter Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use Dropwizard's native Meter.getOneMinuteRate() for success/failure rates and Timer.getSnapshot().getMedian()/get99thPercentile() for latency, instead of manual delta tracking and rolling sample arrays. Removed: - 5 List<Double> sample arrays and percentile75() helper - Manual counter delta computation and rate calculation - Micrometer Timer/Counter/HistogramSnapshot imports Now accepts DropwizardBridgeMeterRegistry directly and reads from its underlying Dropwizard MetricRegistry for native rate/percentile support. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…o-ready 1. Self-contained: Cosmos client creation and config validation moved inside the reporter. No-op if no upload endpoint is configured. Orchestrator just calls create() + start() + stop(). 2. Uploads ALL metrics (not just op.calls/latency): Timers, Meters, Gauges, Counters — including Netty pool metrics when enabled. 3. Kusto-friendly schema: one document per meter per interval with MetricName (preserving all tag dimensions in hierarchical name), MetricType, Count, rates, percentiles, plus run metadata (Operation, TestVariation, Branch, Commit, Concurrency, CPU). 4. Removed builder pattern (unnecessary for single factory method). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New architecture: - Console summary (ConsoleSummaryReporter): always on, logs success/failure rates, p50/p99 latency, CPU every interval - reportingDestination enum: CONSOLE (default), CSV, COSMOSDB, APPLICATION_INSIGHTS (mutually exclusive) - CSV: per-metric CSV files via Dropwizard CsvReporter - COSMOSDB: uploads all metrics with full Micrometer tag dimensions (Container, Operation, OperationStatusCode, ClientCorrelationId, RegionName, PartitionKeyRangeId, etc.) to Cosmos DB - APPLICATION_INSIGHTS: AzureMonitorMeterRegistry sends all SDK meters natively to Log Analytics/Kusto Config changes: - New field: metrics.reportingDestination - New section: metrics.applicationInsights (connectionString, instrumentationKey, stepSeconds, testCategory) - Backward compatible: reportingDirectory without explicit destination defaults to CSV - App Insights config from JSON, with fallback to system properties/env vars Files: - New: ReportingDestination.java, ConsoleSummaryReporter.java - Simplified: BenchmarkMetricsReporter.java (CSV-only) - Redesigned: CosmosTotalResultReporter.java (reads Micrometer for explicit tag dims) - Updated: BenchmarkConfig.java, BenchmarkOrchestrator.java Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- ReportingDestination: removed CONSOLE value (console is always on, not a choice) - reportingDestination field is now nullable (null = console-only) - Removed backward compat: no system property/env var fallback for App Insights - Config is clean: reportingDestination is explicit in JSON or omitted Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors the azure-cosmos-benchmark tool to remove app-level CodaHale/Dropwizard metric tracking from benchmark workloads and instead rely on Micrometer + Cosmos SDK-emitted meters as the primary metrics source, with a new destination-based reporting model and simplified concurrency control.
Changes:
- Replaced per-benchmark metric tracking and custom subscribers/semaphore concurrency with
Flux.flatMap(concurrency)patterns and scheduler injection. - Introduced destination-based metrics reporting (console/CSV via Dropwizard bridge, Cosmos DB uploader, Azure Monitor/App Insights registry).
- Updated benchmark orchestration/config parsing and removed legacy metrics/reporting infrastructure and dependencies.
Reviewed changes
Copilot reviewed 48 out of 48 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos-benchmark/workload-config-sample.json | Adds sample JSON illustrating new metrics.destination configuration. |
| sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/WorkflowTest.java | Updates tests to construct workloads with a Reactor Scheduler instead of MetricRegistry. |
| sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/ReadMyWritesConsistencyTest.java | Updates consistency test to use Scheduler injection instead of MetricRegistry. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/metrics/MetricsImpl.java | Deletes legacy LinkedIn CTL metrics implementation (CodaHale-based). |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/metrics/MetricsFactory.java | Deletes legacy metrics factory used by LinkedIn CTL code. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/QueryExecutor.java | Removes per-operation metrics tracking hooks from query execution path. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/Metrics.java | Deletes legacy metrics interface. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/GetExecutor.java | Removes per-operation metrics tracking hooks from get execution path. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/impl/CosmosDBDataAccessor.java | Updates constructors/signatures after removing MetricsFactory dependency. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/TestRunner.java | Removes shared Dropwizard MetricRegistry dependency from CTL runners. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/QueryTestRunner.java | Removes MetricRegistry plumbing after metrics removal. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/LICtlWorkload.java | Removes MetricRegistry from workload construction and runner creation. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/GetTestRunner.java | Removes MetricRegistry from runner construction. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/CompositeReadTestRunner.java | Removes MetricRegistry from runner construction. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionWriteBenchmark.java | Converts workload execution to return Mono and removes subscriber/semaphore timing logic. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionReadBenchmark.java | Converts workload execution to return Mono and removes subscriber/semaphore timing logic. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionQuerySinglePartitionMultiple.java | Converts workload execution to return Mono and removes subscriber/semaphore logic. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionQueryBenchmark.java | Converts workload execution to return Mono and removes custom latency subscriber logic. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionBenchmark.java | Reworks core async runner to use Flux.generate + flatMap(concurrency) with injected scheduler and removes CodaHale meters. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ctl/AsyncCtlWorkload.java | Reworks CTL workload runner to return Mono operations and use flatMap(concurrency) with injected scheduler. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ThreadPrefixGaugeSet.java | Replaces Dropwizard MetricSet with Micrometer MeterBinder for thread-prefix gauges. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/TenantWorkloadConfig.java | Removes legacy meter name constants tied to old Dropwizard metrics. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/SyncWriteBenchmark.java | Removes Dropwizard MetricRegistry dependency from sync benchmarks. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/SyncReadBenchmark.java | Removes Dropwizard MetricRegistry dependency from sync benchmarks. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/SyncBenchmark.java | Removes Dropwizard meters/timers from sync benchmark base. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ScheduledReporterFactory.java | Deletes legacy factory that chose ConsoleReporter vs CsvReporter from a MetricRegistry. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ReportingDestination.java | Adds enum for metrics destination selection. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ReadMyWriteWorkflow.java | Converts workflow execution to return Mono and removes semaphore/subscriber-based dispatch. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/NettyHttpMetricsReporter.java | Deletes custom CSV netty pool metrics sampler reporter. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/DropwizardBridgeMeterRegistry.java | Adds Micrometer->Dropwizard bridge registry for Dropwizard reporters. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CsvReporterConfig.java | Adds typed config for CSV destination. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CsvMetricsReporter.java | Adds Dropwizard Console/CSV reporter wrapper reading from the bridge registry. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CosmosTotalResultReporter.java | Deletes legacy Dropwizard ScheduledReporter that uploaded aggregate results to Cosmos. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CosmosReporterConfig.java | Adds typed config for Cosmos DB destination. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CosmosMetricsReporter.java | Adds Micrometer meter uploader to Cosmos DB preserving tag dimensions. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkRequestSubscriber.java | Deletes legacy subscriber used for semaphore-based concurrency + latency timers. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java | Reworks orchestrator to build composite registries, handle destinations, inject scheduler, and manage lifecycle cleanup. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkConfig.java | Reworks config parsing to support destination-specific reporter configs under metrics.destination. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncWriteBenchmark.java | Converts async write workload to return Mono and removes custom subscriber-based timing logic. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncReadManyBenchmark.java | Converts readMany workload to return Mono and removes subscriber-based timing logic. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncReadBenchmark.java | Converts async read workload to return Mono and removes subscriber-based timing logic. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncQuerySinglePartitionMultiple.java | Converts query workload to return Mono and removes subscriber-based timing logic. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncQueryBenchmark.java | Converts async query workload to return Mono and removes custom latency subscriber logic. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncMixedBenchmark.java | Converts mixed workload to return Mono and removes subscriber/semaphore dispatch. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncBenchmark.java | Reworks core async runner to Flux.generate + flatMap(concurrency) and injected scheduler; removes Dropwizard tracking. |
| sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AppInsightsReporterConfig.java | Adds typed config for App Insights/Azure Monitor destination. |
| sdk/cosmos/azure-cosmos-benchmark/pom.xml | Removes legacy Dropwizard metrics modules/reservoir dependency; keeps metrics-core and Micrometer registries. |
| eng/versioning/external_dependencies.txt | Removes dropped third-party dependencies from tracked list. |
You can also share your feedback on Copilot code review. Take the survey.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkConfig.java
Show resolved
Hide resolved
...smos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CsvMetricsReporter.java
Outdated
Show resolved
Hide resolved
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
Outdated
Show resolved
Hide resolved
...os/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ThreadPrefixGaugeSet.java
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkConfig.java
Show resolved
Hide resolved
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use Schedulers.newBoundedElastic matching CosmosSchedulers pattern. BoundedElastic has 60s TTL so idle threads are reclaimed automatically, unlike newParallel which keeps threads alive indefinitely. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
newBoundedElastic has overhead from thread TTL eviction and queuing that is unnecessary for non-blocking benchmark workload dispatch. newParallel is a fixed-size, zero-overhead scheduler. Thread cleanup handled by dispose() which is already called in the finally block. daemon=true ensures threads don't prevent JVM exit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use ScheduledThreadPoolExecutor with allowCoreThreadTimeOut(true) and 60s keepAliveTime. This gives parallel-like performance (fixed pool, no queuing overhead) while idle threads are reclaimed after 60s. Wrapped with Schedulers.fromExecutorService() for Reactor integration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…vior Use Reactor's built-in global parallel scheduler. Same scheduler the old benchmark code used via subscribeOn(Schedulers.parallel()). No custom scheduler creation or disposal needed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… bridge hop Console (default): now uses Micrometer's native LoggingMeterRegistry which logs all meters directly via SLF4J. No Dropwizard bridge needed. Eliminates one hop (Micrometer -> Dropwizard -> ConsoleReporter). CSV: still uses DropwizardBridgeMeterRegistry + CsvReporter (Dropwizard's CsvReporter has no native Micrometer equivalent for per-metric files). Bridge is only created when CSV destination is selected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Fix 409 conflict: tag 'id' was overwriting document 'id' (UUID). Guard against tag keys that collide with document metadata. 2. Only upload cosmos.client.* metrics — skip jvm.*, system.*, netty.* which are diagnostic noise for the result database. 3. Skip zero-value gauges (e.g., jvm.buffer.memory.used = 0). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CompositeMeterRegistry with no children silently drops all meters. COSMOSDB case needs a backing registry so SDK-emitted meters are stored and queryable when CosmosMetricsReporter iterates them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The reporter was iterating compositeRegistry.getMeters() which only returns meters registered directly on that instance. The SDK's ClientTelemetryMetrics registers meters on its own internal composite and propagates data down into child registries, so the reporter must read from the SimpleMeterRegistry where meter values actually land. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CsvMetricsReporter now reads directly from Micrometer SimpleMeterRegistry instead of bridging through DropwizardBridgeMeterRegistry. This eliminates per-record() overhead on the hot path (ExponentiallyDecayingReservoir ops) that caused ~30% perf degradation in benchmarks. - Rewrite CsvMetricsReporter to iterate MeterRegistry.getMeters() directly - Use SimpleMeterRegistry in BenchmarkOrchestrator CSV case - Delete DropwizardBridgeMeterRegistry (no longer needed) - Remove io.dropwizard.metrics:metrics-core dependency Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fd77f66 to
14f037e
Compare
The warmup mechanism (skipWarmUpOperations) was tightly coupled to the Dropwizard MetricRegistry — it reset and re-registered Dropwizard meters after N warmup operations. With the switch to Micrometer (where meters are managed by the SDK's ClientTelemetryMetrics), there is no equivalent remove-and-re-register mechanism, so the warmup logic no longer functions. Remove the remaining dead config field, getter, parser case, and sparsity skip logic from TenantWorkloadConfig, AsyncBenchmark, and AsyncEncryptionBenchmark. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Allow JVM, Netty, and all other metrics to be uploaded to Cosmos DB, not just cosmos.client.* metrics. This makes JVM stats and Netty HTTP pool metrics visible when using the Cosmos DB reporting destination. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
xinlian12
left a comment
There was a problem hiding this comment.
PR Deep Review — Remove CodaHale Metrics + Modernize Benchmark Architecture
Overall: Excellent modernization — net -1015 lines, dramatically simplified benchmark subclasses, and idiomatic Reactor concurrency control. The direction is right. Below are 13 findings (2 blocking, 6 recommendations, 4 suggestions, 1 observation) focused on correctness gaps, resource leaks, and cross-SDK insights from the .NET benchmark tool.
Cross-SDK: The .NET Cosmos benchmark (Azure/azure-cosmos-dotnet-v3) retains warmup support, supports coexistent reporting destinations (console + CSV/Cosmos simultaneously), and maintains app-level metrics alongside SDK metrics — all areas where this PR diverges.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/AsyncBenchmark.java
Outdated
Show resolved
Hide resolved
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
Show resolved
Hide resolved
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CosmosMetricsReporter.java
Outdated
Show resolved
Hide resolved
.../azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/linkedin/LICtlWorkload.java
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkConfig.java
Show resolved
Hide resolved
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
Outdated
Show resolved
Hide resolved
...os/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ReportingDestination.java
Outdated
Show resolved
Hide resolved
...smos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/CsvMetricsReporter.java
Outdated
Show resolved
Hide resolved
...os/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ThreadPrefixGaugeSet.java
Outdated
Show resolved
Hide resolved
- Fix misleading scheduler comment (not dedicated, uses Schedulers.parallel()) - Close appInsightsRegistry in finally block (resource leak) - Fix DistributionSummary field names: remove incorrect 'Ms' suffix - Inject shared MeterRegistry into LICtlWorkload via AsyncClientFactory - Warn when multiple reporting destinations are configured - Add null validation for CsvReporterConfig and CosmosReporterConfig - Restore skipWarmUpOperations with new approach: run warmup ops before measured run via Flux.take().blockLast() (no Dropwizard dependency) - Buffer CSV file writes: collect all rows per report() cycle, flush once per file instead of opening/closing per row - Fix ReportingDestination CSV javadoc (was referencing Dropwizard) - Remove unused AtomicReference import from ThreadPrefixGaugeSet Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests for CsvReporterConfig and CosmosReporterConfig null/empty field validation, verifying IllegalArgumentException is thrown with clear messages for missing required fields and optional fields default correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
LoggingMeterRegistry is now always added to the CompositeMeterRegistry, providing real-time console visibility regardless of whether CSV, Cosmos DB, or Application Insights destination is configured. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Warmup was tightly coupled to Dropwizard MetricRegistry and the new Micrometer-based approach cannot cleanly isolate warmup metrics from measurement data. Removed all warmup config, logic, and Benchmark.warmup() method. Also always enables LoggingMeterRegistry alongside any reporting destination for real-time console visibility. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
Love it! LGTM - Thx!
Remove CodaHale Metrics + Modernize Benchmark Architecture
1. SDK Metrics as Single Source of Truth
Removed all app-level metric tracking (
MetricRegistry,Meter,Timer,HdrHistogram) from every benchmark class. Benchmark classes are now pure workload drivers. The SDK emitscosmos.client.op.calls,cosmos.client.op.latency,cosmos.client.op.RUswith full tag dimensions (Container,Operation,OperationStatusCode,ClientCorrelationId,RegionName,PartitionKeyRangeId, etc.).2. Reactor Flux.flatMap Concurrency Control
Replaced hand-rolled
Semaphore+BaseSubscriber+AtomicLong count+synchronized wait/notifywithFlux.flatMap(concurrency). Subclasses returnMono<T>fromperformWorkload(long i). UsesSchedulers.parallel()(global shared singleton) — must NOT be disposed by the benchmark.3. Reporter Architecture — Destination-Based
Reporting destinations configured via
metrics.destinationin workload JSON. Console logging (LoggingMeterRegistry) is always active alongside any destination for real-time visibility.All reporters read from Micrometer
SimpleMeterRegistrydirectly — no Dropwizard bridge. This eliminates the per-record()overhead ofDropwizardBridgeMeterRegistry(which maintained a parallel set of Dropwizard meters backed byExponentiallyDecayingReservoir, causing ~30% perf degradation on the hot path).If multiple destinations are configured, a warning is logged and only the first match is used.
LoggingMeterRegistry(logs all meters via SLF4J)destination.csvCsvMetricsReporter— iteratesSimpleMeterRegistry.getMeters()directly, writes per-metric CSV files (buffered per reporting cycle)destination.cosmosCosmosMetricsReporter— reads all Micrometer meters (including JVM/Netty) with explicit tag dimensionsdestination.applicationInsightsAzureMonitorMeterRegistry— SDK sends all meters+tags to Log Analytics/Kusto4. Removed
skipWarmUpOperationsSupportThe warmup mechanism (
skipWarmUpOperationsconfig) has been completely removed. It was tightly coupled to the DropwizardMetricRegistry— it worked by resetting and re-registering Dropwizard meters after N initial operations. With the switch to Micrometer (where meters are managed by the SDK'sClientTelemetryMetrics, not the benchmark app), warmup metrics cannot be cleanly isolated from measurement data since counters are cumulative and the registry is shared.5. Workload JSON Config
Console (default — no destination needed):
{ "metrics": { "printingInterval": 10, "enableJvmStats": true }, "tenantDefaults": { "serviceEndpoint": "https://...", "masterKey": "..." }, "tenants": [{ "operation": "ReadLatency", "numberOfOperations": 100000 }] }CSV destination:
{ "metrics": { "printingInterval": 10, "destination": { "csv": { "reportingDirectory": "/tmp/benchmark-output" } } }, "tenantDefaults": { "serviceEndpoint": "https://...", "masterKey": "..." }, "tenants": [{ "operation": "ReadLatency", "numberOfOperations": 100000 }] }Cosmos DB destination:
{ "metrics": { "printingInterval": 10, "destination": { "cosmos": { "serviceEndpoint": "https://<results-account>.documents.azure.com:443/", "masterKey": "<results-account-key>", "database": "benchmarkResults", "container": "metrics", "testVariationName": "baseline-read-1k", "branchName": "main", "commitId": "abc1234" } } }, "tenantDefaults": { "serviceEndpoint": "https://...", "masterKey": "..." }, "tenants": [{ "operation": "ReadLatency", "numberOfOperations": 100000 }] }Application Insights destination:
{ "metrics": { "printingInterval": 10, "destination": { "applicationInsights": { "connectionString": "InstrumentationKey=...;IngestionEndpoint=...", "stepSeconds": 10, "testCategory": "nightly-perf" } } }, "tenantDefaults": { "serviceEndpoint": "https://...", "masterKey": "..." }, "tenants": [{ "operation": "ReadLatency", "numberOfOperations": 100000 }] }Typed config classes with null validation:
CsvReporterConfig,CosmosReporterConfig,AppInsightsReporterConfig.6. Resource Lifecycle
Schedulers.parallel()(global shared, not disposed)compositeRegistry.close(),appInsightsRegistry.close()in finallyloggingRegistry.close()in finallyJvmGcMetricsetc. stored and.close()d in finallyclose()method shuts down internal executorcompositeRegistryadded toMetrics.globalRegistrywhen enabled, removed in finally7. LinkedIn CTL Workload Metrics
Injected shared
MeterRegistryintoAsyncClientFactory.buildAsyncClient()so LICtlWorkload's Cosmos client participates in SDK telemetry reporting.8. Files
New:
ReportingDestination.javaCsvMetricsReporter.java(reads MicrometerSimpleMeterRegistrydirectly, writes CSV)CosmosMetricsReporter.java(Cosmos DB upload with full Micrometer tag dimensions)CsvReporterConfig.java,CosmosReporterConfig.java,AppInsightsReporterConfig.javaReporterConfigValidationTest.java(9 unit tests for config validation)workload-config-sample.jsonDeleted:
DropwizardBridgeMeterRegistry.java(no longer needed)ScheduledReporterFactory.java,BenchmarkRequestSubscriber.java,NettyHttpMetricsReporter.javalinkedin/impl/metrics/MetricsFactory.java,MetricsImpl.java,Metrics.javaDependencies:
metrics-core,metrics-jvm,metrics-graphite,hdrhistogram-metrics-reservoirmicrometer-core,micrometer-registry-azure-monitor,micrometer-registry-graphite9. Benchmark Results — PR vs Main (Multi-Destination)
VM: Standard_D16s_v3 (16 vCPU, 64GB) | Workload: ReadThroughput, 30 tenants, GATEWAY mode, 10-min duration | Commit: PR
00fe591vs Main00a35f4Console Destination (PR vs Main)
CSV Destination (PR vs Main)
PR uses Micrometer
CsvMetricsReporter→ per-metric CSV files. Main uses CodaHaleCsvReporter.Cosmos DB Destination (PR only)
Main branch has a known NPE bug in
CosmosTotalResultReporterthat prevents uploads.Application Insights Destination (PR vs Main)
Cross-Destination Comparison (PR branch, same commit)
Conclusion: ✅ No regression detected. All fixed-size thread pools, netty connections, latency, and throughput are equivalent between PR and main across all 4 destinations.