Open
1. PR-conditional emulator matrix (16 → 11 jobs): Drops redundant JDK variants for Spark/Kafka in PR builds. Full matrix on main merges.
Dropped for PRs (5 jobs, ~5 agent hours saved):
- Spark 3.3 Java 11 (keeping Java 8)
- Spark 3.4 Java 8 (keeping Java 11)
- Spark 3.5/Scala 2.12 Java 8 (keeping Java 17)
- Spark 4.0/Scala 2.13 Java 17 (keeping Java 21)
- Kafka Java 11 (keeping Java 17)
2. Increase BuildParallelization from 1 to 2 in all stages (Build, TestEmulator, TestVNextEmulator).
3. Skip maven-shade-plugin for non-Spark/non-Kafka emulator jobs: Core emulator, long emulator, and encryption jobs don't need Spark/Kafka uber JARs. Adding -Dshade.skip=true saves ~90s of shade plugin execution per Spark module × 5 modules = ~7-8 min per non-Spark job (5 jobs × 7 min = ~35 min agent time saved).
4. Remove outdated comment about emulator download time.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The antrun 03-repack phase expects shade output (native .jnilib/.so files in target/tmp/). When -Dshade.skip=true, the shade output doesn't exist and antrun fails with 'Could not find file'. Add -Dmaven.antrun.skip=true alongside -Dshade.skip=true. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The test step runs 'clean verify' which recompiles everything from scratch, including Spark shade. Our BuildOptions only affected the build step. Add -Dshade.skip=true -Dmaven.antrun.skip=true to AdditionalArgs for non-Spark jobs so it flows into TestOptions too. Keep BuildOptions for the build step as well (both steps need it). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
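The point of this commit, that the skip flags have to reach both the build step and the test step, can be sketched as follows. This is a minimal Python model, not the pipeline's actual YAML: the helper names, the job names, and the `install` goal are assumptions made for illustration. Only the two `-D` flags and the `clean verify` goals come from the commit messages above.

```python
# Flags named in the commit messages above. Skipping shade alone makes the
# antrun repack phase fail, so the two flags always travel together.
SKIP_UBER_JAR_FLAGS = ["-Dshade.skip=true", "-Dmaven.antrun.skip=true"]


def needs_uber_jar(job_name: str) -> bool:
    """Hypothetical rule: only Spark and Kafka jobs consume the shaded JARs."""
    lowered = job_name.lower()
    return "spark" in lowered or "kafka" in lowered


def maven_command(goals: list[str], job_name: str) -> list[str]:
    """Build an mvn argument list, adding the skip flags for other jobs."""
    extra = [] if needs_uber_jar(job_name) else SKIP_UBER_JAR_FLAGS
    return ["mvn", *goals, *extra]


# Both steps go through the same helper, so the flags cannot be set on the
# build step and forgotten on the test step. Job names are illustrative.
build_cmd = maven_command(["install"], "Emulator Only Integration Tests")
test_cmd = maven_command(["clean", "verify"], "Emulator Only Integration Tests")
spark_cmd = maven_command(["clean", "verify"], "Spark 3.5 Integration Tests")

print(" ".join(test_cmd))
```

Routing both steps through one source of truth mirrors the choice of AdditionalArgs here, since that value flows into TestOptions as well as the build step.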
Add BuildOptions parameter through ci.yml → ci.tests.yml → build-and-test.yml pipeline chain. Defaults to empty string (no behavior change for other SDKs). Cosmos Build stage sets BuildOptions to '-Dshade.skip=true -Dmaven.antrun.skip=true' to skip Spark/Kafka uber JAR creation during unit test matrix jobs, saving ~14 min per job. The release artifact deploy step is unaffected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each Spark emulator job previously compiled ALL 14 modules including other Spark versions it doesn't test, wasting ~11 min per job on unnecessary shade+compile.
Changes:
- generate-project-list.ps1: Check for ProjectListOverride env var at the top. If set, use it directly and skip normal computation. Defaults to empty (no behavior change for other SDKs).
- Emulator matrix JSONs: Add ProjectListOverride for each Spark and Kafka job with only the modules they need (core + their specific Spark/Kafka module).
Example: Spark 3.5/2.13 job previously built 14 modules (41 min test step). Now builds only 6 modules, saving ~11 min per Spark job.
Estimated savings: ~11 min × 9 Spark jobs + ~5 min × 2 Kafka jobs = ~109 min agent time per full CI run.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
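The savings estimate in this commit message can be reproduced with a few lines of arithmetic. The per-job figures and job counts below are the approximate values quoted above; the function and constant names are made up for this sketch.

```python
# Approximate per-job savings (minutes) and job counts quoted in the
# commit message; these are estimates, not measurements made here.
SPARK_SAVING_MIN = 11
KAFKA_SAVING_MIN = 5
SPARK_JOBS = 9
KAFKA_JOBS = 2


def total_saving_minutes(spark_jobs: int = SPARK_JOBS, kafka_jobs: int = KAFKA_JOBS) -> int:
    """Agent minutes saved per full CI run under the quoted estimates."""
    return spark_jobs * SPARK_SAVING_MIN + kafka_jobs * KAFKA_SAVING_MIN


print(total_saving_minutes())  # 9 * 11 + 2 * 5 = 109
```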
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…mpilation
Emulator, Long Emulator, and Encryption jobs were compiling all 14 cosmos modules including 7 Spark modules (Scala compilation ~10-16 min) despite only running core emulator tests.
Add ProjectListOverride to limit these jobs to only the modules they actually test:
- Emulator/Long Emulator: azure-cosmos, azure-cosmos-test, azure-cosmos-tests
- Encryption: adds azure-cosmos-encryption
Also reverts the no-op TestSuiteBase trigger commit.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
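The per-job module lists above could be expressed as a small lookup. This is a sketch only: the module names come from the commit message, but the job-kind keys, the helper name, and the comma-separated output format are assumptions, since the actual matrix JSON value is not shown on this page.

```python
# Modules every core emulator job needs, per the commit message above.
CORE_MODULES = ["azure-cosmos", "azure-cosmos-test", "azure-cosmos-tests"]

# Extra modules per job kind. The keys are illustrative labels, not the
# real matrix entry names.
EXTRA_MODULES = {
    "Emulator": [],
    "Long Emulator": [],
    "Encryption": ["azure-cosmos-encryption"],
}


def project_list_override(job_kind: str) -> str:
    """Return a comma-separated module list (format is an assumption)."""
    return ",".join(CORE_MODULES + EXTRA_MODULES[job_kind])


print(project_list_override("Encryption"))
```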
8b4024c to ddec16d
benbp reviewed Mar 5, 2026
# If ProjectListOverride is set (e.g., from matrix variables), use it directly
# to avoid building unnecessary modules in jobs that only test a subset.
if ($env:PROJECTLISTOVERRIDE -and $env:PROJECTLISTOVERRIDE -notlike '*ProjectListOverride*') {
Member
You can override the existing ArtifactsJson variable today, but it's messier:
azure-sdk-for-java/sdk/core/version-overrides-matrix.json
Lines 12 to 16 in 8399ef1
Given your scenario though, it's probably simpler to just allow this type of override. @alzimmermsft can you think of any gotchas here?
Member
The short answer is yes, there could be issues caused by this if the manual project list override doesn't fully enclose the build space, but the bigger problem could be in From Source runs where it calculates the build space. But overall, I'm good with this as this should be used in very niche scenarios, but two thoughts:
- @kushagraThapar, mind removing this for one build run to see how much this affects CI time? Based on what I know about the emulator runs, I think removing Shade and Ant are the majority of the CI time improvement. If this doesn't affect build time much we may just want to remove it.
- If we can, guard this on From Source runs, if that is something we can check for. It should just be a check on $env:TESTFROMSOURCE being false or missing.
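To make the guard under discussion concrete, here is a minimal Python model of the selection logic, combining the condition from the reviewed PowerShell line with the From Source check suggested above. It is a sketch under assumptions: the function name is hypothetical, the From Source guard is the reviewer's proposal rather than merged code, and the reading of the `-notlike '*ProjectListOverride*'` test as a check for an unexpanded pipeline placeholder is an inference, not something the thread states.

```python
def resolve_project_list(env: dict[str, str], computed: str) -> str:
    """Pick the project list for a job.

    env      -- environment variables visible to the script
    computed -- the list the script would normally compute
    """
    override = env.get("PROJECTLISTOVERRIDE", "")

    # Mirrors -notlike '*ProjectListOverride*' (case-insensitive in
    # PowerShell). Assumed purpose: a value that still contains the variable
    # name is an unexpanded placeholder and should count as "not set".
    is_placeholder = "projectlistoverride" in override.lower()

    # Reviewer's suggested guard (assumption: the variable holds "true" on
    # From Source runs): ignore the override so the computed build space wins.
    from_source = env.get("TESTFROMSOURCE", "").lower() == "true"

    if override and not is_placeholder and not from_source:
        return override
    return computed


full = "all-14-modules"
print(resolve_project_list({"PROJECTLISTOVERRIDE": "azure-cosmos"}, full))
print(resolve_project_list({"PROJECTLISTOVERRIDE": "$(ProjectListOverride)"}, full))
```

With this shape, pipelines that never set the variable fall through to the computed list, which matches the "no behavior change for other SDKs" claim in the commit messages.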
benbp approved these changes Mar 5, 2026
Cosmos CI Build Optimizations
Summary
Comprehensive CI pipeline infrastructure optimization for Cosmos DB emulator tests and Build stage unit tests. Targets redundant compilation, unnecessary uber JAR creation, and excessive job count.
Estimated savings:
Changes
1. PR-conditional emulator matrix (16 → 11 jobs)
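Created cosmos-emulator-matrix-pr.json with reduced JDK variants for PR builds. Full matrix runs on main merges only.
Dropped for PRs (5 jobs):
- Spark 3.3 Java 11 (keeping Java 8)
- Spark 3.4 Java 8 (keeping Java 11)
- Spark 3.5/Scala 2.12 Java 8 (keeping Java 17)
- Spark 4.0/Scala 2.13 Java 17 (keeping Java 21)
- Kafka Java 11 (keeping Java 17)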
File:
eng/pipelines/templates/stages/cosmos-emulator-matrix-pr.json (new), cosmos-sdk-client.yml (conditional matrix selection)
2. Skip maven-shade-plugin for non-Spark/non-Kafka jobs
Core emulator, long emulator, and encryption jobs don't need Spark/Kafka uber JARs. Added -Dshade.skip=true -Dmaven.antrun.skip=true via AdditionalArgs to skip the shade plugin in both build and test steps.
Savings: ~14 min per non-Spark job. The build step previously spent about 88% of its time (roughly 14 of 17 min) creating Spark uber JARs.
Files:
cosmos-emulator-matrix.json, cosmos-emulator-matrix-pr.json
3. Per-job ProjectListOverride for Spark/Kafka jobs
Each Spark emulator job previously compiled all 14 modules, including other Spark versions it doesn't test (~11 min wasted per job). Added ProjectListOverride support to generate-project-list.ps1: if set via matrix variable, the script uses it directly instead of computing from the full artifacts list.
Each Spark job now only builds: azure-cosmos + azure-cosmos-test + azure-cosmos-tests + its specific Spark module.
Savings: ~11 min × 9 Spark jobs = ~99 min agent time
Files:
eng/pipelines/scripts/generate-project-list.ps1, cosmos-emulator-matrix.json, cosmos-emulator-matrix-pr.json
4. BuildOptions plumbing for Build stage unit tests
Added BuildOptions parameter through the ci.yml → ci.tests.yml → build-and-test.yml pipeline chain. Defaults to empty (no behavior change for other SDKs). Cosmos Build stage sets it to skip shade since unit tests don't need uber JARs.
Savings: ~14 min per unit test job
Files:
eng/pipelines/templates/jobs/ci.yml, eng/pipelines/templates/jobs/ci.tests.yml, cosmos-sdk-client.yml
5. Increase Maven build parallelization (1 → 2)
All stages (Build, TestEmulator, TestVNextEmulator) now use BuildParallelization: 2.
File:
cosmos-sdk-client.yml
Testing
Pipeline changes validated by CI itself. The generate-project-list.ps1 change is backward compatible: ProjectListOverride defaults to empty (no-op for non-Cosmos pipelines).