feat: add encode time tracking for shuffle operations #4068

Open
0lai0 wants to merge 3 commits into apache:main from 0lai0:track_ColumnarShuffle
0lai0 commented Apr 24, 2026

Which issue does this PR close?

Closes #1212
Part of #3996

Rationale for this change

Comet already reports encoding/compression metrics for native shuffle, but JVM columnar shuffle (CometColumnarExchange) either showed 0 ms or lacked a useful task-level distribution in the SQL UI. This made it difficult to compare shuffle behavior between spark.comet.shuffle.mode=native and spark.comet.shuffle.mode=jvm, and limited observability when tuning JVM shuffle performance.

What changes are included in this PR?

  • Aligned the JVM/native shuffle spill contract so encode/compression timing is propagated end-to-end: native spill results are consumed as (written_bytes, checksum, encode_nanos), with corresponding JNI/Java updates.

  • Added shared AtomicLong encode-time accumulators in SpillWriter, CometShuffleExternalSorterSync/Async, and CometDiskBlockWriter so encode time is aggregated correctly across batches, spills, and concurrent sorter/writer instances.

  • Added getEncodeNanos() to CometShuffleExternalSorter and wired the accumulated value into the encode_time SQLMetric in both CometUnsafeShuffleWriter and CometBypassMergeSortShuffleWriter.

  • Ensured JVM columnar shuffle dependencies carry shuffleWriteMetrics, allowing CometShuffleManager to retrieve and pass encode_time into the shuffle writers.

  • Added a regression test to verify encode_time exists and is greater than zero for columnar shuffle workloads.
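
For reviewers who want a mental model of the accumulator pattern described above, here is a minimal, self-contained sketch. It is not the code in this PR: `EncodeTimeSketch`, `SpillResult`, `SketchWriter`, and `aggregate` are hypothetical names, and the spills are simulated with a fixed encode time rather than measured. It only illustrates how several writer instances can fold per-spill `encode_nanos` values into one shared `AtomicLong` without losing updates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class EncodeTimeSketch {

    /** Hypothetical stand-in for the (written_bytes, checksum, encode_nanos) spill result. */
    static final class SpillResult {
        final long writtenBytes;
        final long checksum;
        final long encodeNanos;

        SpillResult(long writtenBytes, long checksum, long encodeNanos) {
            this.writtenBytes = writtenBytes;
            this.checksum = checksum;
            this.encodeNanos = encodeNanos;
        }
    }

    /** Hypothetical writer; every instance adds into the same shared accumulator. */
    static final class SketchWriter {
        private final AtomicLong sharedEncodeNanos;

        SketchWriter(AtomicLong sharedEncodeNanos) {
            this.sharedEncodeNanos = sharedEncodeNanos;
        }

        /** Consumes one spill result and folds its encode time into the shared total. */
        void onSpill(SpillResult result) {
            sharedEncodeNanos.addAndGet(result.encodeNanos);
        }
    }

    /** Runs several writers concurrently and returns the aggregated encode time in nanoseconds. */
    public static long aggregate(int writers, int spillsPerWriter, long nanosPerSpill) {
        AtomicLong encodeNanos = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int w = 0; w < writers; w++) {
            SketchWriter writer = new SketchWriter(encodeNanos);
            pool.submit(() -> {
                for (int s = 0; s < spillsPerWriter; s++) {
                    // Simulated spill: fixed byte count, dummy checksum, fixed encode time.
                    writer.onSpill(new SpillResult(1024L, 0L, nanosPerSpill));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return encodeNanos.get();
    }

    public static void main(String[] args) {
        long totalNanos = aggregate(4, 1000, 5_000L);
        // Converted to milliseconds here only for display.
        System.out.println("encode time (ms): " + TimeUnit.NANOSECONDS.toMillis(totalNanos));
    }
}
```

Because `addAndGet` is atomic, the total is the same regardless of how the writer threads interleave, which is the property the shared accumulators are meant to provide across batches, spills, and concurrent sorter/writer instances.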

How are these changes tested?

[Image: Screenshot 2026-04-24 at 11 42 26 PM]

Development

Successfully merging this pull request may close these issues.

Add encoding + compression metrics to columnar shuffle
