
[VL][Delta] Offload Delta OPTIMIZE compaction command transactions #12024

Open
malinjawi wants to merge 2 commits into apache:main from malinjawi:vl-delta-optimize-compaction-offload


malinjawi commented May 3, 2026

What changes are proposed in this pull request?

This PR adds offload of the standalone Delta OPTIMIZE compaction command for the Velox backend.

It lets Delta OPTIMIZE bin-pack compaction transactions run through GlutenOptimisticTransaction when native Delta write is enabled, so the compaction read/write command path can use Gluten's native Delta transaction handling.
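
For readers unfamiliar with the term, bin-pack compaction groups small data files so that each group can be rewritten as one larger file. The following is a minimal, self-contained sketch of that idea only. It is not Delta's or Gluten's implementation; `DataFile`, `BinPackSketch`, `binPack`, and the `targetBytes` parameter are hypothetical names introduced for illustration.

```scala
// Hypothetical, simplified model of a data file: only path and size matter here.
final case class DataFile(path: String, sizeBytes: Long)

object BinPackSketch {

  /** Greedily groups files smaller than `targetBytes` into bins whose total size
    * stays at or below `targetBytes`. Bins holding a single file are dropped,
    * because rewriting one file on its own would not reduce the file count.
    * This is an illustrative assumption, not Delta's actual selection policy.
    */
  def binPack(files: Seq[DataFile], targetBytes: Long): List[Vector[DataFile]] = {
    // Only files below the target size are compaction candidates.
    val candidates = files.filter(_.sizeBytes < targetBytes).sortBy(_.sizeBytes)

    val bins = scala.collection.mutable.ArrayBuffer.empty[Vector[DataFile]]
    var current = Vector.empty[DataFile]
    var currentSize = 0L

    for (f <- candidates) {
      // Close the current bin when adding this file would exceed the target.
      if (current.nonEmpty && currentSize + f.sizeBytes > targetBytes) {
        bins += current
        current = Vector.empty[DataFile]
        currentSize = 0L
      }
      current = current :+ f
      currentSize += f.sizeBytes
    }
    if (current.nonEmpty) bins += current

    bins.filter(_.size > 1).toList
  }
}
```

In this sketch each returned bin corresponds to one rewrite: the files in the bin would be removed and a single new file added, which is the kind of add/remove-file accounting the tests in this PR check through OptimizeMetrics.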

Main changes:

  • add GlutenDeltaRunnableCommand, a wrapper for non-leaf Delta RunnableCommand implementations
  • wrap Delta OptimizeTableCommand in the Delta command offload rule for compaction-only OPTIMIZE
  • support both path-based and table-name OPTIMIZE compaction command forms
  • support partition-predicate compaction with OPTIMIZE ... WHERE
  • preserve fallback behavior when native Delta write is disabled
  • keep existing DELETE, UPDATE, save, CTAS, and RTAS command offload behavior unchanged
  • keep non-compaction OPTIMIZE forms on the existing Spark path for now:
    • OPTIMIZE ZORDER BY
    • liquid-clustering / clustered-table OPTIMIZE
    • REORG
    • FULL OPTIMIZE
  • add focused Spark 3.5 and Spark 4.0 coverage for:
    • OPTIMIZE delta.`path`
    • OPTIMIZE table_name
    • OPTIMIZE ... WHERE partition-predicate compaction
    • OptimizeMetrics add/remove-file accounting
    • Delta history operation metadata
    • native-write-disabled fallback
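
The routing described in the list above can be summarized as a small decision function. This is a hedged sketch of the intended behavior only: `OptimizeKind`, `ExecutionPath`, `OffloadDecisionSketch`, and `choosePath` are hypothetical names, and the types do not mirror the fields of Delta's OptimizeTableCommand or the actual offload rule in this PR.

```scala
// Hypothetical classification of an OPTIMIZE invocation, for illustration only.
sealed trait OptimizeKind
case object Compaction extends OptimizeKind // plain bin-pack OPTIMIZE, with or without WHERE
case object ZOrder extends OptimizeKind     // OPTIMIZE ZORDER BY
case object Clustering extends OptimizeKind // liquid-clustering / clustered-table OPTIMIZE
case object Reorg extends OptimizeKind      // REORG
case object Full extends OptimizeKind       // FULL OPTIMIZE

// Where the command ends up running.
sealed trait ExecutionPath
case object NativeOffload extends ExecutionPath // wrapped, runs through the Gluten transaction
case object SparkFallback extends ExecutionPath // left on the existing Spark path

object OffloadDecisionSketch {

  /** Only compaction-only OPTIMIZE is offloaded, and only when native Delta
    * write is enabled. Every other combination stays on the Spark path.
    */
  def choosePath(kind: OptimizeKind, nativeDeltaWriteEnabled: Boolean): ExecutionPath =
    kind match {
      case Compaction if nativeDeltaWriteEnabled => NativeOffload
      case _                                     => SparkFallback
    }
}
```

Keeping the fallback as the default case means any OPTIMIZE form not explicitly recognized as compaction stays on the existing path, which matches the compaction-only scope of this PR.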

This PR is intentionally compaction-only:

  • no native ZORDER expression support yet
  • no InterleaveBits, HilbertLongIndex, or RangePartitionId support yet
  • no OPTIMIZE ZORDER sampling/shuffle planning changes yet
  • no liquid-clustering OPTIMIZE offload yet
  • no Optimized Write or auto-compaction changes yet

Those belong in follow-up PRs under the Delta optimization tracker.

Issue: #12025.

How was this patch tested?

Added and expanded Delta native write coverage in:

  • backends-velox/src-delta33/test/scala/org/apache/spark/sql/delta/DeltaNativeWriteSuite.scala
  • backends-velox/src-delta40/test/scala/org/apache/spark/sql/delta/DeltaNativeWriteSuite.scala

Validation run locally:

  • Spark 3.5 / Scala 2.12 clean compile/test-compile
  • Spark 3.5 / Scala 2.12 DeltaNativeWriteSuite + ClusteredTableClusteringSuite: 8 tests passed
  • Spark 3.5 / Scala 2.13 clean compile/test-compile
  • Spark 3.5 / Scala 2.13 DeltaNativeWriteSuite + ClusteredTableClusteringSuite: 8 tests passed
  • Spark 4.0 / Scala 2.13 clean compile/test-compile
  • Spark 4.0 / Scala 2.13 DeltaNativeWriteSuite + ClusteredTableClusteringSuite: 16 tests passed
  • Spark 3.5 Spotless check
  • Spark 4.0 Spotless check

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex

github-actions bot added the VELOX label May 3, 2026
malinjawi force-pushed the vl-delta-optimize-compaction-offload branch 2 times, most recently from 4d3a448 to 8da75af, May 3, 2026 12:18
malinjawi force-pushed the vl-delta-optimize-compaction-offload branch from 8da75af to e2d0a90, May 3, 2026 13:16