Describe the bug
When AQE is enabled, two DynamicPartitionPruningSuite tests that rely on Comet DPP producing the same canonical exchanges/broadcasts on both sides of a self-join fail. These tests pass with AQE disabled and pass with AQE enabled when run against vanilla Spark.
Tests affected (Spark 4.0.1, COMET_PARQUET_SCAN_IMPL=auto):
DynamicPartitionPruningV1SuiteAEOn: SPARK-32509: Unused Dynamic Pruning filter shouldn't affect canonicalization and exchange reuse
- Expects 1 ReusedExchangeExec, finds 0.
- Plan shows one side as vanilla FileScan + Exchange and the other side as CometNativeScan + CometExchange, so the two sides do not canonicalize to the same exchange and reuse never fires.
DynamicPartitionPruningV2SuiteAEOn and DynamicPartitionPruningV2FilterSuiteAEOn: SPARK-34637: DPP side broadcast query stage is created firstly
- The subqueryBroadcast.nonEmpty == true assertion fails; DPP is not being triggered at all for V2 data sources under AQE.
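The first failure can also be checked by hand outside the suite. The sketch below counts ReusedExchangeExec nodes in a physical plan using Spark's AdaptiveSparkPlanHelper, which descends into AQE query stages. The object name ReuseCheck and the method reusedExchangeCount are hypothetical and not part of Spark or Comet; this is a minimal illustration, not the exact check the test performs.

```scala
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
import org.apache.spark.sql.execution.exchange.ReusedExchangeExec

// Hypothetical helper for manual inspection; not part of Spark or Comet.
object ReuseCheck extends AdaptiveSparkPlanHelper {
  // Counts ReusedExchangeExec nodes in the plan. The helper's collect
  // also walks into AdaptiveSparkPlanExec and query-stage children.
  def reusedExchangeCount(plan: SparkPlan): Int =
    collect(plan) { case r: ReusedExchangeExec => r }.size
}
```

Pass it the executed plan of the self-join after the query has run, since AQE only finalizes the plan during execution. SPARK-32509 expects a count of 1; with Comet and AQE enabled the count described above is 0.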
Steps to reproduce
- Apply dev/diffs/4.0.1.diff to apache-spark v4.0.1.
- Remove the IgnoreComet("TODO: https://github.com/apache/datafusion-comet/issues/1839") annotations on the 3 DPP tests in DynamicPartitionPruningSuite.scala.
- Run:
ENABLE_COMET=true ENABLE_COMET_ONHEAP=true COMET_PARQUET_SCAN_IMPL=auto build/sbt \
"sql/testOnly org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOn \
org.apache.spark.sql.DynamicPartitionPruningV2SuiteAEOn \
org.apache.spark.sql.DynamicPartitionPruningV2FilterSuiteAEOn \
-- -z \"SPARK-34637\" -z \"canonicalization and exchange reuse\""
Expected behavior
Both sides of a Comet self-join should canonicalize to the same exchange so that ReusedExchangeExec / reused broadcast exchange logic fires under AQE, matching vanilla Spark behavior.
Additional context
Discovered while addressing #1839 (the third test from that issue, avoid reordering broadcast join keys to match input hash partitioning, now passes everywhere and is being unignored). Possibly related to #4042 (native_datafusion: scalar subquery pushdown does not produce ReusedSubqueryExec).