
Tracking: remaining Spark 4.1 CI failures on #4093 (#4098)

@andygrove


Tracking issue for the four clusters of test failures that remain on Spark 4.1 (4.1.1) once the profile, shims, diff, and SQL-test workflow entry are in place. Context PRs: #4093 (Spark 4.1.1 enablement) and #4097 (spark-4.1 profile + shims prep, no tests).

Status

  • OneRowRelationExec not transformed by Comet (~30 sql-file expression tests)
  • Native parquet reader: user-defined struct schema mismatch (2 tests, Linux + macOS)
  • Bloom filter result mismatch (2 tests)
  • bytesRead task metric off by a factor of 6 to 14 (3 tests)

Two earlier clusters are already cleared on the branch (commit 5a60be22d):

  • In CometNativeWriteExec, the String overload of newTaskTempFile became abstract-throwing in 4.1; switched to the FileNameSpec overload. This cleared 17 parquet-write failures.
  • The remainder function test expected [DIVIDE_BY_ZERO], but Spark 4.1 reports [REMAINDER_BY_ZERO]; the expected message now branches on isSpark41Plus.

1. OneRowRelationExec not transformed by Comet

Where: ~30 failures in Spark 4.1, JDK 17/auto [expressions], all sql-file: tests like expressions/cast/cast.sql, expressions/datetime/*, expressions/struct/create_named_struct.sql, etc.

Symptom:

Expected only Comet native operators, but found Project.
plan: Project
+-  Scan OneRowRelation [COMET: Scan OneRowRelation is not supported]

Root cause: Spark 4.1 added a new OneRowRelationExec physical leaf and stopped folding SELECT cast(literal) queries down to LocalRelation via ConvertToLocalRelation. In 4.0 those queries became LocalTableScanExec, which Comet wraps as CometLocalTableScanExec. In 4.1 they stay as Project + OneRowRelationExec, and CometExecRule falls back to Spark for the whole subtree.

Fix options (decision needed):

  • (a) Add CometOneRowRelationExec analogous to CometLocalTableScanExec. Real fix, biggest scope, needs a Rust-side serde for an empty-row scan.
  • (b) Pre-rewrite Project + OneRowRelationExec into LocalTableScanExec with a single empty row in a Comet planner rule.
  • (c) Test-only allowlist (masks fallback, not recommended).
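A minimal sketch of option (b), under one reading of it: replace only the OneRowRelationExec leaf with a one-row, zero-column local scan and keep the Project on top, assuming Comet can then convert that Project over its existing CometLocalTableScanExec path. The Plan enum, rewrite_one_row_relation and fully_supported are hypothetical toy stand-ins written in Rust for illustration; the actual rule would presumably be a Scala planner rule in Comet.

```rust
// Toy physical-plan model. These types are hypothetical stand-ins for
// Spark's SparkPlan nodes, not Comet or Spark APIs.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Project { exprs: Vec<String>, child: Box<Plan> },
    OneRowRelation,
    // Local scan over literal rows; each row is a Vec of column values.
    LocalTableScan { rows: Vec<Vec<String>> },
}

/// Option (b), one reading: swap every OneRowRelation leaf for a
/// LocalTableScan holding exactly one row with zero columns, and keep
/// the Project above it unchanged.
fn rewrite_one_row_relation(plan: Plan) -> Plan {
    match plan {
        Plan::OneRowRelation => Plan::LocalTableScan { rows: vec![Vec::new()] },
        Plan::Project { exprs, child } => Plan::Project {
            exprs,
            child: Box::new(rewrite_one_row_relation(*child)),
        },
        other => other,
    }
}

/// Hypothetical support check mirroring the failure message: assume Comet
/// converts Project and LocalTableScan, but not OneRowRelation.
fn fully_supported(plan: &Plan) -> bool {
    match plan {
        Plan::OneRowRelation => false,
        Plan::LocalTableScan { .. } => true,
        Plan::Project { child, .. } => fully_supported(child),
    }
}

fn main() {
    // Shape reported by the failing tests: Project over Scan OneRowRelation.
    let plan = Plan::Project {
        exprs: vec!["cast(1 as string)".to_string()],
        child: Box::new(Plan::OneRowRelation),
    };
    assert!(!fully_supported(&plan));
    let rewritten = rewrite_one_row_relation(plan);
    assert!(fully_supported(&rewritten));
    println!("{:?}", rewritten);
}
```

The other reading, evaluating the whole Project into literal rows, would mirror what ConvertToLocalRelation did in 4.0, but the rule would then have to constant-fold the projection itself.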

2. Native parquet reader: user-defined struct schema mismatch

Where: the native_datafusion and native_iceberg_compat variants of native reader - select struct field with user defined schema, in both Spark 4.1, JDK 17/auto [parquet] and macos-14/Spark 4.1, JDK 17, Scala 2.13 [parquet].

Symptom: "Results do not match for query" on a Parquet relation read with the user-supplied schema c0: struct<y:int,x:string>; Comet's native reader returns different rows than Spark.

Suspected root cause: either Spark 4.1 changed how a user-supplied struct schema is reconciled with the on-disk Parquet field order, or nested-field pruning behaves differently. Compare Spark 4.0 and 4.1 planning output for this query, and check whether field resolution by name versus by position changed in ParquetReadSupport or ParquetSchemaConverter.
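To illustrate the name-versus-position question, the Rust sketch below resolves the failing test's user schema c0: struct<y:int,x:string> against a file struct whose on-disk order is assumed to be x, y (the issue does not state the file schema). Field, resolve_by_name, resolve_by_position and types_agree are hypothetical helpers, not Comet or parquet-rs APIs.

```rust
// Hypothetical model of a struct field: a name plus a type label.
#[derive(Debug)]
struct Field {
    name: &'static str,
    data_type: &'static str,
}

/// By-name resolution: each requested field maps to the file field with
/// the same name, or None if the file lacks it.
fn resolve_by_name(requested: &[Field], file: &[Field]) -> Vec<Option<usize>> {
    requested
        .iter()
        .map(|r| file.iter().position(|f| f.name == r.name))
        .collect()
}

/// By-position resolution: the i-th requested field reads the i-th file
/// field, ignoring names.
fn resolve_by_position(requested: &[Field], file: &[Field]) -> Vec<Option<usize>> {
    (0..requested.len())
        .map(|i| if i < file.len() { Some(i) } else { None })
        .collect()
}

/// True if every resolved (requested, file) pair has the same type label.
fn types_agree(requested: &[Field], file: &[Field], mapping: &[Option<usize>]) -> bool {
    requested.iter().zip(mapping).all(|(r, m)| match m {
        Some(i) => file[*i].data_type == r.data_type,
        None => true,
    })
}

fn main() {
    // Assumed on-disk order: x, y (not stated in the failing test).
    let file = [
        Field { name: "x", data_type: "string" },
        Field { name: "y", data_type: "int" },
    ];
    // User-supplied schema from the failing test: struct<y:int,x:string>.
    let requested = [
        Field { name: "y", data_type: "int" },
        Field { name: "x", data_type: "string" },
    ];
    let by_name = resolve_by_name(&requested, &file);
    let by_pos = resolve_by_position(&requested, &file);
    println!("by name: {:?}, types agree: {}", by_name, types_agree(&requested, &file, &by_name));
    println!("by position: {:?}, types agree: {}", by_pos, types_agree(&requested, &file, &by_pos));
}
```

Under that assumed file order, by-name resolution maps y and x to file columns 1 and 0, while by-position resolution reads them from columns 0 and 1 with mismatched types. If Spark and Comet's native reader disagreed on which rule applies in 4.1, they would read different leaves for the same column, which would fit the row mismatch.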


3. Bloom filter result mismatch

Where: test BloomFilterMightContain from random input and bloom_filter_agg in Spark 4.1, JDK 17/auto [exec].

Symptom: Comet and Spark produce different might_contain results for the same input.

Suspected root cause: Spark 4.1 likely changed the bloom filter binary layout, hash seed, or default false-positive probability. Diff BloomFilterImpl / BloomFilterAggregate between 4.0 and 4.1, then mirror any change in Comet's bloom filter code under native/spark-expr.
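As a baseline for that diff, here is a minimal Rust sketch of how Spark 4.0's BloomFilterImpl derives bit positions for a long item, reconstructed from our reading of BloomFilterImpl.putLong / mightContainLong and Murmur3_x86_32.hashLong; verify it against the 4.0 sources before relying on it. If 4.1 differs on any step (seed, hash combination, bit-array layout), the same inputs will probe different bits. SketchBloomFilter, bit_indices and the other helper names are hypothetical, not Comet's native/spark-expr API.

```rust
// Sketch of Spark 4.0's BloomFilterImpl hashing for long items, as we read it.
// All names are hypothetical; this is not Comet's native/spark-expr code.
const C1: u32 = 0xcc9e_2d51;
const C2: u32 = 0x1b87_3593;

fn mix_k1(k1: u32) -> u32 {
    k1.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2)
}

fn mix_h1(h1: u32, k1: u32) -> u32 {
    (h1 ^ k1).rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64)
}

fn fmix(mut h1: u32, length: u32) -> u32 {
    h1 ^= length;
    h1 ^= h1 >> 16;
    h1 = h1.wrapping_mul(0x85eb_ca6b);
    h1 ^= h1 >> 13;
    h1 = h1.wrapping_mul(0xc2b2_ae35);
    h1 ^= h1 >> 16;
    h1
}

/// Murmur3 x86_32 over a long: low 32 bits first, then high 32 bits.
fn hash_long(input: i64, seed: i32) -> i32 {
    let low = input as u32;
    let high = ((input as u64) >> 32) as u32;
    let h1 = mix_h1(seed as u32, mix_k1(low));
    fmix(mix_h1(h1, mix_k1(high)), 8) as i32
}

/// Bit positions probed for `item`: h1 + i*h2 for i in 1..=k with Java int
/// wrap-around, negatives flipped with bitwise NOT, then taken mod bit_size.
fn bit_indices(item: i64, num_hash_functions: i32, bit_size: i64) -> Vec<i64> {
    let h1 = hash_long(item, 0);
    let h2 = hash_long(item, h1);
    (1..=num_hash_functions)
        .map(|i| {
            let mut combined = h1.wrapping_add(i.wrapping_mul(h2));
            if combined < 0 {
                combined = !combined;
            }
            i64::from(combined) % bit_size
        })
        .collect()
}

/// Bit array backed by 64-bit words, so bit_size is a multiple of 64.
struct SketchBloomFilter {
    words: Vec<u64>,
    num_hash_functions: i32,
}

impl SketchBloomFilter {
    fn new(num_bits: i64, num_hash_functions: i32) -> Self {
        let n = ((num_bits + 63) / 64) as usize;
        Self { words: vec![0; n], num_hash_functions }
    }

    fn bit_size(&self) -> i64 {
        self.words.len() as i64 * 64
    }

    fn put_long(&mut self, item: i64) {
        for idx in bit_indices(item, self.num_hash_functions, self.bit_size()) {
            self.words[(idx >> 6) as usize] |= 1u64 << (idx & 63);
        }
    }

    fn might_contain_long(&self, item: i64) -> bool {
        bit_indices(item, self.num_hash_functions, self.bit_size())
            .into_iter()
            .all(|idx| (self.words[(idx >> 6) as usize] & (1u64 << (idx & 63))) != 0)
    }
}

fn main() {
    let mut bf = SketchBloomFilter::new(1024, 3);
    bf.put_long(42);
    println!("42: {}, probes: {:?}", bf.might_contain_long(42), bit_indices(42, 3, 1024));
}
```

Feeding identical inputs through this sketch and through the 4.1 classes, then comparing the probed bit positions, would show whether the hashing itself changed or only the sizing defaults.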


4. bytesRead task metric off by a factor of 6 to 14

Where: native_datafusion scan reports task-level input metrics matching Spark, input metrics aggregate across multiple native scans in a join, ... in a union in Spark 4.1, JDK 17/auto [exec] (CometTaskMetricsSuite).

Symptom:

9.6 was greater than or equal to 0.7, but 9.6 was not less than or equal to 1.3
bytesRead ratio out of range: comet=90498, spark=9427, ratio=9.6

The other two failures show ratios of 6.4 and 13.9.
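The assertion compares the ratio of Comet's bytesRead to Spark's against a tolerance window whose bounds, judging by the message, are 0.7 and 1.3. A minimal Rust sketch of that arithmetic reproduces the reported 9.6; the helper names are hypothetical, and the actual check is Scala code in CometTaskMetricsSuite.

```rust
/// Ratio of Comet's reported bytesRead to Spark's.
fn bytes_read_ratio(comet: u64, spark: u64) -> f64 {
    comet as f64 / spark as f64
}

/// Tolerance window implied by the assertion message: 0.7 <= ratio <= 1.3.
fn within_tolerance(ratio: f64) -> bool {
    (0.7..=1.3).contains(&ratio)
}

fn main() {
    // Values from the failing assertion above.
    let ratio = bytes_read_ratio(90_498, 9_427);
    // Prints "ratio=9.6 ok=false".
    println!("ratio={:.1} ok={}", ratio, within_tolerance(ratio));
}
```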

Suspected root cause: Spark 4.1 changed what inputMetrics.bytesRead accounts for; most likely it now reports a smaller subset (e.g. only the bytes actually read into row buffers, rather than the full Parquet footer plus row groups). Compare ParquetFileReader / PartitionedFile accounting between 4.0 and 4.1 and adjust Comet's metric source accordingly.

Metadata

Assignees: No one assigned

Labels: priority:high (Crashes, panics, segfaults, major functional breakage), spark 4