Tracking issue for the four remaining clusters of test failures on Spark 4.1 (4.1.1) once the profile, shims, diff, and SQL-test workflow entry are in place. Context PRs: #4093 (Spark 4.1.1 enablement) and #4097 (spark-4.1 profile + shims prep, no tests).
## Status

Two earlier clusters are already cleared on the branch (commit 5a60be22d):

- `CometNativeWriteExec.newTaskTempFile`: the `String` overload became abstract-throwing in 4.1; switched to the `FileNameSpec` overload. Cleared 17 parquet-write failures.
- `remainder function` test expected `[DIVIDE_BY_ZERO]`; Spark 4.1 introduced `[REMAINDER_BY_ZERO]`. Branched the expected message on `isSpark41Plus`.
## 1. `OneRowRelationExec` not transformed by Comet
**Where:** ~30 failures in `Spark 4.1, JDK 17/auto [expressions]`, all `sql-file:` tests such as `expressions/cast/cast.sql`, `expressions/datetime/*`, `expressions/struct/create_named_struct.sql`, etc.

**Symptom:**

```
Expected only Comet native operators, but found Project.
plan: Project
+- Scan OneRowRelation [COMET: Scan OneRowRelation is not supported]
```
**Root cause:** Spark 4.1 added a new `OneRowRelationExec` physical leaf and no longer folds `SELECT cast(literal)` queries down to `LocalRelation` via `ConvertToLocalRelation`. In 4.0 those queries became `LocalTableScanExec`, which Comet wraps as `CometLocalTableScanExec`. In 4.1 they remain `Project + OneRowRelationExec`, and Comet's `CometExecRule` falls back the whole subtree to Spark.
**Fix options (decision needed):**

- (a) Add `CometOneRowRelationExec`, analogous to `CometLocalTableScanExec`. The real fix, but the biggest scope: it needs a Rust-side serde for an empty-row scan.
- (b) Pre-rewrite `Project + OneRowRelationExec` into a `LocalTableScanExec` with a single empty row in a Comet planner rule.
- (c) Test-only allowlist (masks the fallback; not recommended).
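Option (b) is essentially a bottom-up pattern rewrite over the physical plan. A toy sketch of the shape of that rewrite, using plain Python tuples in place of `SparkPlan` nodes (the real rule would be a Scala `Rule[SparkPlan]` using `transformUp`; all names here are illustrative only):

```python
# Toy sketch of fix option (b): rewrite Project + OneRowRelationExec into a
# LocalTableScanExec carrying one empty row, so the existing
# CometLocalTableScanExec path can take over. Plan nodes are modeled as
# (name, children, payload) tuples purely for illustration.

def rewrite(node):
    """Bottom-up rewrite, analogous to SparkPlan.transformUp."""
    name, children, payload = node
    children = [rewrite(c) for c in children]
    if name == "Project" and len(children) == 1 and children[0][0] == "OneRowRelationExec":
        # Replace the one-row leaf with a local scan over a single empty row.
        children = [("LocalTableScanExec", [], {"rows": [()]})]
    return (name, children, payload)

plan = ("Project", [("OneRowRelationExec", [], {})], {"exprs": ["cast(1 as string)"]})
rewritten = rewrite(plan)
print(rewritten[1][0][0])  # LocalTableScanExec
```

The point of the sketch is that the Project itself is untouched; only its one-row child is swapped for a leaf Comet already knows how to wrap.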
## 2. Native parquet reader: user-defined struct schema mismatch
**Where:** `native reader - select struct field with user defined schema - native_datafusion` and `- native_iceberg_compat`, in both `Spark 4.1, JDK 17/auto [parquet]` and `macos-14/Spark 4.1, JDK 17, Scala 2.13 [parquet]`.

**Symptom:** "Results do not match for query"; the schema is `c0: struct<y:int,x:string>` over a parquet relation. Comet's native reader returns different rows than Spark.

**Suspected root cause:** Spark 4.1 changed how user-supplied struct schemas are reconciled with the on-disk Parquet field order, or field pruning behaves differently. Compare Spark 4.0 vs 4.1 planning output for this query and check whether user-schema field-name-vs-position behavior changed in `ParquetReadSupport` or `ParquetSchemaConverter`.
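To see why name-vs-position reconciliation matters for exactly this schema: if the file stores the struct fields in the order `(x, y)` but the user schema asks for `struct<y:int,x:string>`, a by-name reader and a by-position reader return different rows. A minimal illustration with plain Python values (no Parquet involved; the field order assumed for the file is hypothetical):

```python
# Illustrates the name-vs-position hazard suspected above. The on-disk struct
# field order (x, y) differs from the user-supplied schema order (y, x);
# reconciling by name vs. by position yields different rows.

file_fields = ["x", "y"]          # assumed physical field order in the file
file_row = {"x": "a", "y": 1}     # one stored struct value
user_schema = ["y", "x"]          # user-defined schema: struct<y:int,x:string>

# By-name reconciliation: look each requested field up by its name.
by_name = tuple(file_row[f] for f in user_schema)

# By-position reconciliation: position i of the file feeds field i of the
# user schema, so values come out in file order but mislabelled.
by_position = tuple(file_row[f] for f in file_fields)

print(by_name)      # (1, 'a')  -> y, then x, as the user schema requests
print(by_position)  # ('a', 1)  -> file order, mislabelled as (y, x)
```

If Spark 4.1 moved from one strategy to the other (or changed when each applies), the native reader would need the matching change to agree row-for-row.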
## 3. Bloom filter result mismatch
**Where:** `test BloomFilterMightContain from random input` and `bloom_filter_agg` in `Spark 4.1, JDK 17/auto [exec]`.

**Symptom:** Comet and Spark produce different `might_contain` results for the same input.

**Suspected root cause:** Spark 4.1 likely changed the bloom filter binary layout, hash seed, or default false-positive probability. Diff `BloomFilterImpl` / `BloomFilterAggregate` between 4.0 and 4.1, then mirror the change in Comet's bloom filter code in `native/spark-expr`.
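The failure mode is easy to reproduce in miniature: two implementations that share the same bit array but hash probes with different seeds will disagree on `might_contain`. A minimal sketch (this is NOT Spark's `BloomFilterImpl`; it only demonstrates why a seed change alone breaks cross-implementation compatibility):

```python
# Minimal bloom filter: same serialized bits, different probe hashing.
import hashlib

class TinyBloom:
    def __init__(self, num_bits, num_hashes, seed):
        self.bits = 0
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.seed = seed

    def _positions(self, item):
        # Derive each probe position from (seed, hash index, item).
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{self.seed}:{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def put(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

writer = TinyBloom(256, 3, seed=0)   # stand-in for the Spark-built filter
reader = TinyBloom(256, 3, seed=1)   # stand-in for a reader with a changed seed
for v in range(20):
    writer.put(v)
reader.bits = writer.bits  # identical bit layout exchanged between the two

disagreements = [v for v in range(20)
                 if writer.might_contain(v) != reader.might_contain(v)]
print(len(disagreements) > 0)
```

The same disagreement pattern appears if the layout or the number of hash functions changes instead of the seed, which is why diffing the 4.0 vs 4.1 serialization is the first step.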
## 4. `bytesRead` task metric off by 6 to 14 times
**Where:** `native_datafusion scan reports task-level input metrics matching Spark`, `input metrics aggregate across multiple native scans in a join`, and `... in a union` in `Spark 4.1, JDK 17/auto [exec]` (`CometTaskMetricsSuite`).

**Symptom:**

```
9.6 was greater than or equal to 0.7, but 9.6 was not less than or equal to 1.3
bytesRead ratio out of range: comet=90498, spark=9427, ratio=9.6
```

Two more failures show similar ratios of 6.4 and 13.9.
**Suspected root cause:** Spark 4.1 changed what `inputMetrics.bytesRead` accounts for; most likely it now reports a smaller subset (e.g. only bytes actually read into row buffers, versus the full Parquet footer plus row group). Compare `ParquetFileReader` / `PartitionedFile` accounting between 4.0 and 4.1 and adjust Comet's metric source accordingly.
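For reference while bisecting, the suite's assertion shape (ratio within [0.7, 1.3]) can be captured as a small helper. A sketch, with a hypothetical helper name; the bounds and sample values are taken from the failure output above:

```python
# Sketch of the tolerance check behind these failures: Comet's bytesRead is
# expected to fall within 0.7x..1.3x of Spark's. Helper name is hypothetical.

def bytes_read_ratio_ok(comet_bytes, spark_bytes, lo=0.7, hi=1.3):
    ratio = comet_bytes / spark_bytes
    return lo <= ratio <= hi, round(ratio, 1)

# The reported failing case: comet=90498, spark=9427 -> ratio 9.6, out of range.
ok, ratio = bytes_read_ratio_ok(90498, 9427)
print(ok, ratio)
```

A ratio that large (Comet reporting ~10x more bytes) is consistent with Comet still counting full footer-plus-row-group reads while Spark's denominator shrank, rather than with a scan actually reading more data.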