fix: reject incompatible decimal precision/scale in native_datafusion scan by andygrove · Pull Request #4090 · apache/datafusion-comet

andygrove · 2026-04-25T19:04:15Z

Which issue does this PR close?

Closes #4089.

Rationale for this change

When the native_datafusion scan reads a Parquet column with a higher-precision decimal physical type under a requested read schema with smaller precision or different scale, the existing schema adapter falls through to Spark's Cast expression. Cast happily rescales or truncates, producing wrong values silently. Spark's vectorized reader rejects this with SchemaColumnConvertNotSupportedException, and native_iceberg_compat already does the same via TypeUtil.checkParquetType. The native scan should match.

What changes are included in this PR?

native/core/src/parquet/schema_adapter.rs: in replace_with_spark_cast, add a guard before the existing branches that returns a DataFusionError::Plan when both physical_type and target_type are Decimal128 and either the target precision is smaller than the source precision or the scales differ. The rule mirrors Spark's TypeUtil.isDecimalTypeMatched (Spark 3.x path).

The check is intentionally narrow: it only fires for decimal-to-decimal mismatches, leaving every other type path unchanged. The closed prior attempt at broad schema validation in #3311 broke unrelated tests; this one does not.

Spark 4.0's decimal type widening (allowing strictly-larger target precision/scale per SPARK-37147) is not implemented here. For #4089's specific case (Decimal(10,2) read as Decimal(5,0)) Spark 4.0 also rejects, so behavior is correct on all supported versions. Implementing 4.0's wider widening is a follow-up.

How are these changes tested?

Added a focused test to ParquetReadSuite: native_datafusion rejects incompatible decimal precision/scale. It writes Decimal(10, 2) data, reads it under Decimal(5, 0), forces spark.comet.scan.impl=native_datafusion and spark.sql.sources.useV1SourceList=parquet, and asserts that collect() raises SparkException. Verified against ParquetReadV1Suite (44 tests, all pass; 1 pre-existing test ignored). The behavior is also covered by the per-impl matrix added in #4087, which will need its decimal(10,2) read as decimal(5,0): native_datafusion assertion flipped from "succeeds" to "throws" in a follow-up once that PR merges.

… scan The native_datafusion Spark physical expression adapter previously fell through to a Spark Cast for decimal-to-decimal type changes, which silently rescales or truncates values that should have raised an error. Mirror Spark's TypeUtil.isDecimalTypeMatched (Spark 3.x rule) by rejecting reads where the target precision is smaller than the source precision or the scales differ. Closes apache#4089.

andygrove added correctness bug Something isn't working labels Apr 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: reject incompatible decimal precision/scale in native_datafusion scan#4090

fix: reject incompatible decimal precision/scale in native_datafusion scan#4090
andygrove wants to merge 1 commit intoapache:mainfrom
andygrove:fix-issue-4089-decimal-precision

andygrove commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

andygrove commented Apr 25, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant