
fix: reject incompatible decimal precision/scale in native_datafusion scan#4090

Open
andygrove wants to merge 1 commit into apache:main from andygrove:fix-issue-4089-decimal-precision

Conversation

@andygrove
Member

Which issue does this PR close?

Closes #4089.

Rationale for this change

When the native_datafusion scan reads a Parquet column whose physical decimal type has higher precision than the requested read schema, or a different scale, the existing schema adapter falls through to Spark's Cast expression. Cast happily rescales or truncates, silently producing wrong values. Spark's vectorized reader rejects this case with SchemaColumnConvertNotSupportedException, and native_iceberg_compat already does the same via TypeUtil.checkParquetType; the native scan should match.

What changes are included in this PR?

native/core/src/parquet/schema_adapter.rs: in replace_with_spark_cast, add a guard ahead of the existing branches. When both physical_type and target_type are Decimal128, it returns a DataFusionError::Plan if the target precision is smaller than the source precision or the scales differ, mirroring Spark's TypeUtil.isDecimalTypeMatched (Spark 3.x path).
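
As a minimal sketch of that rule (not the PR's actual code), assuming Arrow's `DataType::Decimal128` and DataFusion's `DataFusionError::Plan`; the `check_decimal_compat` helper name and its wiring into replace_with_spark_cast are hypothetical:

```rust
use arrow::datatypes::DataType;
use datafusion::common::DataFusionError;

// Hypothetical standalone helper; in the PR the guard sits inline in
// replace_with_spark_cast, ahead of the existing branches.
fn check_decimal_compat(
    physical_type: &DataType,
    target_type: &DataType,
) -> Result<(), DataFusionError> {
    if let (
        DataType::Decimal128(src_p, src_s),
        DataType::Decimal128(dst_p, dst_s),
    ) = (physical_type, target_type)
    {
        // Mirror Spark's TypeUtil.isDecimalTypeMatched (Spark 3.x rule):
        // reject when precision shrinks or the scale changes at all.
        if dst_p < src_p || dst_s != src_s {
            return Err(DataFusionError::Plan(format!(
                "Parquet column has decimal({src_p}, {src_s}) but the read \
                 schema requests decimal({dst_p}, {dst_s})"
            )));
        }
    }
    // Every non-decimal-to-decimal path is left to the existing branches.
    Ok(())
}
```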

The check is intentionally narrow: it only fires for decimal-to-decimal mismatches, leaving every other type path unchanged. A prior attempt at broad schema validation (#3311, since closed) broke unrelated tests; this one does not.

Spark 4.0's decimal type widening (allowing strictly-larger target precision/scale per SPARK-37147) is not implemented here. For #4089's specific case (Decimal(10,2) read as Decimal(5,0)), Spark 4.0 also rejects, so behavior is correct on all supported versions. Implementing the 4.0 widening rule is left as a follow-up.
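
Purely for reference, a hedged sketch of what that follow-up predicate might look like if 4.0's rule is exactly "strictly-larger precision and scale allowed" as paraphrased above; the real condition must be checked against Spark 4.0's TypeUtil before use:

```rust
// HYPOTHETICAL, not part of this PR: assumed Spark 4.0-style widening.
// The Spark 3.x path already admits a larger precision at the same scale;
// this assumed relaxation would additionally admit a larger scale.
fn decimal_read_allowed_spark4(src_p: u8, src_s: i8, dst_p: u8, dst_s: i8) -> bool {
    dst_p >= src_p && dst_s >= src_s
}
```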

How are these changes tested?

Added a focused test to ParquetReadSuite: native_datafusion rejects incompatible decimal precision/scale. It writes Decimal(10, 2) data, reads it back under a Decimal(5, 0) schema with spark.comet.scan.impl=native_datafusion and spark.sql.sources.useV1SourceList=parquet forced, and asserts that collect() raises SparkException. Verified against ParquetReadV1Suite (all 44 tests pass; 1 pre-existing test ignored). The behavior is also covered by the per-impl matrix added in #4087; once that PR merges, its "decimal(10,2) read as decimal(5,0): native_datafusion" assertion will need a follow-up flipping it from "succeeds" to "throws".
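
The actual coverage is the Scala suite described above. Purely as a usage example for the Rust sketch earlier, a hypothetical unit test over that check_decimal_compat helper:

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rejects_shrinking_precision_and_changed_scale() {
        // #4089's case: Decimal(10, 2) read as Decimal(5, 0) must fail,
        // since precision shrinks and the scale differs.
        let physical = DataType::Decimal128(10, 2);
        let target = DataType::Decimal128(5, 0);
        assert!(check_decimal_compat(&physical, &target).is_err());

        // An identical read schema passes through unchanged.
        assert!(check_decimal_compat(&physical, &physical).is_ok());
    }
}
```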

… scan

The native_datafusion Spark physical expression adapter previously fell
through to a Spark Cast for decimal-to-decimal type changes, which
silently rescales or truncates values that should have raised an error.
Mirror Spark's TypeUtil.isDecimalTypeMatched (Spark 3.x rule) by
rejecting reads where the target precision is smaller than the source
precision or the scales differ.

Closes apache#4089.
@andygrove added the correctness and bug labels on Apr 25, 2026

Labels

bug (Something isn't working), correctness


Development

Successfully merging this pull request may close these issues.

native_datafusion: incompatible decimal precision/scale read silently succeeds without value validation
