native_datafusion: STRING column read as INT silently returns garbage values #4088

@andygrove

Description

When the native_datafusion scan reads a Parquet column whose physical type is BINARY (STRING) under a requested read schema of INT, it silently reinterprets the BINARY bytes as raw INT32 bytes and returns garbage values. Spark's vectorized reader throws on this mismatch on all supported versions, so this is a correctness gap (returns wrong answers without an error) rather than a strict-mode parity gap.
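To make the failure mode concrete, here is a minimal sketch (illustrative only, not Comet's actual reader code) of what reinterpreting UTF-8 string bytes as little-endian INT32 values looks like, which is why the returned integers bear no relation to the stored data:

```scala
import java.nio.{ByteBuffer, ByteOrder}

object ReinterpretDemo {
  // Reinterpret the raw UTF-8 bytes of a string as a little-endian INT32,
  // the way a reader that skips the type-compatibility check would.
  def bytesAsInt(s: String): Int = {
    val bytes = s.getBytes("UTF-8")
    // Pad to 4 bytes so short strings still "decode" to something.
    val padded = bytes.padTo(4, 0.toByte)
    ByteBuffer.wrap(padded).order(ByteOrder.LITTLE_ENDIAN).getInt
  }

  def main(args: Array[String]): Unit = {
    // "a" is the single byte 0x61; zero-padded and read little-endian
    // it decodes to 97 -- a meaningless value, not an error.
    println(bytesAsInt("a"))
  }
}
```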

Reproduction

withSQLConf(
  CometConf.COMET_NATIVE_SCAN_IMPL.key -> CometConf.SCAN_NATIVE_DATAFUSION,
  SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") {
  withTempPath { dir =>
    val path = dir.getCanonicalPath
    Seq("a", "b", "c").toDF("c").write.parquet(path)
    val df = spark.read.schema("c int").parquet(path)
    df.show() // returns 3 rows of meaningless integers; should throw
  }
}

native_iceberg_compat correctly throws SparkException for this case (matches Spark).

Affected versions

All supported Spark profiles (3.4, 3.5, 4.0). Reproduced on Comet main while building #4087.

Expected behavior

The native reader should detect that the requested type (INT) is not byte-compatible with the physical column type (BINARY/UTF8) and raise an exception, matching Spark's SchemaColumnConvertNotSupportedException.
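A fix could gate column-reader setup on an explicit compatibility check between the requested type and the physical type. The sketch below is hypothetical (the type names and `checkConvertible` helper are illustrative, not Comet's actual API) but shows the intended shape: fail loudly on mismatch rather than reinterpret bytes.

```scala
object SchemaCheck {
  // Simplified stand-in for Parquet physical types (illustrative only).
  sealed trait PhysicalType
  case object Int32 extends PhysicalType
  case object Binary extends PhysicalType

  // Throw when the requested Spark type is not byte-compatible with the
  // Parquet physical type, mirroring the behavior Spark signals via
  // SchemaColumnConvertNotSupportedException.
  def checkConvertible(physical: PhysicalType, requested: String): Unit =
    (physical, requested) match {
      case (Int32, "int") | (Binary, "string") => () // compatible, proceed
      case _ =>
        throw new UnsupportedOperationException(
          s"Parquet physical type $physical cannot be read as $requested")
    }
}
```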

Test coverage

Documented in ParquetSchemaMismatchSuite (added in #4087) under the test name `string read as int: native_datafusion`. The test currently asserts the buggy behavior, so fixing this issue will require updating that assertion (and the matrix in the file header).

Parent issue

Split from #3720.
