[Feature] Add Native Scan Support for Apache Hudi Copy-On-Write Tables #2030

@slfan1989

Description

Add native scan support for Apache Hudi Copy-On-Write (COW) tables using Auron's vectorized execution engine. This lets Auron accelerate Hudi table queries by converting FileSourceScanExec operators over Hudi tables into native Parquet/ORC scans.

Scope

Supported Features

  • COW (Copy-On-Write) tables: Native scan for Parquet and ORC base files
  • Configuration switch: spark.auron.enable.hudi.scan (default: true)
  • Timestamp handling: Automatically falls back to Spark when native timestamp scanning is disabled

Limitations (Initial Version)

  • MOR (Merge-On-Read) tables: Not supported; queries automatically fall back to Spark
  • Time travel queries: Falls back to Spark to preserve metadata semantics
  • Spark version: Only Spark 3.0–3.5 (Spark 4.x not supported)
  • Hudi version: Only Hudi 0.15.0
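
The supported features and limitations above amount to an eligibility check that runs before a scan is converted. The following is a minimal sketch of that decision logic, not Auron's actual implementation: only the configuration key `spark.auron.enable.hudi.scan` and the version ranges come from this issue, while `HudiScanRequest`, `native_scan_decision`, the `has_timestamp_columns` field, and the `native_timestamp_scan_enabled` parameter are hypothetical names introduced for illustration. It also assumes the timestamp fallback applies only when the scanned schema contains timestamp columns.

```python
from dataclasses import dataclass

# Configuration key named in this issue; the issue states it defaults to true.
HUDI_SCAN_CONF_KEY = "spark.auron.enable.hudi.scan"


@dataclass(frozen=True)
class HudiScanRequest:
    """Hypothetical description of a scan over a Hudi table."""
    table_type: str              # "COW" or "MOR"
    base_file_format: str        # "parquet" or "orc"
    is_time_travel: bool         # True for time travel queries
    has_timestamp_columns: bool  # assumption: fallback depends on the schema
    spark_version: str           # e.g. "3.5.1"
    hudi_version: str            # e.g. "0.15.0"


def native_scan_decision(req, conf, native_timestamp_scan_enabled=True):
    """Return (use_native, reason) following the rules listed in this issue."""
    if conf.get(HUDI_SCAN_CONF_KEY, "true").lower() != "true":
        return False, "disabled by configuration"

    major, minor = (int(part) for part in req.spark_version.split(".")[:2])
    if major != 3 or not 0 <= minor <= 5:
        return False, "unsupported Spark version"

    if req.hudi_version != "0.15.0":
        return False, "unsupported Hudi version"

    if req.table_type.upper() != "COW":
        return False, "only COW tables are supported"

    if req.base_file_format.lower() not in ("parquet", "orc"):
        return False, "unsupported base file format"

    if req.is_time_travel:
        return False, "time travel query"

    if req.has_timestamp_columns and not native_timestamp_scan_enabled:
        return False, "native timestamp scanning disabled"

    return True, "native scan"
```

For example, `native_scan_decision(HudiScanRequest("MOR", "parquet", False, False, "3.5.1", "0.15.0"), {})` reports a fallback to Spark, while the same request with `table_type="COW"` is eligible for the native scan.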
