feat(reader): Add read_with_metrics() for scan I/O metrics by mbutrovich · Pull Request #2349 · apache/iceberg-rust

mbutrovich · 2026-04-20T17:53:39Z

Which issue does this PR close?

Closes #.

What changes are included in this PR?

Add always-on per-scan I/O metrics to ArrowReader.

Motivation: Downstream engines need per-scan byte counts for their UIs. For example, DataFusion Comet uses this to populate bytes_scanned on its Iceberg scan operator, which flows through to Spark UI via TaskMetrics.inputMetrics.setBytesRead(). This must be per-scan, not global. Concurrent scans against the same FileIO need independent counters. The approach matches DataFusion's pattern of wrapping AsyncFileReader with a counting layer and is storage-backend agnostic.

ArrowReader::read() now returns ScanResult

ScanResult wraps the record batch stream and ScanMetrics. Accessors: stream(), metrics(), into_parts().
Metrics are always collected. One fetch_add(Relaxed) per I/O request, negligible overhead.
Counter is created fresh per read() call, so cloned readers get independent metrics.

New file: crates/iceberg/src/arrow/scan_metrics.rs

CountingFileRead<F: FileRead>: generic wrapper that increments a shared AtomicU64 on each read().
ScanMetrics: public handle exposing bytes_read().
ScanResult: public struct returned by ArrowReader::read().

FileRead blanket impl for Box<dyn FileRead>

Enables generic CountingFileRead<F> to wrap the boxed reader returned by FileIO::reader().

Single open_parquet_file with counting

All Parquet opens (data files and delete files) go through the same open_parquet_file wrapped with CountingFileRead, so bytes_read reflects total scan I/O.
build_parquet_reader(): shared internals for reader construction and metadata loading.

FileScanTaskReader struct (refactor)

Extracted process_file_scan_task's parameters into a Clone struct with a process(self, task) method, resolving a clippy::too_many_arguments violation. Struct and impl are co-located.

Re-exports

ScanMetrics and ScanResult re-exported from iceberg::arrow and iceberg::scan.

Are these changes tested?

test_scan_metrics_bytes_read in reader.rs: asserts bytes_read() == 0 before stream consumption (the stream is lazy) and bytes_read() > 0 after. test_scan_metrics_includes_delete_file_bytes: reads the same data file with and without a positional delete file and asserts bytes_read is strictly greater when deletes are present. All existing reader and scan tests pass (updated to use ScanResult::stream()).

mbutrovich · 2026-04-20T22:15:42Z

@blackmwk I'm trying really hard not to add more to reader.rs. Let me know if you have any suggestions.

CTTY

Just took a quick pass, love the direction!

blackmwk

Thanks @mbutrovich for this pr!

mbutrovich · 2026-04-21T20:45:17Z

Thanks for the feedback @CTTY and @blackmwk! I will address the comments tomorrow.

mbutrovich · 2026-04-22T17:20:13Z

I believe I addressed all of the feedback. Thanks for the first pass @CTTY @blackmwk, please take another look whenever you get a spare moment. Thank you!

mbutrovich added 5 commits April 20, 2026 13:20

Add metrics for bytes read, need to clean up clippy.

8921a95

Clean up.

a4457df

Make API opt-in.

f7a0507

Refactor open logic as well.

f707ba8

Put missing comment back.

ab4ebee

mbutrovich requested review from CTTY and blackmwk April 20, 2026 22:15

CTTY reviewed Apr 20, 2026

View reviewed changes

Comment thread crates/iceberg/src/arrow/reader.rs Outdated

Comment thread crates/iceberg/src/arrow/reader.rs

Comment thread crates/iceberg/src/arrow/reader.rs Outdated

blackmwk reviewed Apr 21, 2026

View reviewed changes

Comment thread crates/iceberg/src/arrow/scan_metrics.rs Outdated

Comment thread crates/iceberg/src/arrow/reader.rs Outdated

Comment thread crates/iceberg/src/arrow/reader.rs Outdated

mbutrovich mentioned this pull request Apr 22, 2026

Task metrics (bytes read) for CometIcebergNativeScan apache/datafusion-comet#4002

Open

mbutrovich and others added 3 commits April 22, 2026 08:52

Address PR feedback.

fa055fa

Handle delete files and add a test.

cf59635

Merge branch 'main' into bytes_read

9045930

mbutrovich requested review from CTTY and blackmwk April 22, 2026 17:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(reader): Add read_with_metrics() for scan I/O metrics#2349

feat(reader): Add read_with_metrics() for scan I/O metrics#2349
mbutrovich wants to merge 8 commits intoapache:mainfrom
mbutrovich:bytes_read

mbutrovich commented Apr 20, 2026 •

edited

Loading

Uh oh!

mbutrovich commented Apr 20, 2026

Uh oh!

CTTY left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

blackmwk left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mbutrovich commented Apr 21, 2026

Uh oh!

mbutrovich commented Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

mbutrovich commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

What changes are included in this PR?

Are these changes tested?

Uh oh!

mbutrovich commented Apr 20, 2026

Uh oh!

CTTY left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

blackmwk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mbutrovich commented Apr 21, 2026

Uh oh!

mbutrovich commented Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mbutrovich commented Apr 20, 2026 •

edited

Loading