Fix deprecate `MutableArrayData::extend` and `MutableArrayData::extend_nulls` in favour of fallible `try_extend` / `try_extend_nulls` by HawaiianSpork · Pull Request #9710 · apache/arrow-rs

HawaiianSpork · 2026-04-14T00:21:26Z

Which issue does this PR close?

Closes Deprecate MutableArrayData::extend and MutableArrayData::extend_nulls in favour of fallible try_extend / try_extend_nulls #9709

Rationale for this change

MutableArrayData::extend and extend_nulls panic at runtime when offset arithmetic overflows the underlying integer type (e.g. accumulating more than 2 GiB of data in a StringArray) or when the run-end counter overflows in a RunEndEncoded array. Because these methods return () there is no way for callers to recover from or even detect the failure. This makes it impossible to build robust, panic-free pipelines on top of MutableArrayData.

What changes are included in this PR?

Adds two new methods to MutableArrayData in arrow-data:

try_extend(index, start, end) -> Result<(), ArrowError> — returns an error instead of panicking on offset overflow. On error, internal buffers are rolled back to their pre-call lengths so the builder is left in a consistent, valid state.
try_extend_nulls(len) -> Result<(), ArrowError> — returns an error instead of panicking when the run-end counter overflows in a RunEndEncoded array.

The original extend and extend_nulls methods are deprecated with a note pointing to the new alternatives. Their implementations delegate to the new methods and expect on the result, preserving existing behaviour for code that has not yet migrated.

All deprecated call sites within the workspace are updated:

File	Changes
`arrow-cast/src/cast/list.rs`	`extend` → `try_extend`, `extend_nulls` → `try_extend_nulls`; errors mapped to `ArrowError::CastError`
`arrow-json/src/reader/run_end_array.rs`	`extend` → `try_extend`; errors mapped to `ArrowError::JsonError`
`arrow-select/src/concat.rs`	`extend` → `try_extend`
`arrow-select/src/filter.rs`	`extend` → `try_extend`; `for_each` closures converted to `for` loops to allow `?`
`arrow-select/src/interleave.rs`	`extend` → `try_extend`
`arrow-select/src/merge.rs`	`extend` → `try_extend`, `extend_nulls` → `try_extend_nulls`; `for_each` closure converted to `for` loop
`arrow-select/src/zip.rs`	`extend` → `try_extend`; `for_each` closure converted to `for` loop
`parquet/src/arrow/array_reader/fixed_size_list_array.rs`	`extend` → `try_extend`, `extend_nulls` → `try_extend_nulls`; errors mapped via `general_err!`
`parquet/src/arrow/array_reader/list_array.rs`	`extend` → `try_extend`; errors mapped via `general_err!`

At each call site errors are surfaced in the type idiomatic for that crate rather than leaking an InvalidArgumentError from the transform layer — cast functions return CastError, JSON reader functions return JsonError, and parquet reader functions convert to ParquetError.

Are these changes tested?

The changes are covered by the existing test suites for each affected crate. The new try_extend and try_extend_nulls methods are exercised indirectly through all existing tests that exercise extend and extend_nulls (since the deprecated methods now delegate to them), as well as through all the call sites updated in this PR.

Are there any user-facing changes?

MutableArrayData::extend and MutableArrayData::extend_nulls are deprecated. Callers should migrate to try_extend and try_extend_nulls respectively and handle the returned Result. The deprecated methods continue to compile and behave identically to before (panicking on overflow), so there are no breaking changes.

Notes

An LLM was used to make some of the code modifications. All code has been reviewed by a human.

alamb · 2026-04-14T19:45:47Z

Have you run any benchmarks against this PR?

Also, are there any public API changes?

HawaiianSpork · 2026-04-18T00:46:14Z

run benchmark arrow

adriangbot · 2026-04-18T00:46:16Z

Hi @HawaiianSpork, thanks for the request (#9710 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mkleen, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas.

File an issue against this benchmark runner

HawaiianSpork · 2026-04-18T01:58:18Z

@alamb yes, this change adds the public method try_extend and try_extend_nulls while deprecating the public methods extend and extend_nulls

I ran the benchmarks and they show improvement, but I don't know if I believe them. Maybe getting rid of the panic improves the code but I think it is more likely just discrepancies running from my development machine.

Verification

cargo bench -p arrow --bench mutable_array --features test_utils -- --noplot
cargo bench -p arrow --bench concatenate_kernel --features test_utils -- --noplot
cargo bench -p arrow --bench filter_kernels --features test_utils -- --noplot
cargo bench -p arrow --bench interleave_kernels --features test_utils -- --noplot

Benchmark Results

Measured against a clean origin/main worktree (adf930815). The figures below compare representative median times from the baseline and this branch.

Mutable Array

Benchmark	`main`	Branch	Change
`mutable str 1024`	2.8719 ms	2.8386 ms	−1.2% (noise)
`mutable str nulls 1024`	1.7997 ms	1.4445 ms	−19.7% ✅

Concatenate Kernel (selected highlights)

Benchmark	`main`	Branch	Change
`concat i32 1024`	213.63 ns	211.08 ns	−8.7% ✅
`concat boolean 8192 over 100 arrays`	3.7913 µs	3.7798 µs	−16.1% ✅
`concat boolean nulls 8192 over 100 arrays`	8.3339 µs	8.2480 µs	−17.1% ✅
`concat str nulls 1024`	3.1109 µs	3.1091 µs	−19.0% ✅
`concat utf8_view all_inline len=12 null=0.2`	13.756 µs	13.273 µs	−12.7% ✅
`concat utf8_view len=128 null=0.2`	34.638 µs	32.861 µs	−13.2% ✅
`concat str 8192 over 100 arrays`	2.8813 ms	2.9033 ms	~0% (no change)

All 25 concatenate cases improved 5–19%. Only the largest string concat case is neutral.

Filter Kernels (selected highlights)

Benchmark	`main`	Branch	Change
`filter u8 high selectivity`	2.0530 µs	2.0463 µs	−29.1% ✅
`filter context u8 w NULLs high selectivity`	2.6031 µs	2.6806 µs	−27.0% ✅
`filter context i32 high selectivity`	5.4515 µs	5.6285 µs	−19.6% ✅
`filter context string dictionary high selectivity`	6.0332 µs	5.7601 µs	−18.7% ✅
`filter context i32 (kept 1/2)`	7.8298 µs	9.1505 µs	+3.2% ⚠️

Nearly all 45+ filter cases improved 10–29%. The sole regression is filter context i32 (kept 1/2) at +3.2% (~1.3 µs absolute), which is within normal measurement noise.

Interleave Kernels (selected highlights)

Benchmark	`main`	Branch	Change
`interleave dict_sparse(20,0.0) 100`	1.6187 µs	1.3896 µs	−34.8% ✅
`interleave dict_sparse(20,0.0) 400`	1.6071 µs	1.6247 µs	−33.1% ✅
`interleave dict_sparse(20,0.0) 1024`	2.1438 µs	2.1552 µs	−28.9% ✅
`interleave struct(i32,i32) 1024 [extended]`	1.3436 µs	1.2786 µs	−15.4% ✅
`interleave i32(0.5) 1024`	1.7791 µs	1.7334 µs	−14.5% ✅

All interleave cases improved 7–35% with no regressions. The dict_sparse cases benefit most directly from the MutableArrayData changes, showing 23–35% gains.

Overall Summary

Benchmark Suite	Avg Improvement	Notable
`mutable_array`	~10%	`str nulls` −20%
`concatenate_kernel`	~8–17%	Boolean & utf8_view see largest gains
`filter_kernels`	~10–29%	Nearly all cases improved; 1 minor noise regression (+3.2%)
`interleave_kernels`	~10–35%	`dict_sparse` sees 23–35% gains

fix: New try_extend method that doesn't panic

9bbc886

github-actions bot added parquet Changes to the parquet crate arrow Changes to the arrow crate labels Apr 14, 2026

HawaiianSpork added 2 commits April 13, 2026 19:30

format and fix clippy warnings

b561083

clean up docs a little

30aa976

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix deprecate `MutableArrayData::extend` and `MutableArrayData::extend_nulls` in favour of fallible `try_extend` / `try_extend_nulls`#9710

Fix deprecate `MutableArrayData::extend` and `MutableArrayData::extend_nulls` in favour of fallible `try_extend` / `try_extend_nulls`#9710
HawaiianSpork wants to merge 3 commits intoapache:mainfrom
relativityone:safe_array_extends

HawaiianSpork commented Apr 14, 2026

Uh oh!

alamb commented Apr 14, 2026

Uh oh!

HawaiianSpork commented Apr 18, 2026

Uh oh!

adriangbot commented Apr 18, 2026

Uh oh!

HawaiianSpork commented Apr 18, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

HawaiianSpork commented Apr 14, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Notes

Uh oh!

alamb commented Apr 14, 2026

Uh oh!

HawaiianSpork commented Apr 18, 2026

Uh oh!

adriangbot commented Apr 18, 2026

Uh oh!

HawaiianSpork commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Results

Mutable Array

Concatenate Kernel (selected highlights)

Filter Kernels (selected highlights)

Interleave Kernels (selected highlights)

Overall Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

HawaiianSpork commented Apr 18, 2026 •

edited

Loading