Fix deprecate MutableArrayData::extend and MutableArrayData::extend_nulls in favour of fallible try_extend / try_extend_nulls#9710
Conversation
|
Have you run any benchmarks against this PR? Also, are there any public API changes? |
|
run benchmark arrow |
|
Hi @HawaiianSpork, thanks for the request (#9710 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mkleen, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas. File an issue against this benchmark runner |
|
@alamb yes, this change adds the public method I ran the benchmarks and they show improvement, but I don't know if I believe them. Maybe getting rid of the panic improves the code but I think it is more likely just discrepancies running from my development machine. Verification cargo bench -p arrow --bench mutable_array --features test_utils -- --noplot
cargo bench -p arrow --bench concatenate_kernel --features test_utils -- --noplot
cargo bench -p arrow --bench filter_kernels --features test_utils -- --noplot
cargo bench -p arrow --bench interleave_kernels --features test_utils -- --noplotBenchmark ResultsMeasured against a clean Mutable Array
Concatenate Kernel (selected highlights)
All 25 concatenate cases improved 5–19%. Only the largest string concat case is neutral. Filter Kernels (selected highlights)
Nearly all 45+ filter cases improved 10–29%. The sole regression is Interleave Kernels (selected highlights)
All interleave cases improved 7–35% with no regressions. The Overall Summary
|
Which issue does this PR close?
MutableArrayData::extendandMutableArrayData::extend_nullsin favour of fallibletry_extend/try_extend_nulls#9709Rationale for this change
MutableArrayData::extendandextend_nullspanic at runtime when offset arithmetic overflows the underlying integer type (e.g. accumulating more than 2 GiB of data in aStringArray) or when the run-end counter overflows in aRunEndEncodedarray. Because these methods return()there is no way for callers to recover from or even detect the failure. This makes it impossible to build robust, panic-free pipelines on top ofMutableArrayData.What changes are included in this PR?
Adds two new methods to
MutableArrayDatainarrow-data:try_extend(index, start, end) -> Result<(), ArrowError>— returns an error instead of panicking on offset overflow. On error, internal buffers are rolled back to their pre-call lengths so the builder is left in a consistent, valid state.try_extend_nulls(len) -> Result<(), ArrowError>— returns an error instead of panicking when the run-end counter overflows in aRunEndEncodedarray.The original
extendandextend_nullsmethods are deprecated with a note pointing to the new alternatives. Their implementations delegate to the new methods andexpecton the result, preserving existing behaviour for code that has not yet migrated.All deprecated call sites within the workspace are updated:
arrow-cast/src/cast/list.rsextend→try_extend,extend_nulls→try_extend_nulls; errors mapped toArrowError::CastErrorarrow-json/src/reader/run_end_array.rsextend→try_extend; errors mapped toArrowError::JsonErrorarrow-select/src/concat.rsextend→try_extendarrow-select/src/filter.rsextend→try_extend;for_eachclosures converted toforloops to allow?arrow-select/src/interleave.rsextend→try_extendarrow-select/src/merge.rsextend→try_extend,extend_nulls→try_extend_nulls;for_eachclosure converted toforlooparrow-select/src/zip.rsextend→try_extend;for_eachclosure converted toforloopparquet/src/arrow/array_reader/fixed_size_list_array.rsextend→try_extend,extend_nulls→try_extend_nulls; errors mapped viageneral_err!parquet/src/arrow/array_reader/list_array.rsextend→try_extend; errors mapped viageneral_err!At each call site errors are surfaced in the type idiomatic for that crate rather than leaking an
InvalidArgumentErrorfrom the transform layer — cast functions returnCastError, JSON reader functions returnJsonError, and parquet reader functions convert toParquetError.Are these changes tested?
The changes are covered by the existing test suites for each affected crate. The new
try_extendandtry_extend_nullsmethods are exercised indirectly through all existing tests that exerciseextendandextend_nulls(since the deprecated methods now delegate to them), as well as through all the call sites updated in this PR.Are there any user-facing changes?
MutableArrayData::extendandMutableArrayData::extend_nullsare deprecated. Callers should migrate totry_extendandtry_extend_nullsrespectively and handle the returnedResult. The deprecated methods continue to compile and behave identically to before (panicking on overflow), so there are no breaking changes.Notes
An LLM was used to make some of the code modifications. All code has been reviewed by a human.