Summary
min_max_scalar in datafusion/functions-aggregate-common/src/min_max.rs was extracted to make error-path control flow testable, but the core scalar dispatch is still encoded in the large min_max_scalar_impl! macro. This macro packs type matching, validation, recursion, and return-flow behavior into non-local expansion paths that are harder to step through, test in isolation, and evolve safely.
This issue proposes replacing macro-heavy scalar dispatch with a function-first implementation that preserves current semantics and error messages while improving testability and maintainability.
Current State
- Scalar dispatch is implemented in
min_max_scalar_impl! with many match arms.
- Dictionary recursion and dictionary-key-type invariants are embedded inside macro expansion.
- Decimal compatibility checks and some incompatible-type errors are spread across macro arms.
min_max_scalar currently wraps macro invocation and maps Ordering to min or max semantics.
Problem
- Macro expansion hides control flow and increases debugging friction.
- It is difficult to unit test small dispatch decisions without exercising large macro branches.
- Future behavior changes risk reintroducing non-local control-flow mistakes.
- Compiler errors and review diffs are harder to reason about in a large macro body.
Goals
- Move scalar dispatch to regular Rust functions with explicit control flow.
- Preserve all externally visible behavior:
- Same min/max results for all currently supported scalar variants.
- Same dictionary semantics and key-type mismatch invariants.
- Same decimal precision/scale checks.
- Same NaN behavior.
- Same error class and equivalent error text for incompatible inputs.
- Keep the change crate-scoped and reviewable.
Non-Goals
- No dictionary batch optimization work in this issue.
- No semantic changes to aggregate min/max behavior.
- No broad refactor of batch kernels outside scalar dispatch.
Proposed Approach
- Introduce function-based dispatch helpers:
fn min_max_scalar(lhs: &ScalarValue, rhs: &ScalarValue, ordering: Ordering) -> Result<ScalarValue> remains the entrypoint.
- Add internal helpers to isolate concerns, such as:
fn min_max_scalar_same_variant(...) -> Result<ScalarValue>
fn min_max_dictionary_scalar(...) -> Result<ScalarValue>
fn ensure_decimal_compatibility(...) -> Result<()>
- Replace macro match arms with explicit
match (lhs, rhs) in one or more functions.
- Keep current compacting and cloning behavior where relevant.
- Preserve current dictionary rewrap behavior when key types match.
- Remove
min_max_scalar_impl! after parity is proven.
Testing Plan
- Extend unit tests in
datafusion/functions-aggregate-common/src/min_max.rs with table-driven parity checks over representative scalar pairs.
- Keep and validate dictionary-focused tests:
- dictionary vs scalar compare-by-inner-value
- dictionary vs dictionary same key type rewrap
- dictionary key type mismatch error
- dictionary vs incompatible scalar error
- Add targeted tests for decimal precision/scale mismatch and fixed-size-binary size mismatch.
- Run crate-scoped tests:
cargo test -p datafusion-functions-aggregate-common min_max
Acceptance Criteria
min_max_scalar_impl! is removed.
min_max_scalar behavior is unchanged for all existing test cases.
- New tests cover control-flow edges previously hidden in macro expansion.
- No regressions in aggregate min/max tests that depend on scalar comparison.
Summary
min_max_scalarindatafusion/functions-aggregate-common/src/min_max.rswas extracted to make error-path control flow testable, but the core scalar dispatch is still encoded in the largemin_max_scalar_impl!macro. This macro packs type matching, validation, recursion, and return-flow behavior into non-local expansion paths that are harder to step through, test in isolation, and evolve safely.This issue proposes replacing macro-heavy scalar dispatch with a function-first implementation that preserves current semantics and error messages while improving testability and maintainability.
Current State
min_max_scalar_impl!with many match arms.min_max_scalarcurrently wraps macro invocation and mapsOrderingtominormaxsemantics.Problem
Goals
Non-Goals
Proposed Approach
fn min_max_scalar(lhs: &ScalarValue, rhs: &ScalarValue, ordering: Ordering) -> Result<ScalarValue>remains the entrypoint.fn min_max_scalar_same_variant(...) -> Result<ScalarValue>fn min_max_dictionary_scalar(...) -> Result<ScalarValue>fn ensure_decimal_compatibility(...) -> Result<()>match (lhs, rhs)in one or more functions.min_max_scalar_impl!after parity is proven.Testing Plan
datafusion/functions-aggregate-common/src/min_max.rswith table-driven parity checks over representative scalar pairs.cargo test -p datafusion-functions-aggregate-common min_maxAcceptance Criteria
min_max_scalar_impl!is removed.min_max_scalarbehavior is unchanged for all existing test cases.