Skip to content

Refactor scalar min/max dispatch from macro arms to function-based logic #22061

@kosiew

Description

@kosiew

Summary

min_max_scalar in datafusion/functions-aggregate-common/src/min_max.rs was extracted to make error-path control flow testable, but the core scalar dispatch is still encoded in the large min_max_scalar_impl! macro. This macro packs type matching, validation, recursion, and return-flow behavior into non-local expansion paths that are harder to step through, test in isolation, and evolve safely.

This issue proposes replacing macro-heavy scalar dispatch with a function-first implementation that preserves current semantics and error messages while improving testability and maintainability.

Current State

  • Scalar dispatch is implemented in min_max_scalar_impl! with many match arms.
  • Dictionary recursion and dictionary-key-type invariants are embedded inside macro expansion.
  • Decimal compatibility checks and some incompatible-type errors are spread across macro arms.
  • min_max_scalar currently wraps macro invocation and maps Ordering to min or max semantics.

Problem

  • Macro expansion hides control flow and increases debugging friction.
  • It is difficult to unit test small dispatch decisions without exercising large macro branches.
  • Future behavior changes risk reintroducing non-local control-flow mistakes.
  • Compiler errors and review diffs are harder to reason about in a large macro body.

Goals

  • Move scalar dispatch to regular Rust functions with explicit control flow.
  • Preserve all externally visible behavior:
    • Same min/max results for all currently supported scalar variants.
    • Same dictionary semantics and key-type mismatch invariants.
    • Same decimal precision/scale checks.
    • Same NaN behavior.
    • Same error class and equivalent error text for incompatible inputs.
  • Keep the change crate-scoped and reviewable.

Non-Goals

  • No dictionary batch optimization work in this issue.
  • No semantic changes to aggregate min/max behavior.
  • No broad refactor of batch kernels outside scalar dispatch.

Proposed Approach

  1. Introduce function-based dispatch helpers:
    • fn min_max_scalar(lhs: &ScalarValue, rhs: &ScalarValue, ordering: Ordering) -> Result<ScalarValue> remains the entrypoint.
    • Add internal helpers to isolate concerns, such as:
      • fn min_max_scalar_same_variant(...) -> Result<ScalarValue>
      • fn min_max_dictionary_scalar(...) -> Result<ScalarValue>
      • fn ensure_decimal_compatibility(...) -> Result<()>
  2. Replace macro match arms with explicit match (lhs, rhs) in one or more functions.
  3. Keep current compacting and cloning behavior where relevant.
  4. Preserve current dictionary rewrap behavior when key types match.
  5. Remove min_max_scalar_impl! after parity is proven.

Testing Plan

  • Extend unit tests in datafusion/functions-aggregate-common/src/min_max.rs with table-driven parity checks over representative scalar pairs.
  • Keep and validate dictionary-focused tests:
    • dictionary vs scalar compare-by-inner-value
    • dictionary vs dictionary same key type rewrap
    • dictionary key type mismatch error
    • dictionary vs incompatible scalar error
  • Add targeted tests for decimal precision/scale mismatch and fixed-size-binary size mismatch.
  • Run crate-scoped tests:
    • cargo test -p datafusion-functions-aggregate-common min_max

Acceptance Criteria

  • min_max_scalar_impl! is removed.
  • min_max_scalar behavior is unchanged for all existing test cases.
  • New tests cover control-flow edges previously hidden in macro expansion.
  • No regressions in aggregate min/max tests that depend on scalar comparison.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions