feat: fix NTILE distribution logic by comphead · Pull Request #22051 · apache/datafusion

comphead · 2026-05-06T20:04:05Z

Which issue does this PR close?

Closes NTILE returns wrong results #22049.

Rationale for this change

Root cause: datafusion/functions-window/src/ntile.rs:170-176 used the linear-interpolation formula i * n / num_rows to assign bucket numbers. That formula spreads the larger buckets evenly through the partition, but the SQL standard requires the first num_rows mod n buckets to be the larger ones. For NTILE(4) over 10 rows it produced bucket sizes 3,2,3,2 instead of 3,3,2,2.
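To make the root cause concrete, here is a minimal sketch that reproduces the 3,2,3,2 split with the linear-interpolation formula. The names old_ntile and bucket_sizes are hypothetical, and the + 1 that turns the result into a 1-based bucket number is an assumption; only the i * n / num_rows expression and the u64::min clamp come from the PR.

```rust
/// Assigns buckets with the linear-interpolation formula described above.
/// `old_ntile` is a hypothetical name; the `+ 1` (1-based buckets) is assumed.
fn old_ntile(num_rows: u64, n: u64) -> Vec<u64> {
    // Mirrors the clamp that the diff removes: n = min(n, num_rows).
    let n = u64::min(n, num_rows);
    (0..num_rows).map(|i| i * n / num_rows + 1).collect()
}

/// Counts how many rows landed in each bucket, in bucket order.
fn bucket_sizes(buckets: &[u64]) -> Vec<usize> {
    let max = buckets.iter().copied().max().unwrap_or(0) as usize;
    let mut sizes = vec![0usize; max];
    for &b in buckets {
        sizes[(b - 1) as usize] += 1;
    }
    sizes
}
```

For 10 rows and n = 4, old_ntile yields buckets 1,1,1,2,2,3,3,3,4,4, so the larger buckets are the first and third rather than the first two.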

Fix: replaced the formula with a front-loaded distribution. Compute base = num_rows / n and remainder = num_rows % n, treat the first remainder * (base + 1) rows as the "large bucket" region (remainder buckets of size base + 1 each), and split the remaining rows into buckets of size base. The n > num_rows case (existing test NTILE(9223377) over 3 rows) still works: base = 0 and remainder = num_rows, so the large region covers all num_rows rows and every row falls into the "large" arm, where each bucket holds one row.
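The front-loaded distribution can be sketched as a standalone function. This is an illustration under assumptions, not the code merged in ntile.rs: front_loaded_ntile is a hypothetical name, the "large" arm is reconstructed from the description, and only the else arm matches the expression quoted in the review below.

```rust
/// Front-loaded NTILE assignment (1-based bucket numbers).
/// Hypothetical helper; assumes `n` was validated as positive by the caller.
fn front_loaded_ntile(num_rows: u64, n: u64) -> Vec<u64> {
    assert!(n > 0, "NTILE requires a positive integer");
    let base = num_rows / n;
    let remainder = num_rows % n;
    // Rows covered by the `remainder` buckets of size `base + 1`.
    let large_rows = remainder * (base + 1);
    (0..num_rows)
        .map(|i| {
            if i < large_rows {
                // Large arm (reconstructed): buckets of size base + 1.
                i / (base + 1) + 1
            } else {
                // Small arm: base >= 1 here, because base == 0 implies
                // large_rows == num_rows and this branch is never taken.
                remainder + (i - large_rows) / base + 1
            }
        })
        .collect()
}
```

With 10 rows and n = 4 this gives base = 2, remainder = 2, large_rows = 6, so rows 0 to 5 fill two buckets of three and rows 6 to 9 fill two buckets of two.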

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

comphead · 2026-05-06T20:12:51Z

The entire CI is super fast now after #21941, thanks @blaginin!

coderfender

Minor comments

coderfender · 2026-05-06T23:52:28Z

+            } else {
+                // base > 0 here: i >= large_rows is only reachable when remainder < n,
+                // which forces base >= 1 (otherwise large_rows would equal num_rows).
+                remainder + (i - large_rows) / base + 1


Thank you! Perhaps it would be a good idea to add tests for remainder = 0? For example, 10 rows distributed into 5 buckets.
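One way to cover the remainder = 0 case is to compare against the expected bucket sizes directly. The sketch below is a hypothetical test oracle (expected_bucket_sizes is not a DataFusion function): the first remainder buckets hold base + 1 rows, the rest hold base rows, and only min(n, num_rows) buckets can be non-empty.

```rust
/// Expected NTILE bucket sizes in bucket order, skipping empty buckets.
/// Hypothetical oracle for tests; assumes `n` is positive.
fn expected_bucket_sizes(num_rows: u64, n: u64) -> Vec<u64> {
    assert!(n > 0, "n must be positive");
    let base = num_rows / n;
    let remainder = num_rows % n;
    // When n > num_rows, only num_rows buckets receive a row.
    let non_empty = u64::min(n, num_rows);
    (0..non_empty)
        .map(|bucket| if bucket < remainder { base + 1 } else { base })
        .collect()
}
```

For 10 rows and 5 buckets the remainder is 0, so every bucket is expected to hold exactly 2 rows.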

coderfender · 2026-05-06T23:52:49Z

        let num_rows = num_rows as u64;
-        let mut vec: Vec<u64> = Vec::new();
-        let n = u64::min(self.n, num_rows);
+        let n = self.n;


Should we also reject NTILE(0) directly?

mbutrovich

Minor issue exposed by the new code. Thanks @comphead!

mbutrovich · 2026-05-06T23:55:05Z

I think we need a guard for the unsigned path to match the signed path below for n == 0. The new code will panic if n is 0 in the unsigned case.
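For context on the panic: Rust integer division and remainder by zero panic at runtime, so num_rows / n with n = 0 would abort evaluation. A guard of the kind suggested here might look like the sketch below. checked_base_and_remainder is a hypothetical helper, and the String error stands in for DataFusion's error type.

```rust
/// Returns (base, remainder) for NTILE, rejecting n == 0 up front.
/// Hypothetical helper; the error type is a plain String for illustration.
fn checked_base_and_remainder(num_rows: u64, n: u64) -> Result<(u64, u64), String> {
    if n == 0 {
        // Without this guard, the division below would panic.
        return Err("NTILE requires a positive integer".to_string());
    }
    Ok((num_rows / n, num_rows % n))
}
```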

mbutrovich · 2026-05-06T23:55:42Z

Good thinking, @coderfender :)

comphead · 2026-05-07T00:05:05Z

Thanks @mbutrovich @coderfender. Invalid inputs such as 0, negative values, and NULL are handled in the NTILE partition_evaluator (fn partition_evaluator, line 119 of datafusion/datafusion/functions-window/src/ntile.rs in 2a14a93), which is called through create_udwf_window_expr, which both DataFusion and Comet call.

But I'll add tests to assert it.

UPD: double-checked, those tests are already in place and passing.

mbutrovich · 2026-05-07T15:01:42Z

So you're saying

            if n <= 0 {
                return exec_err!("NTILE requires a positive integer");
            }

is dead code then? If that case is not reachable, we should remove the check and replace it with a debug_assert, so the invariant is enforced in testing and compiled out of release builds.
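A minimal sketch of that suggestion, assuming the argument has already been validated upstream. base_and_remainder is a hypothetical name, not the merged code.

```rust
/// Computes (base, remainder), relying on upstream validation of `n`.
/// Hypothetical helper illustrating the debug_assert pattern.
fn base_and_remainder(num_rows: u64, n: u64) -> (u64, u64) {
    // Checked when debug assertions are enabled (e.g. default test builds);
    // compiled out otherwise.
    debug_assert!(n > 0, "NTILE argument must be validated before evaluation");
    (num_rows / n, num_rows % n)
}
```

Note that debug_assert only documents and tests the invariant. If n = 0 ever did reach this point in a release build, the division itself would still panic.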

comphead · 2026-05-07T15:50:20Z

The evaluator can be simplified; that check was existing code. Thanks for helping to make it cleaner. Addressed.

mbutrovich

Approved pending CI, thanks @comphead!

feat: fix NTILE distribution logic

74d58a5

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels May 6, 2026

coderfender approved these changes May 6, 2026

mbutrovich requested changes May 6, 2026

comphead requested a review from mbutrovich May 7, 2026 00:54

comphead enabled auto-merge May 7, 2026 14:55

feat: fix NTILE distribution logic

b656bf2

mbutrovich approved these changes May 7, 2026

comphead added this pull request to the merge queue May 7, 2026

Merged via the queue into apache:main with commit 2bf1db5 May 7, 2026
35 checks passed

comphead deleted the ntile branch May 7, 2026 16:06

comphead merged 2 commits into apache:main from comphead:ntile
