Skip to content

fix(expr): correct CanContainNulls/CanContainNaNs when field missing from stats map#686

Open
sentomk wants to merge 2 commits into
apache:mainfrom
sentomk:fix/strict-metrics-evaluator
Open

fix(expr): correct CanContainNulls/CanContainNaNs when field missing from stats map#686
sentomk wants to merge 2 commits into
apache:mainfrom
sentomk:fix/strict-metrics-evaluator

Conversation

@sentomk
Copy link
Copy Markdown

@sentomk sentomk commented May 27, 2026

Summary

  • Fix StrictMetricsEvaluator::CanContainNulls to return true (conservative) when a field is absent from a non-empty null_value_counts map, and false for required fields per schema
  • Fix StrictMetricsEvaluator::CanContainNaNs to return true (conservative) when a field is absent from a non-empty nan_value_counts map, and false for non-floating-point types
  • Add regression tests covering the missing-entry scenario

Close #685

Motivation

When null_value_counts or nan_value_counts is non-empty but does not contain an entry for the queried field, the evaluator incorrectly returned false. This caused comparison operators to erroneously return kRowsMustMatch, potentially skipping row-level filtering and returning rows that do not satisfy the predicate.

Test plan

  • All 40 existing + new StrictMetrics tests pass
  • pre-commit (clang-format, trailing whitespace, etc.) passes

…from stats map

When null_value_counts or nan_value_counts is non-empty but does not
contain an entry for the queried field, the evaluator incorrectly
returned false (cannot contain nulls/NaNs). This caused comparison
operators to erroneously return kRowsMustMatch, skipping row-level
filtering for rows that may not satisfy the predicate.

Fix CanContainNulls to:
- Return false for required (non-optional) fields per schema
- Return true (conservative) when field is absent from a non-empty map

Fix CanContainNaNs to:
- Return false for non-floating-point types
- Return true (conservative) when field is absent from a non-empty map

This aligns with the Java StrictMetricsEvaluator behavior.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes StrictMetricsEvaluator’s handling of missing per-field entries in null_value_counts / nan_value_counts, making the evaluator conservative (i.e., assume nulls/NaNs may exist) when the stats maps are non-empty but omit the queried field, preventing incorrect kRowsMustMatch results that could skip row-level filtering.

Changes:

  • Update CanContainNulls to return true when a field is absent from a non-empty null_value_counts map, and false for required schema fields.
  • Update CanContainNaNs to return true when a field is absent from a non-empty nan_value_counts map, and false for non-floating-point fields.
  • Add regression tests for missing-entry scenarios (with one test needing adjustment to actually exercise the bug condition).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/iceberg/expression/strict_metrics_evaluator.cc Makes null/NaN containment checks conservative when stats maps omit a field, and adds required/non-floating fast paths.
src/iceberg/test/strict_metrics_evaluator_test.cc Adds regression coverage for missing null/NaN count entries (one test currently doesn’t set bounds, so it won’t regress the original issue).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +849 to +861
TEST_F(StrictMetricsEvaluatorMigratedTest, MissingNullCountForField) {
// Field 14 (no_nan_stats, float64, optional) has bounds and value_counts but is
// missing from null_value_counts. The evaluator must conservatively assume nulls
// may exist and return kRowsMightNotMatch for comparison operators.
ExpectShouldRead(Expressions::LessThan("no_nan_stats", Literal::Double(200.0)), false);
ExpectShouldRead(Expressions::LessThanOrEqual("no_nan_stats", Literal::Double(200.0)),
false);
ExpectShouldRead(Expressions::GreaterThan("no_nan_stats", Literal::Double(-1.0)),
false);
ExpectShouldRead(Expressions::GreaterThanOrEqual("no_nan_stats", Literal::Double(-1.0)),
false);
ExpectShouldRead(Expressions::Equal("no_nan_stats", Literal::Double(50.0)), false);
}
@sentomk sentomk force-pushed the fix/strict-metrics-evaluator branch from f20015a to e7344ea Compare May 27, 2026 17:10
Cover the scenario where a field is absent from null_value_counts or
nan_value_counts while the map is non-empty. Verify that comparison
operators conservatively return kRowsMightNotMatch instead of
incorrectly claiming kRowsMustMatch.
@sentomk sentomk force-pushed the fix/strict-metrics-evaluator branch from e7344ea to 68c68b3 Compare May 27, 2026 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: StrictMetricsEvaluator returns incorrect result when null/NaN counts are missing for a field.

2 participants