Skip to content

[BugFix] Key date_time pushdown on field type, not literal UDT (#5481)#5515

Open
RyanL1997 wants to merge 4 commits into
opensearch-project:mainfrom
RyanL1997:worktree-timestamp-format-pushdown-5481
Open

[BugFix] Key date_time pushdown on field type, not literal UDT (#5481)#5515
RyanL1997 wants to merge 4 commits into
opensearch-project:mainfrom
RyanL1997:worktree-timestamp-format-pushdown-5481

Conversation

@RyanL1997
Copy link
Copy Markdown
Collaborator

@RyanL1997 RyanL1997 commented Jun 4, 2026

Description

When a timestamp comparison is AND'd with a SARG-eligible sibling (IN, chained OR of equalities), Calcite's RexSimplify folds the sibling into a Sarg and re-types the timestamp literal during simplification. DATE_SUB(NOW(), INTERVAL 5 MINUTE) loses its EXPR_TIMESTAMP UDT and arrives at the predicate analyzer as VARCHAR.

That breaks the DSL emission downstream in SimpleQueryExpression:

  1. LiteralExpression.value() takes the isString() branch and returns the raw RexLiteral.stringValue() — the space-separated "2026-05-28 16:18:43" form, never normalized to ISO-8601.
  2. addFormatIfNecessary skips .format("date_time") because literal.isDateTime() reads false.

Both checks key off the literal's surviving UDT, which is unreliable after RexSimplify. The shard receives "2026-05-28 16:18:43" against the default strict_date_optional_time||epoch_millis parser and returns HTTP 500.

Fix

The five non-Sarg comparison paths in SimpleQueryExpression (equals, notEquals, gt, gte, lt, lte) now key off rel.isTimeStampType() — the field's type via NamedFieldExpression. This is the same defensive pattern the Sarg path in constructQueryExpressionForSearch (PredicateAnalyzer.java:794-796) already uses.

When the field is a timestamp:

  • The endpoint value is routed through timestampValueForPushDown to canonicalize to ISO-8601 regardless of the literal's surviving type.
  • The range query always carries format("date_time").

The non-timestamp-field branch is unchanged — termQuery / boolQuery behavior for keyword/text/numeric fields is preserved exactly.

Verification

End-to-end on a live local node with the index template and sample data from the issue:

Before:

"range":{"@timestamp":{"from":"2026-06-04 16:53:50",...}}    // no format, raw string
→ failed to parse date field [2026-06-04 16:53:50] with format [strict_date_optional_time||epoch_millis]
→ HTTP 500

After:

"range":{"@timestamp":{"from":"2026-06-04T17:14:01.000Z",..., "format":"date_time", ...}}
→ STATUS: OK — 2 rows returned (ERROR + WARN; INFO filtered out)

:opensearch:test passes with no changes to existing assertions; the existing date_time format assertions in PredicateAnalyzerTest.java continue to hold for the literal-typed-correctly path.

Related Issues

Resolves #5481

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

…earch-project#5481)

When a timestamp comparison is AND'd with a SARG-eligible sibling (e.g.
severityText IN ('ERROR','WARN') or a chained OR of equalities), Calcite's
RexSimplify folds the sibling into a Sarg and re-types the timestamp
literal — DATE_SUB(NOW(), INTERVAL 5 MINUTE) loses its EXPR_TIMESTAMP UDT
and arrives as VARCHAR.

That break the DSL emission downstream:
- LiteralExpression.value() takes the isString() branch and returns the
  raw RexLiteral.stringValue() — the space-separated "2026-05-28 16:18:43"
  form, never normalized to ISO-8601.
- addFormatIfNecessary skips .format("date_time") because
  literal.isDateTime() reads false.

Both checks read the literal's surviving UDT, which is unreliable after
RexSimplify. The shard receives "2026-05-28 16:18:43" against the default
strict_date_optional_time||epoch_millis parser and returns 500.

Fix the five non-Sarg comparison paths in SimpleQueryExpression (equals,
notEquals, gt, gte, lt, lte) to additionally check rel.isTimeStampType()
— the same defensive pattern the Sarg path in
constructQueryExpressionForSearch already uses. When the field is a
timestamp, the value is routed through timestampValueForPushDown to
canonicalize to ISO-8601 regardless of the literal's surviving type, and
the range query always carries format("date_time").

The non-timestamp-field branch is unchanged — termQuery / boolQuery
behavior for keyword/text/numeric fields is preserved exactly.

Verified end-to-end on a live node:
- Before: range emits "2026-06-04 16:53:50" with no format attr → shard
  rejects the parse, query returns HTTP 500.
- After:  range emits "2026-06-04T17:14:01.000Z" with format "date_time"
  → query returns the expected ERROR + WARN rows.

Resolves opensearch-project#5481

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit 311d38f)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

Cover the case where RexSimplify strips a literal's EXPR_TIMESTAMP UDT
during Sarg folding and the field is the only timestamp anchor left.
Without the fix, the raw "yyyy-MM-dd HH:mm:ss" string would ship without
format("date_time"), and the shard's default parser would reject it.

Trim the two helper-method JavaDocs to single-line `//` comments now that
the test pins the behavior.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Persistent review updated to latest commit 1c8b5df

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

PR Code Suggestions ✨

Latest suggestions up to 311d38f

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle null endpoint value explicitly

The null check prevents NPE but may produce incorrect query semantics. When
sargPointValue(value) returns null for a timestamp comparison, returning null could
generate a malformed range query. Consider throwing an exception or logging a
warning to make the failure explicit rather than silently propagating null.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java [1603-1611]

 private Object convertEndpointValue(Object value, boolean isTimeStamp) {
   value = sargPointValue(value);
-  // Shared normalization entry point: a Sarg endpoint can normalize to null, and
-  // timestampValueForPushDown(value.toString()) would otherwise NPE.
   if (value == null) {
-    return null;
+    throw new IllegalArgumentException("Sarg endpoint normalized to null, cannot build range query");
   }
   return isTimeStamp ? timestampValueForPushDown(value.toString()) : value;
 }
Suggestion importance[1-10]: 3

__

Why: While the suggestion raises a valid concern about null handling, the current implementation's approach of returning null may be intentional for handling edge cases in query construction. The suggestion to throw an exception could be too aggressive without understanding the full context of how null values are handled downstream. The improvement is marginal and the impact is uncertain.

Low

Previous suggestions

Suggestions up to commit 1dfe1b7
CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent potential NullPointerException

The method calls literal.value().toString() without null-checking the result of
literal.value(). If literal.value() returns null, this will throw a
NullPointerException. Add a null check before calling toString().

opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java [1456-1461]

 private Object endpointValue(LiteralExpression literal, boolean isTimeStamp) {
   if (!isTimeStamp || literal.isDateTime()) {
     return literal.value();
   }
-  return timestampValueForPushDown(literal.value().toString());
+  Object value = literal.value();
+  return timestampValueForPushDown(value != null ? value.toString() : null);
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion addresses a potential NullPointerException when literal.value() returns null. This is a valid defensive programming practice that could prevent runtime errors, though the likelihood depends on whether literal.value() can actually return null in practice.

Medium
General
Add braces for conditional statement

The format() method call on RangeQueryBuilder returns the builder instance for
chaining, but the return value is not captured. While this works due to mutation,
it's inconsistent with the pattern used in equals() and notEquals() methods where
chaining is explicit.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java [1414-1416]

 RangeQueryBuilder rq = rangeQuery(getFieldReference()).gt(value);
-if (isTimeStamp) rq.format("date_time");
+if (isTimeStamp) {
+  rq.format("date_time");
+}
 builder = rq;
Suggestion importance[1-10]: 4

__

Why: Adding braces improves code readability and follows Java best practices for single-line conditionals. However, the existing code is functionally correct and the impact is minimal, primarily affecting code style rather than functionality.

Low
Suggestions up to commit 1c8b5df
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null safety check

The method calls literal.value().toString() without null checking. If
literal.value() returns null, this will throw a NullPointerException. Add a null
check before calling toString() to prevent potential runtime errors.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java [1456-1461]

 private Object endpointValue(LiteralExpression literal, boolean isTimeStamp) {
   if (!isTimeStamp || literal.isDateTime()) {
     return literal.value();
   }
-  return timestampValueForPushDown(literal.value().toString());
+  Object value = literal.value();
+  return value != null ? timestampValueForPushDown(value.toString()) : null;
 }
Suggestion importance[1-10]: 7

__

Why: Valid concern about potential NullPointerException when calling toString() on a potentially null literal.value(). Adding null safety improves robustness, though the actual likelihood of null values depends on the upstream code context.

Medium
General
Simplify builder assignment pattern

The format() method call on RangeQueryBuilder returns the builder instance for
chaining, but the return value is not captured. While this works, it's inconsistent
with the pattern used in equals() and notEquals() methods where the format is
applied inline.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java [1414-1416]

-RangeQueryBuilder rq = rangeQuery(getFieldReference()).gt(value);
-if (isTimeStamp) rq.format("date_time");
-builder = rq;
+builder = rangeQuery(getFieldReference()).gt(value);
+if (isTimeStamp) builder.format("date_time");
Suggestion importance[1-10]: 3

__

Why: While the suggestion is technically correct, the current pattern is valid and functional. The RangeQueryBuilder intermediate variable approach is clear and doesn't introduce any issues. The suggested change offers minimal improvement in code quality.

Low

The new gt_normalizesVarcharLiteralAgainstTimestampField signature
should sit on one line per Google Java Format, fixing the
spotlessJavaCheck violation on the macOS unit jobs.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Persistent review updated to latest commit 1dfe1b7

Copy link
Copy Markdown
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add yamlRestIT

Comment on lines +1456 to +1461
private Object endpointValue(LiteralExpression literal, boolean isTimeStamp) {
if (!isTimeStamp || literal.isDateTime()) {
return literal.value();
}
return timestampValueForPushDown(literal.value().toString());
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can reuse LiteralExperssion.value() method?

Comment on lines +1424 to +1426
RangeQueryBuilder rq = rangeQuery(getFieldReference()).gte(value);
if (isTimeStamp) rq.format("date_time");
builder = rq;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same code duplicated in lt(), lte(). Can reuse addFormatIfNecessary?

* field's type so the shard's default date parser accepts the value.
*/
@Test
void gt_normalizesVarcharLiteralAgainstTimestampField() throws ExpressionNotAnalyzableException {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides this UT, could you add a simple IT test to verify end-to-end?

Refactor the opensearch-project#5481 fix per review:
- Reuse the existing addFormatIfNecessary helper (re-keyed on the resolved
  isTimeStamp boolean) instead of inlining format("date_time") across the six
  comparison methods; the helper was orphaned by the prior commit.
- Reuse LiteralExpression.value() + the existing Sarg-path convertEndpointValue
  for endpoint normalization, removing the parallel endpointValue helper.
- Guard convertEndpointValue against a null endpoint (shared entry point).

Tests:
- Add stripped-VARCHAR unit cases for equals/notEquals/lte (gt already covered)
  so every range shape pins the field-type fallback, not just gt.
- Add end-to-end IT CalcitePPLBasicIT.testTimestampRangeWithInClausePushDown
  (timestamp range AND keyword IN), confirmed to reproduce the shard date-parse
  error when the fix is reverted.
- Add yamlRest test issues/5481.yml for the same shape.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 8, 2026

Persistent review updated to latest commit 311d38f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Timestamp range pushdown emits unparseable date format when AND'd with IN/OR clause

3 participants