[BugFix] Key date_time pushdown on field type, not literal UDT (#5481)#5515
[BugFix] Key date_time pushdown on field type, not literal UDT (#5481)#5515RyanL1997 wants to merge 4 commits into
Conversation
…earch-project#5481) When a timestamp comparison is AND'd with a SARG-eligible sibling (e.g. severityText IN ('ERROR','WARN') or a chained OR of equalities), Calcite's RexSimplify folds the sibling into a Sarg and re-types the timestamp literal — DATE_SUB(NOW(), INTERVAL 5 MINUTE) loses its EXPR_TIMESTAMP UDT and arrives as VARCHAR. That break the DSL emission downstream: - LiteralExpression.value() takes the isString() branch and returns the raw RexLiteral.stringValue() — the space-separated "2026-05-28 16:18:43" form, never normalized to ISO-8601. - addFormatIfNecessary skips .format("date_time") because literal.isDateTime() reads false. Both checks read the literal's surviving UDT, which is unreliable after RexSimplify. The shard receives "2026-05-28 16:18:43" against the default strict_date_optional_time||epoch_millis parser and returns 500. Fix the five non-Sarg comparison paths in SimpleQueryExpression (equals, notEquals, gt, gte, lt, lte) to additionally check rel.isTimeStampType() — the same defensive pattern the Sarg path in constructQueryExpressionForSearch already uses. When the field is a timestamp, the value is routed through timestampValueForPushDown to canonicalize to ISO-8601 regardless of the literal's surviving type, and the range query always carries format("date_time"). The non-timestamp-field branch is unchanged — termQuery / boolQuery behavior for keyword/text/numeric fields is preserved exactly. Verified end-to-end on a live node: - Before: range emits "2026-06-04 16:53:50" with no format attr → shard rejects the parse, query returns HTTP 500. - After: range emits "2026-06-04T17:14:01.000Z" with format "date_time" → query returns the expected ERROR + WARN rows. Resolves opensearch-project#5481 Signed-off-by: Jialiang Liang <jiallian@amazon.com>
PR Reviewer Guide 🔍(Review updated until commit 311d38f)Here are some key observations to aid the review process:
|
Cover the case where RexSimplify strips a literal's EXPR_TIMESTAMP UDT
during Sarg folding and the field is the only timestamp anchor left.
Without the fix, the raw "yyyy-MM-dd HH:mm:ss" string would ship without
format("date_time"), and the shard's default parser would reject it.
Trim the two helper-method JavaDocs to single-line `//` comments now that
the test pins the behavior.
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
|
Persistent review updated to latest commit 1c8b5df |
PR Code Suggestions ✨Latest suggestions up to 311d38f Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 1dfe1b7
Suggestions up to commit 1c8b5df
|
The new gt_normalizesVarcharLiteralAgainstTimestampField signature should sit on one line per Google Java Format, fixing the spotlessJavaCheck violation on the macOS unit jobs. Signed-off-by: Jialiang Liang <jiallian@amazon.com>
|
Persistent review updated to latest commit 1dfe1b7 |
| private Object endpointValue(LiteralExpression literal, boolean isTimeStamp) { | ||
| if (!isTimeStamp || literal.isDateTime()) { | ||
| return literal.value(); | ||
| } | ||
| return timestampValueForPushDown(literal.value().toString()); | ||
| } |
There was a problem hiding this comment.
Can reuse LiteralExperssion.value() method?
| RangeQueryBuilder rq = rangeQuery(getFieldReference()).gte(value); | ||
| if (isTimeStamp) rq.format("date_time"); | ||
| builder = rq; |
There was a problem hiding this comment.
The same code duplicated in lt(), lte(). Can reuse addFormatIfNecessary?
| * field's type so the shard's default date parser accepts the value. | ||
| */ | ||
| @Test | ||
| void gt_normalizesVarcharLiteralAgainstTimestampField() throws ExpressionNotAnalyzableException { |
There was a problem hiding this comment.
Besides this UT, could you add a simple IT test to verify end-to-end?
Refactor the opensearch-project#5481 fix per review: - Reuse the existing addFormatIfNecessary helper (re-keyed on the resolved isTimeStamp boolean) instead of inlining format("date_time") across the six comparison methods; the helper was orphaned by the prior commit. - Reuse LiteralExpression.value() + the existing Sarg-path convertEndpointValue for endpoint normalization, removing the parallel endpointValue helper. - Guard convertEndpointValue against a null endpoint (shared entry point). Tests: - Add stripped-VARCHAR unit cases for equals/notEquals/lte (gt already covered) so every range shape pins the field-type fallback, not just gt. - Add end-to-end IT CalcitePPLBasicIT.testTimestampRangeWithInClausePushDown (timestamp range AND keyword IN), confirmed to reproduce the shard date-parse error when the fix is reverted. - Add yamlRest test issues/5481.yml for the same shape. Signed-off-by: Jialiang Liang <jiallian@amazon.com>
|
Persistent review updated to latest commit 311d38f |
Description
When a timestamp comparison is AND'd with a SARG-eligible sibling (
IN, chainedORof equalities), Calcite'sRexSimplifyfolds the sibling into aSargand re-types the timestamp literal during simplification.DATE_SUB(NOW(), INTERVAL 5 MINUTE)loses itsEXPR_TIMESTAMPUDT and arrives at the predicate analyzer asVARCHAR.That breaks the DSL emission downstream in
SimpleQueryExpression:LiteralExpression.value()takes theisString()branch and returns the rawRexLiteral.stringValue()— the space-separated"2026-05-28 16:18:43"form, never normalized to ISO-8601.addFormatIfNecessaryskips.format("date_time")becauseliteral.isDateTime()readsfalse.Both checks key off the literal's surviving UDT, which is unreliable after
RexSimplify. The shard receives"2026-05-28 16:18:43"against the defaultstrict_date_optional_time||epoch_millisparser and returns HTTP 500.Fix
The five non-Sarg comparison paths in
SimpleQueryExpression(equals,notEquals,gt,gte,lt,lte) now key offrel.isTimeStampType()— the field's type viaNamedFieldExpression. This is the same defensive pattern the Sarg path inconstructQueryExpressionForSearch(PredicateAnalyzer.java:794-796) already uses.When the field is a timestamp:
timestampValueForPushDownto canonicalize to ISO-8601 regardless of the literal's surviving type.format("date_time").The non-timestamp-field branch is unchanged —
termQuery/boolQuerybehavior for keyword/text/numeric fields is preserved exactly.Verification
End-to-end on a live local node with the index template and sample data from the issue:
Before:
After:
:opensearch:testpasses with no changes to existing assertions; the existingdate_timeformat assertions inPredicateAnalyzerTest.javacontinue to hold for the literal-typed-correctly path.Related Issues
Resolves #5481
Check List
--signoffor-s.