[SPARK-57660][SQL] Support casting between TIME(p) and TIMESTAMP_LTZ(q) #56733
Open
MaxGekk wants to merge 1 commit into
### What changes were proposed in this pull request?

This PR adds bidirectional casts between the `TIME(p)` data type (`p` in `[0, 9]`) and `TIMESTAMP_LTZ(q)` (`q` in `[6, 9]`, where `q=6` is the microsecond `TimestampType` and `q` in `[7, 9]` is the nanosecond `TimestampLTZNanosType`).

It is the `TIMESTAMP_LTZ` counterpart of #56677 / SPARK-57618 (`TIME` <-> `TIMESTAMP_NTZ`) and a sub-task of SPARK-56822.

Semantics follow the SQL standard (section 6.13 `<cast specification>`):

- `CAST(TIMESTAMP_LTZ(q) AS TIME(p))` (rule 15.d): the LTZ value is an absolute instant, so its time-of-day is the local wall-clock time observed in the session time zone, truncated to precision `p`. Unlike `TIMESTAMP_NTZ -> TIME`, this direction depends on the session time zone.
- `CAST(TIME(p) AS TIMESTAMP_LTZ(q))` (rule 17.c): the date fields come from `CURRENT_DATE` and the time fields from the value; the resulting local date-time is interpreted in the session time zone to produce the instant. Since `CURRENT_DATE` is stable within a query, the cast is stabilized via the existing `ComputeCurrentTime` optimizer rule, so it shares the same date literal as `current_date()`.

Both directions of `TIME` <-> `TIMESTAMP_LTZ` therefore depend on the session time zone (whereas for `TIMESTAMP_NTZ` only `TIME -> TIMESTAMP_NTZ` does). Fractional precision handling is pure truncation (floor toward the precision step; no rounding). Both directions always succeed, so no new nullability or error condition is introduced.

Implementation notes:

- New rule-table entries in `Cast.canCast` / `Cast.canAnsiCast` for the four pairs. `canTryCast` inherits these for atomic types.
- All four pairs are marked `needsTimeZone` (both directions read the session zone).
- Interpreted and codegen paths for both directions.
- `ComputeCurrentTime` scans `CAST` nodes and, applying the new `Cast.isTimeToTimestampLTZ` predicate on the resolved plan, rewrites `TIME -> TIMESTAMP_LTZ` into a zone-aware date+time builder (new internal `MakeTimestampLTZ` / `MakeTimestampLTZNanos`) anchored on the query current date. As with the NTZ feature, these casts are intentionally not tagged with `CURRENT_LIKE` (inline-table validation treats `CURRENT_LIKE` as safe to defer). The `Cast` eval/codegen fallback (using `currentDate(zoneId)`) covers direct expression evaluation.
- New helpers: `SparkDateTimeUtils.timestampToNanosOfDay` / `timestampLTZNanosToNanosOfDay` and `DateTimeUtils.makeTimestampLTZNanos`.

Out of scope: Structured Streaming batch-timestamp parity for `TIME -> TIMESTAMP_LTZ` (the cast uses the optimizer-instant current date rather than the micro-batch timestamp).

### Why are the changes needed?

Spark supports `TIME` <-> `TIMESTAMP_NTZ` casts (SPARK-57618) but not `TIME` <-> `TIMESTAMP_LTZ`. These conversions are required by the SQL standard and are a common user need (attaching a time-of-day to a timestamp, or extracting the time-of-day from a timestamp). This is a sub-task of SPARK-56822 (timestamps with nanosecond precision).

### Does this PR introduce _any_ user-facing change?

Yes. Casting between `TIME(p)` and `TIMESTAMP_LTZ(q)` is now allowed (previously it failed analysis with a cast type-mismatch). Examples:

```sql
-- extract the time-of-day in the session time zone
SELECT CAST(TIMESTAMP'2020-05-17 12:34:56.789012' AS TIME(6));
-- 12:34:56.789012

-- attach the current date, interpreted in the session time zone
SELECT CAST(TIME'12:34:56.789012345' AS TIMESTAMP_LTZ(9));
-- <current_date> 12:34:56.789012345
```

This is a new feature on an unreleased branch; there is no behavior change relative to a released version.

### How was this patch tested?
- New unit tests in `CastSuiteBase` (run under ANSI on and off): allowed-pair / `needsTimeZone` matrix, `isTimeToTimestampLTZ` truth table, `TIMESTAMP_LTZ(q) -> TIME(p)` values across all precisions (including pre-epoch and sub-microsecond truncation), interpreted-vs-codegen consistency, and zone-fixed round trips.
- New tests in `DateExpressionsSuite` for `MakeTimestampLTZ` / `MakeTimestampLTZNanos` (including canonicalization on precision).
- New test in `ComputeCurrentTimeSuite` asserting the forward cast is rewritten with a query-stable current-date literal consistent with `current_date()`.
- New unit tests in `DateTimeUtilsSuite` for `makeTimestampLTZNanos` and the time-of-day extraction helpers.
- New deterministic cases in `cast.sql` (and the imported `nonansi/cast.sql`) with regenerated golden files.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)
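
### Illustrative sketch of the cast arithmetic

For readers who want to sanity-check the semantics described above, the following is a minimal Python sketch of the arithmetic, not the code in this PR. It assumes a fixed UTC offset in place of a real session time zone (region zones with DST need a proper zone lookup), represents `TIME` as nanoseconds of day and `TIMESTAMP_LTZ` as nanoseconds since the epoch, and uses hypothetical function names that do not correspond to the Spark helpers listed in the implementation notes.

```python
NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_DAY = 86_400 * NANOS_PER_SECOND


def truncate_nanos_of_day(nanos_of_day: int, p: int) -> int:
    """Floor a nanos-of-day value to p fractional-second digits (no rounding)."""
    if not 0 <= p <= 9:
        raise ValueError("precision p must be in [0, 9]")
    step = 10 ** (9 - p)  # size of one precision step in nanoseconds
    return nanos_of_day - nanos_of_day % step


def timestamp_ltz_to_time(epoch_nanos: int, utc_offset_seconds: int, p: int) -> int:
    """TIMESTAMP_LTZ -> TIME(p): local wall-clock nanos of day, truncated to p.

    Assumption: the session zone is modeled as a fixed UTC offset.
    """
    local_nanos = epoch_nanos + utc_offset_seconds * NANOS_PER_SECOND
    # Python's % is a floor modulus, so pre-epoch (negative) instants
    # still map into [0, NANOS_PER_DAY).
    return truncate_nanos_of_day(local_nanos % NANOS_PER_DAY, p)


def time_to_timestamp_ltz(nanos_of_day: int, current_date_days: int,
                          utc_offset_seconds: int) -> int:
    """TIME -> TIMESTAMP_LTZ: combine a query-stable current date
    (days since epoch) with the time-of-day, then shift from local
    time to the absolute instant.
    """
    local_nanos = current_date_days * NANOS_PER_DAY + nanos_of_day
    return local_nanos - utc_offset_seconds * NANOS_PER_SECOND
```

Passing the current date in as an argument mirrors the idea that the date is fixed once per query rather than read on every evaluation; with a fixed offset, casting a `TIME` value to an instant and back at precision 9 returns the original nanos of day.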
@cloud-fan @uros-b @stevomitric Could you review this PR, please? It is similar to the recently merged PR for `TIME(p)` <-> `TIMESTAMP_NTZ(q)`.