[SPARK-56769][SQL] Add fast path for date_trunc WEEK/MONTH/QUARTER/YEAR#55736
Open
Licht-T wants to merge 4 commits into apache:master
…meBenchmark (JDK 21, Scala 2.13, split 1 of 1)
…meBenchmark (JDK 25, Scala 2.13, split 1 of 1)
…meBenchmark (JDK 17, Scala 2.13, split 1 of 1)
What changes were proposed in this pull request?
This PR extends the offset-arithmetic + DST-equality-guard fast path introduced in SPARK-56663 from MIN/HR/DAY to the date-level units WEEK / MONTH / QUARTER / YEAR.
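The offset-arithmetic + DST-equality-guard pattern named above can be illustrated with a minimal sketch for date-level units. This is not the PR's code: `TruncSketch`, `truncFast`, `truncSlow`, and the `truncDate` function parameter are hypothetical names chosen for illustration, and the slow path here is a simplified `java.time` route rather than Spark's `microsToDays`/`daysToMicros` helpers.

```scala
import java.time.{Instant, LocalDate, ZoneId}
import java.time.temporal.ChronoUnit

// Hypothetical illustration only; names and structure are assumptions,
// not the actual DateTimeUtils implementation.
object TruncSketch {
  private val MicrosPerSecond = 1000000L
  private val MicrosPerDay = 86400L * MicrosPerSecond

  private def toInstant(micros: Long): Instant =
    Instant.EPOCH.plus(micros, ChronoUnit.MICROS)

  private def toMicros(instant: Instant): Long =
    ChronoUnit.MICROS.between(Instant.EPOCH, instant)

  // Reference path: resolve the truncated local date through java.time.
  def truncSlow(micros: Long, zone: ZoneId, truncDate: LocalDate => LocalDate): Long = {
    val date = truncDate(toInstant(micros).atZone(zone).toLocalDate)
    toMicros(date.atStartOfDay(zone).toInstant)
  }

  // Fast path: shift by the offset, truncate in the local-day frame, shift back,
  // then accept the candidate only if the zone offset at the candidate is unchanged.
  def truncFast(micros: Long, zone: ZoneId, truncDate: LocalDate => LocalDate): Long = {
    val rules = zone.getRules
    val offsetMicros = rules.getOffset(toInstant(micros)).getTotalSeconds * MicrosPerSecond
    try {
      val localMicros = Math.addExact(micros, offsetMicros)
      val localDay = Math.floorDiv(localMicros, MicrosPerDay)
      val truncDay = truncDate(LocalDate.ofEpochDay(localDay)).toEpochDay
      val candidate =
        Math.subtractExact(Math.multiplyExact(truncDay, MicrosPerDay), offsetMicros)
      val candidateOffset =
        rules.getOffset(toInstant(candidate)).getTotalSeconds * MicrosPerSecond
      // DST guard: differing offsets mean the truncation crossed a transition.
      if (candidateOffset == offsetMicros) candidate
      else truncSlow(micros, zone, truncDate)
    } catch {
      // Arithmetic overflow: defer to the reference path.
      case _: ArithmeticException => truncSlow(micros, zone, truncDate)
    }
  }
}
```

Under these assumptions, `TruncSketch.truncFast(micros, zone, _.withDayOfYear(1))` behaves like a YEAR truncation and `_.withDayOfMonth(1)` like a MONTH truncation; an instant in PDT truncated to YEAR fails the guard and takes the fallback.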
The framework for offset-based truncation -- resolve offset once, apply, truncate in the local frame, re-apply, DST guard, fall back on DST-cross or arithmetic overflow -- is identical for every level above SECOND. Only the "truncate in local frame" step varies. This PR inlines SPARK-56663's truncToUnitFast together with the new date-level path directly into truncTimestamp, and keeps a single private truncTimestampSlow as a complete reference implementation that the fast path falls back to.
The local-frame truncation step is the only thing the fast path branches on:
- MICROSECOND/MILLISECOND/SECOND - pure UTC floorMod (zone offsets have at most second precision per java.time.ZoneOffset; no zone information needed).
- MINUTE/HOUR/DAY - shifted-local floorMod against the unit micros.
- WEEK/MONTH/QUARTER/YEAR - compute local epoch-day by integer division, run truncDate in the local-day frame, multiply back to local micros.
Everything else (offset resolve via rules.getOffset, addExact/subtractExact, DST guard via offset-equality at the candidate, slow-path fallback) is shared.
The DST guard fires correctly for the new date-level cases - for example, YEAR truncation of a March instant after the spring-forward in America/Los_Angeles produces a candidate at Jan 1 (which is in PST, offset -8) while the original is in PDT (offset -7); the offsets differ, so the path falls back to the slow microsToDays/daysToMicros route which uses ZonedDateTime.resolveLocal to land on Jan 1 00:00 PST.
This PR also rewrites TRUNC_TO_QUARTER from IsoFields.DAY_OF_QUARTER (a TemporalField whose adjustment produces a fresh LocalDate) to a direct withMonth(firstMonthOfQuarter).withDayOfMonth(1) chain on the existing LocalDate. This saves one allocation plus the field-adjustment overhead per call.
truncTimestampSlow covers every level explicitly so it serves as a self-contained reference implementation - the fast path's correctness can be verified against it case-by-case.
Why are the changes needed?
SPARK-33404 (Nov 2020) routed every date_trunc level above SECOND through microsToInstant().atZone(zoneId).truncatedTo(unit) for correctness, costing ~5.5× throughput per the follow-up benchmark PR (#30338). SPARK-56663 recovered most of that for MIN/HR/DAY using the offset-arithmetic + DST-guard pattern. This PR extends the same recovery to WEEK / MONTH / QUARTER / YEAR - the levels that drive monthly/quarterly aggregations in analytics workloads.
DateTimeBenchmark Truncation results, wholestage on, ns/row on a 12th Gen Intel i7-1260P (master = pre-SPARK-56663):
Time-level units (MIN/HR/DAY/SECOND) and trunc(date, ...) are unchanged within noise; the hot path for those levels is byte-identical to SPARK-56663 after the unification.
Does this PR introduce any user-facing change?
No. The output of date_trunc is identical to master in all cases, including DST-spanning truncations (verified by the offset-equality guard + slow-path fallback, plus the new tests). Only the internal implementation changes.
How was this patch tested?
- DateTimeUtilsSuite - all 66 tests pass, including:
  - SPARK-33404: test truncTimestamp when time zone offset from UTC has a granularity of seconds, extended to also exercise WEEK / MONTH / QUARTER / YEAR with the 1769-10-17 LMT timestamp across every available zone (the existing loop already covered SECOND/MILLI/MICRO; SPARK-56663 added HOUR/DAY; this PR completes the matrix).
  - truncTimestamp test, which loops WEEK / MONTH / QUARTER / YEAR for 2015 timestamps across every zone.
  - truncTimestamp date-level units across DST boundaries - covers YEAR / QUARTER truncation that crosses the LA spring-forward (DST guard fires, fallback path runs) and MONTH truncation entirely within DST (fast path stays).
- DateExpressionsSuite - all tests pass (no changes to expression-level code, only the underlying DateTimeUtils helpers).
- DateTimeBenchmark re-run via the GitHub Actions Run benchmarks workflow on this fork for JDK 17, 21, and 25; results committed back to the branch.
Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored with Claude Code.
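For reference, the TRUNC_TO_QUARTER rewrite described in the first section can be sketched as two equivalent routes to the first day of a quarter. `QuarterSketch`, `quarterViaField`, and `quarterDirect` are hypothetical names, and the first-month arithmetic is an assumption about how `firstMonthOfQuarter` could be computed, not a copy of the PR's code.

```scala
import java.time.LocalDate
import java.time.temporal.IsoFields

// Hypothetical illustration only; not the actual DateTimeUtils code.
object QuarterSketch {
  // Field-based route: set the day-of-quarter to 1.
  def quarterViaField(date: LocalDate): LocalDate =
    date.`with`(IsoFields.DAY_OF_QUARTER, 1L)

  // Direct route: jump to the first month of the quarter, then to day 1.
  def quarterDirect(date: LocalDate): LocalDate = {
    // Months 1-3 map to 1, 4-6 to 4, 7-9 to 7, 10-12 to 10.
    val firstMonthOfQuarter = ((date.getMonthValue - 1) / 3) * 3 + 1
    date.withMonth(firstMonthOfQuarter).withDayOfMonth(1)
  }
}
```

Both functions return the same `LocalDate` for any input date, which makes the pair easy to check against each other in a property-style test.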