[SPARK-28587][SQL] Route JDBC partition bound literals through JdbcDialect.compileValue #55727
Status: Open
AliaksandrAleksiukCollibra wants to merge 3 commits into apache:master from
…alect.compileValue Spark generates partition WHERE clauses like `col < '2024-01-01'` for date/timestamp columns, which strict-typing engines (Athena, Phoenix) reject with a type mismatch, since bare quoted strings are VARCHAR, not DATE/TIMESTAMP. The fix passes the dialect to `toBoundValueInWhereClause` and calls `dialect.compileValue` for date/timestamp bounds. The base implementation returns the same bare-quoted string as before, so existing dialects are unaffected.
What changes were proposed in this pull request?
`toBoundValueInWhereClause` in `JDBCRelation` now accepts the resolved `JdbcDialect` and calls `dialect.compileValue` for date and timestamp partition bounds instead of hardcoding bare quoted string literals.

Why are the changes needed?

When partitioning a JDBC table by a date or timestamp column, Spark generates WHERE clauses like `col < '2024-01-01'`. Strict-typing engines such as Athena and Phoenix reject this with a type mismatch error (`Cannot apply operator: date < varchar`) because the bare quoted string is treated as `VARCHAR`, not `DATE`/`TIMESTAMP`. `JdbcDialect.compileValue` already exists for dialect-specific value formatting and is used in filter pushdown, but it was never wired into partition bound generation. Dialects for strict-typing engines can now override `compileValue` to emit typed literals such as `DATE '2024-01-01'` or `TIMESTAMP '2024-01-01 00:00:00'`.

Does this PR introduce any user-facing change?
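The intended pattern can be sketched with a simplified stand-in. The trait below is an illustrative reduction, not Spark's actual `JdbcDialect` API, and `AthenaLikeDialect` is a hypothetical name; only the delegation of partition bounds to `compileValue` mirrors the change:

```scala
import java.sql.{Date, Timestamp}

// Illustrative reduction of the dialect hook; Spark's real JdbcDialect
// has a much richer interface. The default compileValue keeps the old
// bare-quoted behavior, so existing dialects see no change.
trait JdbcDialect {
  def compileValue(value: Any): Any = value match {
    case d: Date      => s"'$d'"
    case t: Timestamp => s"'$t'"
    case other        => other
  }
}

object DefaultDialect extends JdbcDialect

// Hypothetical dialect for a strict-typing engine such as Athena:
// override compileValue to emit typed literals.
object AthenaLikeDialect extends JdbcDialect {
  override def compileValue(value: Any): Any = value match {
    case d: Date      => s"DATE '$d'"
    case t: Timestamp => s"TIMESTAMP '$t'"
    case other        => super.compileValue(other)
  }
}

// Sketch of the new wiring: the partition bound is routed through the
// dialect instead of being hardcoded as a bare quoted string.
def toBoundValueInWhereClause(value: Any, dialect: JdbcDialect): String =
  dialect.compileValue(value).toString
```

With `DefaultDialect`, a bound of `Date.valueOf("2024-01-01")` still renders as `'2024-01-01'` (unchanged behavior), while `AthenaLikeDialect` renders `DATE '2024-01-01'`, which strict engines accept.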
Yes. Users with a custom `JdbcDialect` that overrides `compileValue` will now see that override applied to partition WHERE clauses as well as filter pushdown. The base `JdbcDialect.compileValue` returns the same bare-quoted string as before, so all built-in dialects and users without a custom dialect are unaffected.

How was this patch tested?
Added `SPARK-28587: columnPartition should use dialect.compileValue for date/timestamp bounds` in `JDBCSuite`. The test registers a custom dialect that emits `DATE '...'`/`TIMESTAMP '...'` typed literals, calls `JDBCRelation.columnPartition` directly, and asserts that the generated WHERE clauses contain typed literals rather than bare quoted strings. Existing tests SPARK-34843 and SPARK-22814 continue to pass, confirming backward compatibility.
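The shape of the assertion can be illustrated with a stand-in predicate builder; `JDBCRelation.columnPartition` itself is not reproduced here, and the helper name is hypothetical:

```scala
import java.sql.Date

// Stand-in for the predicates columnPartition would emit with a
// typed-literal dialect: a lower and an upper bound per partition.
def boundedPredicates(column: String, bound: Date): Seq[String] = Seq(
  s"""$column < DATE '$bound'""",
  s"""$column >= DATE '$bound'"""
)

val clauses = boundedPredicates("\"d\"", Date.valueOf("2024-01-02"))

// Every clause should carry a typed literal, never a bare quoted date.
val allTyped = clauses.forall(_.contains("DATE '2024-01-02'"))
```

The test in `JDBCSuite` performs the analogous check against the real output of `columnPartition` under the custom dialect.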
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.6