[SPARK-57295][SQL] Extend whitespace-only path validation consistency to direct file-path queries #56732
@cloud-fan Could you please review this PR when you get a chance? This is a follow-up to #56356 that implements the whitespace-only path validation consistency improvement you suggested earlier for direct file-path queries. Thanks!
val e = intercept[AnalysisException] {
  sql(s"select id from json.`$location`")
}
assert(e.message.contains("The location name cannot be empty string"))
The new test uses assert(e.message.contains(...)) instead of the structured checkError(condition = "INVALID_EMPTY_LOCATION", parameters = Map("location" -> location)) form used by the sibling tests.
contains is brittle and can false-pass; switching to checkError would match sibling convention and validate the exact condition code + params. Please update accordingly.
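To illustrate why a substring check can false-pass where a structured check does not, here is a minimal, self-contained sketch. It does not use Spark's actual AnalysisException or checkError; SketchError, looseCheck, strictCheck, and the SOME_OTHER_CONDITION error are hypothetical stand-ins invented for this example.

```scala
object AssertionStyleSketch {
  // Hypothetical, simplified model of a structured error.
  // Spark's real AnalysisException carries more than this.
  final case class SketchError(
      condition: String,
      parameters: Map[String, String],
      message: String)

  // The error the test actually wants to see.
  val expected: SketchError = SketchError(
    "INVALID_EMPTY_LOCATION",
    Map("location" -> " "),
    "The location name cannot be empty string")

  // A made-up, unrelated error whose message happens to quote the same phrase.
  val unrelated: SketchError = SketchError(
    "SOME_OTHER_CONDITION",
    Map.empty,
    "Wrapped failure: The location name cannot be empty string")

  // Substring-style check, as in the test under review.
  def looseCheck(e: SketchError): Boolean =
    e.message.contains("The location name cannot be empty string")

  // Structured check: compares the condition code and the parameters exactly.
  def strictCheck(
      e: SketchError,
      condition: String,
      parameters: Map[String, String]): Boolean =
    e.condition == condition && e.parameters == parameters

  def main(args: Array[String]): Unit = {
    val params = Map("location" -> " ")
    println(s"loose/expected:   ${looseCheck(expected)}")
    println(s"loose/unrelated:  ${looseCheck(unrelated)}")
    println(s"strict/expected:  ${strictCheck(expected, "INVALID_EMPTY_LOCATION", params)}")
    println(s"strict/unrelated: ${strictCheck(unrelated, "INVALID_EMPTY_LOCATION", params)}")
  }
}
```

In this sketch the substring check accepts both errors, while the structured check accepts only the one with the expected condition and parameters, which is the property the review comment is asking for.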
uros-b left a comment
Left one comment, but otherwise looks good. Thank you @AnuragKDwivedi! cc @cloud-fan
Description
This is a follow-up to PR #56356, which improved validation consistency for namespace locations by treating whitespace-only values as invalid locations.
What changes were proposed in this pull request?
This PR extends the same validation behavior to direct file-path queries.
Currently, direct file-path validation checks only for empty strings using isEmpty. Consequently, whitespace-only paths such as " ", "\t", and "\n" are not recognized as empty during analysis and may fail later with datasource-specific errors.
This PR updates the validation to use SparkStringUtils.isBlank(...), ensuring that whitespace-only paths are treated as invalid and consistently fail with the standard INVALID_EMPTY_LOCATION error.
By doing so, the change aligns direct file-path validation with the existing namespace location validation logic and improves consistency across Spark SQL location handling.
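The difference between the two predicates can be sketched in plain Scala. The isBlank helper below is a hypothetical stand-in written for this example; the exact semantics of SparkStringUtils.isBlank (for instance, its null handling or which characters count as whitespace) are assumed here rather than copied from Spark.

```scala
object BlankPathSketch {
  // Hypothetical stand-in for SparkStringUtils.isBlank: treats null, the empty
  // string, and strings made up only of whitespace characters as blank.
  def isBlank(s: String): Boolean =
    s == null || s.forall(_.isWhitespace)

  // The blank values named in the PR description, plus one ordinary
  // (made-up) path for contrast.
  val candidates: Seq[String] = Seq("", " ", "\t", "\n", "/tmp/data.json")

  def main(args: Array[String]): Unit = {
    candidates.foreach { path =>
      // String.isEmpty is true only for the zero-length string, so the
      // whitespace-only values slip past an isEmpty-based check.
      println(s"length=${path.length} isEmpty=${path.isEmpty} isBlank=${isBlank(path)}")
    }
  }
}
```

Under these assumptions, only the zero-length string satisfies isEmpty, whereas all four blank values satisfy isBlank, so a check built on the latter can reject them uniformly at analysis time.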
Why are the changes needed?
Currently, validation behavior differs depending on whether the path is empty or whitespace-only:
- Empty paths ("") are rejected during analysis with INVALID_EMPTY_LOCATION.
- Whitespace-only paths (" ", "\t", "\n") may bypass analysis-time validation and fail later with datasource-specific errors.
Using SparkStringUtils.isBlank(...) ensures consistent handling of all blank path values across Spark SQL.
Does this PR introduce any user-facing change?
Yes.
Whitespace-only direct file paths are now rejected during analysis with INVALID_EMPTY_LOCATION, providing behavior consistent with namespace location validation.
How was this patch tested?
Added regression test coverage for blank path values, including:
- ""
- " "
- "\t"
- "\n"
and verified that they consistently fail with INVALID_EMPTY_LOCATION.
Jira - https://issues.apache.org/jira/browse/SPARK-57295