
[SPARK-57295][SQL] Extend whitespace-only path validation consistency to direct file-path queries #56732

Open
AnuragKDwivedi wants to merge 1 commit into apache:master from AnuragKDwivedi:SPARK-57295-db-location-validation-direct-file-path


@AnuragKDwivedi

Contributor

Description

This is a follow-up to PR #56356, which improved validation consistency for namespace locations by treating whitespace-only values as invalid locations.

What changes were proposed in this pull request?

This PR extends the same validation behavior to direct file-path queries.

Currently, direct file-path validation checks only for empty strings using isEmpty. As a result, whitespace-only paths such as " ", "\t", and "\n" pass analysis-time validation and may fail later with datasource-specific errors.

This PR updates the validation to use SparkStringUtils.isBlank(...), ensuring that whitespace-only paths are treated as invalid and consistently fail with the standard INVALID_EMPTY_LOCATION error.

By doing so, the change aligns direct file-path validation with the existing namespace location validation logic and improves consistency across Spark SQL location handling.
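To make the difference between the two checks concrete, here is a minimal Scala sketch written as a script (runnable with scala-cli, for example). isBlankPath is a hypothetical stand-in for SparkStringUtils.isBlank(...). It assumes that method treats null, empty, and whitespace-only strings as blank, judging whitespace with Character.isWhitespace. It is not Spark's actual implementation.

```scala
// Hypothetical stand-in for SparkStringUtils.isBlank(...): assumed to treat
// null, "" and whitespace-only strings as blank.
def isBlankPath(s: String): Boolean =
  s == null || s.forall(c => Character.isWhitespace(c))

// Makes tabs and newlines visible when printing the candidates.
def show(s: String): String =
  "\"" + s.replace("\t", "\\t").replace("\n", "\\n") + "\""

val candidates = Seq("", " ", "\t", "\n", "/tmp/data.json")

// The old check (isEmpty) only catches "", while the blank check catches all four blank values.
candidates.foreach { p =>
  println(s"${show(p)} isEmpty=${p.isEmpty} isBlank=${isBlankPath(p)}")
}
```

Only the "" row reports isEmpty=true. All four blank values report isBlank=true, and the ordinary path reports false for both checks.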

Why are the changes needed?

Currently, validation behavior differs depending on the type of location being processed:

  • Empty paths ("") are rejected during analysis with INVALID_EMPTY_LOCATION.
  • Whitespace-only paths (" ", "\t", "\n") may bypass analysis-time validation and fail later with datasource-specific errors.

Using SparkStringUtils.isBlank(...) ensures consistent handling of all blank path values across Spark SQL.

Does this PR introduce any user-facing change?

Yes.

Whitespace-only direct file paths are now rejected during analysis with INVALID_EMPTY_LOCATION, providing behavior consistent with namespace location validation.

How was this patch tested?

Added regression test coverage for blank path values, including:

  • ""
  • " "
  • "\t"
  • "\n"

and verified that they consistently fail with INVALID_EMPTY_LOCATION.
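The shape of that regression check can be sketched in plain Scala without a Spark session. Every name here is hypothetical. LocationError stands in for Spark's AnalysisException, validateLocation stands in for the analysis-time check, and the message wording only reuses the fragment that the PR's test asserts on.

```scala
// Hypothetical stand-in for an analysis error that carries a condition name
// and message parameters (not Spark's AnalysisException).
final case class LocationError(condition: String, parameters: Map[String, String])
    extends Exception(s"[$condition] The location name cannot be empty string")

// Hypothetical analysis-time check: rejects blank locations before any
// datasource sees them.
def validateLocation(location: String): Unit =
  if (location == null || location.forall(c => Character.isWhitespace(c)))
    throw LocationError("INVALID_EMPTY_LOCATION", Map("location" -> location))

val blankPaths = Seq("", " ", "\t", "\n")

// Every blank value must fail, and with the same condition. A value that is
// accepted instead surfaces as a RuntimeException from sys.error.
val failures: Seq[LocationError] = blankPaths.map { p =>
  try {
    validateLocation(p)
    sys.error(s"expected INVALID_EMPTY_LOCATION for '$p'")
  } catch {
    case e: LocationError => e
  }
}

failures.foreach(e => println(s"${e.condition} ${e.parameters}"))
```

Collecting the errors and comparing the condition and parameters of each one mirrors the per-value assertions the test needs. A non-blank path such as "/tmp/data.json" passes validateLocation without throwing.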

Jira - https://issues.apache.org/jira/browse/SPARK-57295

@AnuragKDwivedi

Contributor Author

@cloud-fan Could you please review this PR when you get a chance?

This is a follow-up to #56356. It applies the whitespace-only path validation you suggested earlier to direct file-path queries, using SparkStringUtils.isBlank(...).

Thanks!

val e = intercept[AnalysisException] {
  sql(s"select id from json.`$location`")
}
assert(e.message.contains("The location name cannot be empty string"))

Member


The new test uses assert(e.message.contains(...)) instead of the structured checkError(condition = "INVALID_EMPTY_LOCATION", parameters = Map("location" -> location)) form used by the sibling tests.

A contains check is brittle and can pass even when the wrong error is raised. Switching to checkError would match the sibling convention and validate the exact condition code and parameters. Please update accordingly.
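The reviewer's point about contains can be illustrated with a small, self-contained Scala sketch. ConditionError, looseCheck, structuredCheck, and the condition name SOME_OTHER_CONDITION are hypothetical. structuredCheck only mimics the idea of checkError(condition = ..., parameters = ...) described above and is not Spark's implementation.

```scala
// Hypothetical error carrying a condition name, parameters, and a rendered message.
final case class ConditionError(
    condition: String,
    parameters: Map[String, String],
    message: String)

// Loose check, like e.message.contains(...): passes whenever the fragment appears.
def looseCheck(e: ConditionError, fragment: String): Boolean =
  e.message.contains(fragment)

// Structured check in the spirit of checkError(condition = ..., parameters = ...):
// the condition name and the full parameter map must both match exactly.
def structuredCheck(
    e: ConditionError,
    condition: String,
    parameters: Map[String, String]): Boolean =
  e.condition == condition && e.parameters == parameters

val location = " "
val fragment = "The location name cannot be empty string"

// An unrelated failure whose message happens to quote the same text.
val unrelated = ConditionError("SOME_OTHER_CONDITION", Map.empty, s"Wrapped error: $fragment")
// The failure the test actually expects.
val expected = ConditionError("INVALID_EMPTY_LOCATION", Map("location" -> location), fragment)

println(looseCheck(unrelated, fragment)) // true: a false pass
println(structuredCheck(unrelated, "INVALID_EMPTY_LOCATION", Map("location" -> location))) // false
println(structuredCheck(expected, "INVALID_EMPTY_LOCATION", Map("location" -> location))) // true
```

The structured form also catches a wrong parameter value. For example, checking expected against Map("location" -> "\t") fails, which a substring match on the message would never notice.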

@uros-b left a comment

Member


Left one comment, but otherwise looks good. Thank you @AnuragKDwivedi! cc @cloud-fan

