Broadcast hash join and sort-merge join can produce incorrect results on non-default collated keys [Spark 4]

## Describe the bug

Follow-up from [#4035](https://github.com/apache/datafusion-comet/pull/4035), which rejects non-default collated strings in shuffle, sort, and aggregate but does not cover the join paths.

`CometHashJoin.doConvert` (used by both `CometHashJoinExec` and `CometBroadcastHashJoinExec`) and `CometSortMergeJoinExec.supportedSortMergeJoinEqualType` both accept any `StringType` as a join key without checking for non-default collation. Comet's native join performs byte-level key equality, so rows that compare equal under a collation (e.g., `'a'` vs `'A'` under `utf8_lcase`) will not match.

For sort-merge join this is masked today by the shuffle-side fallback added in #4035: the shuffle on a collated key falls back to Spark, which forces the whole stage to Spark. Broadcast hash join has no shuffle to protect it, so incorrect results can escape.

## Steps to reproduce

With Spark 4.0.1:

```sql
SET spark.comet.exec.broadcastHashJoin.enabled = true;

CREATE OR REPLACE TEMP VIEW l AS SELECT * FROM (VALUES ('a'), ('B')) AS t(c);
CREATE OR REPLACE TEMP VIEW r AS SELECT * FROM (VALUES ('A'), ('b')) AS t(c);

SELECT l.c, r.c
FROM l JOIN r
ON l.c COLLATE utf8_lcase = r.c COLLATE utf8_lcase;
```

Spark returns two matching rows; Comet (with broadcast hash join enabled) returns zero because the byte-level equality in the native join does not match `'a'` with `'A'`.

## Expected behavior

Comet should fall back to Spark for hash joins and sort-merge joins whose equality keys have a non-default collation, either by rejecting the key types in `supportedSortMergeJoinEqualType` / a new equivalent check in `CometHashJoin.doConvert`, or by funneling through the existing `isStringCollationType` predicate on `CometTypeShim`.

## Additional context

- Related PR: #4035
- Related issue: #1947
- Relevant code:
  - `spark/src/main/scala/org/apache/spark/sql/comet/operators.scala` `CometHashJoin.doConvert` (around line 1697)
  - `spark/src/main/scala/org/apache/spark/sql/comet/operators.scala` `supportedSortMergeJoinEqualType` (around line 2138)
  - `common/src/main/spark-4.0/org/apache/comet/shims/CometTypeShim.scala` `isStringCollationType`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Broadcast hash join and sort-merge join can produce incorrect results on non-default collated keys [Spark 4] #4051

Describe the bug

Steps to reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Broadcast hash join and sort-merge join can produce incorrect results on non-default collated keys [Spark 4] #4051

Description

Describe the bug

Steps to reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions