From 3fa13fa5c2d98435a6a2bd25cf1aa6b7a5d08f8b Mon Sep 17 00:00:00 2001
From: Hyukjin Kwon
Date: Wed, 24 Jun 2026 09:08:05 +0900
Subject: [PATCH] [SPARK-57653][CORE] Make UTF8String.getByte honor its documented out-of-bounds contract

### What changes were proposed in this pull request?
`UTF8String.getByte(int)` is documented as "If byte index is invalid, returns 0", but the implementation performed an unchecked `Platform.getByte(base, offset + byteIndex)`, returning adjacent/uninitialized memory for out-of-range indices. This adds the bounds check so the method matches its contract.

### Why are the changes needed?
Under JDK 25 the out-of-bounds read returned non-zero adjacent memory, failing `UTF8StringSuite.testGetByte` with 'expected 0 but got 47' in the Maven (Scala 2.13, JDK 25) scheduled build. The previous '0' was never guaranteed; the fix makes behavior deterministic across JDKs.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing `UTF8StringSuite` (incl. `testGetByte`), verified green under JDK 25.

Co-authored-by: Isaac
---
 .../main/java/org/apache/spark/unsafe/types/UTF8String.java | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
index c9256b0a8f33c..f8d7041933907 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
@@ -703,6 +703,9 @@ public boolean contains(final UTF8String substring) {
    * Returns the byte at (byte) position `byteIndex`. If byte index is invalid, returns 0.
    */
   public byte getByte(int byteIndex) {
+    if (byteIndex < 0 || byteIndex >= numBytes) {
+      return 0;
+    }
     return Platform.getByte(base, offset + byteIndex);
   }