perf: replace Val.Str._asciiSafe flag with AsciiSafeStr subclass#861
Draft
He-Pin wants to merge 7 commits into
Draft
perf: replace Val.Str._asciiSafe flag with AsciiSafeStr subclass#861He-Pin wants to merge 7 commits into
He-Pin wants to merge 7 commits into
Conversation
Motivation: The string-separator branch of std.join was building the result with an unsized java.lang.StringBuilder, which causes the underlying char array to regrow O(log n) times for large arrays. Re-evaluating each arr.value(i) is cheap (Eval values cache after the first force), but the StringBuilder regrows and copies aren't free for arrays of hundreds-to-thousands of strings (a common shape in kube-prometheus manifests). Independently, the resulting Val.Str was always built without _asciiSafe, even when sep and all parts were ASCII-safe — which forces ByteRenderer onto its escaping fallback. Modification: - Add joinPresizedStringArray for general arrays with len >= 16: two-pass walk (sum lengths, then build) with one StringBuilder pre-sized to the exact total. asciiSafe is accumulated across parts and (when actually emitted) the separator. - Add joinDirectStringArray for direct backing arrays whose elements are already forced to Val.Str / Val.Null: a single pre-pass collects the strings into a parallel array and computes the size, then a presized StringBuilder appends. Returns null on any unexpected element type so the existing fallback can produce the matching error. - Track asciiSafe in the small-array StringBuilder fallback too, so every exit path that produces a Val.Str gets the flag set when applicable. Total length is checked against Int.MaxValue to fail fast instead of overflowing. - Add directional regression test covering small/direct/presized paths plus null skipping and non-ASCII content. Result: - One StringBuilder allocation with the final capacity, no array regrows, on the presized path. - ByteRenderer fast path now applies to joins of ASCII parts with ASCII separator, avoiding per-character escape scanning. - Full JVM test suite green; Scala 3 format check green.
Motivation: After PR databricks#858 added the join-presized + asciiSafe optimization, format outputs (`%`-interpolation, `std.format`) still always created Val.Str with `_asciiSafe = false`. Downstream JSON rendering of format results falls back to the per-char escape scan + UTF-8 encode path even when both the format string and all interpolated values are pure ASCII. Manifest workloads heavy on `%(name)s` interpolation pay this cost on every emitted string. Modification: - Add `literalsAsciiSafe` to RuntimeFormat, computed once at parse time by scanning leading + inter-spec literal segments for printable ASCII with no `"` or `\`. - At format time, AND `literalsAsciiSafe` with each interpolated value's ASCII-safety: strings forward `_asciiSafe`; numerics are ASCII (except `%c` which depends on codepoint); booleans/null are ASCII; complex types (Arr/Obj routed through Renderer) are conservatively non-ASCII. - Refactor `Format.format` (both overloads) and `formatSimpleNamedString` to return `Val.Str` directly so the `_asciiSafe` flag is set at construction. Update the three external callers (Evaluator binop `%`, std.mod, std.format) and `PartialApplyFmt.evalRhs` accordingly. Result: Format outputs now correctly carry `_asciiSafe = true` when all inputs are ASCII-safe, letting ByteRenderer take the fast path during JSON manifestation. Regression test `new_test_suite/format_asciisafe_propagation.jsonnet` covers the simple `%(name)s` fast path, general `%s`/`%d`/`%c`/`%x`/`%o`/`%f` conversions, mixed ASCII/non-ASCII literals and values, and ByteRenderer roundtrip via `std.manifestJson`.
Motivation: Format.scanFormat repeatedly tests literal windows of the format string for ASCII-JSON-safety so each cached RuntimeFormat can pre-decide whether ByteRenderer's fast path applies. The previous scalar loop in Format.scala duplicated CharSWAR's bytewise check while bypassing the SWAR path that the full-string variant already enjoys. Modification: * Add `isAsciiJsonSafe(s, from, to)` range overloads on CharSWAR for JVM (SWAR over 4 chars at a time), Native (same SWAR), and JS (scalar). * Expose the overload via Platform.isAsciiJsonSafe on all three platforms. * Replace Format.isAsciiJsonSafeRange's body with a direct delegate to Platform.isAsciiJsonSafe so the format-string scanner reuses the SWAR path on JVM/Native. Result: Long format-string literals get the same 4-char-per-step scan that ByteRenderer already uses, without changing observable semantics. No allocations introduced and no API changes outside `Platform`.
Motivation: joinDirectStringArray allocated an Array[String] equal in length to the input array purely to skip Val.Null entries between two passes. The array survives until the join completes, so for arrays with thousands of elements the temporary buffer was a hot allocation on the std.join fast path. Modification: * Replace the parts[] cache with a single elemCount counter, tracking separator overhead as `sepLen * (elemCount - 1)` after pass 1 completes instead of incrementally toggling on `added`. * Pass 2 walks the original `direct` array, skipping Val.Null entries with isInstanceOf and casting non-null elements to Val.Str (already validated by pass 1) to avoid a redundant pattern dispatch. * Empty-result branch now returns an asciiSafe empty string so ByteRenderer keeps the fast path even when the join collapses to "". Result: Eliminates one Array[String] allocation per std.join over an array of strings; pass 1 still does the type-check and total-length accounting, pass 2 reuses the input array. No observable behavior change.
Motivation: joinedRepeatedString and its callers received raw `String` separators and elements, then re-scanned them with `Platform.isAsciiJsonSafe` on every join call. The Val.Str inputs already cached `_asciiSafe`, so the re-scan was redundant work on the std.join hot path for repeated-string arrays (e.g. `[x for ... in ...]` with identical x). Modification: * Switch joinedRepeatedString, joinRepeatedStringEval, and joinRepeatedDirectString to take `Val.Str` for both `sep` and the repeated element instead of `String`. The functions now read `_asciiSafe` directly from the inputs. * When count == 1 the function returns the input Val.Str unchanged, preserving its asciiSafe flag with no allocation. * Empty-result branches return an asciiSafe empty string so ByteRenderer keeps the fast path even when the repeat collapses. * Update Join.evalRhs callers to pass the Val.Str directly. Result: Removes two `Platform.isAsciiJsonSafe` scans per repeated-element join and avoids materializing a fresh Val.Str when count == 1. Behavior is identical: result strings still carry the same asciiSafe bit they would have under the old re-scan path.
Motivation: ByteRenderer's fast path skips JSON-escape scanning when the source Val.Str is marked asciiSafe. Several StringModule builtins were discarding the input flag and returning a fresh Val.Str that always re-scanned at render time, even when the transformation could not introduce unsafe bytes. Modification: * `std.char(n)` marks the result asciiSafe when the codepoint is in printable ASCII excluding `"` and `\\`; non-ASCII / control codepoints fall through to the regular constructor as before. * `std.asciiUpper`/`asciiLower` forward the input's asciiSafe flag — ASCII case folding cannot introduce unsafe bytes. * `std.strReplace(src, from, to)` results are asciiSafe when both `src` and `to` are; the removed `from` cannot affect output safety. * `std.stripChars`/`lstripChars`/`rstripChars` forward the input's flag — stripping codepoints cannot introduce unsafe bytes. * `std.split`/`splitLimit`/`splitLimitR` thread an `asciiSafe` parameter through `splitLimit`, `splitLimitR`, and `splitLimitRBounded` so each resulting element inherits the source string's flag. * Add `string_asciisafe_propagation.jsonnet` regression covering all five paths plus non-ASCII control inputs to verify that the ByteRenderer fast path stays correct after each transformation. Result: Strings produced by these stdlib calls keep ByteRenderer on the fast path when their inputs are asciiSafe. No observable Jsonnet semantics change; the only externally visible effect is fewer rescans during JSON manifestation.
Motivation: The boolean field added 1 byte to every Val.Str instance, which JVM alignment expanded to 8 bytes per object. Val.Str instances number in the millions on string-heavy workloads (e.g. joinedRepeatedString results, format outputs, parsed string literals), so the wasted padding adds up. Modification: Drop `_asciiSafe: Boolean` from Val.Str and introduce a sealed `Val.AsciiSafeStr extends Val.Str` marker subclass. Factory `Val.Str.asciiSafe(pos, s)` now constructs the subclass directly. ByteRenderer and propagation sites switch from `vs._asciiSafe` to `vs.isInstanceOf[Val.AsciiSafeStr]`. Str.concat preserves the subclass when both operands are ASCII-safe (eager and rope paths). Parser/Substr write sites that previously mutated the flag now call the asciiSafe factory directly. Result: 8 bytes saved per Val.Str instance with no behavioral change. JIT still devirtualizes `.str` access via CHA (single non-final implementation in the hierarchy). All JVM tests pass on Scala 3.3.7; all platforms (JVM/JS/Native/WASM) compile cleanly across Scala 3.3.7/2.13.18/2.12.21.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
The
_asciiSafe: Booleanfield added 1 byte to everyVal.Strinstance, which JVMalignment expanded to 8 bytes per object due to padding rules.
Val.Strinstancesnumber in the millions on string-heavy workloads (
joinedRepeatedStringresults,format outputs, parsed string literals), so the wasted padding adds up.
Modification
_asciiSafe: BooleanfromVal.Str.final class Val.AsciiSafeStr extends Val.Stras a marker subclass.Val.Str.asciiSafe(pos, s)now constructs the subclass directly.ByteRendererand propagation sites switch fromvs._asciiSafetovs.isInstanceOf[Val.AsciiSafeStr].Str.concatpreserves the subclass when both operands are ASCII-safe(eager and rope paths).
asciiSafefactory directly.JIT still devirtualizes
.straccess via CHA —Val.Strhas a single non-finalimplementation in the hierarchy (
AsciiSafeStr), so HotSpot can inline throughthe base class.
Result
8 bytes saved per
Val.Strinstance with no behavioral change. All JVM testspass on Scala 3.3.7; all platforms (JVM/JS/Native/WASM) compile cleanly across
Scala 3.3.7/2.13.18/2.12.21.
Stacked on
This branch stacks on prior unmerged work in the asciiSafe propagation chain:
std.joinstring buffer and propagate asciiSafejoinDirectStringArrayallocation drop,joinedRepeatedStringcache propagation,StringModule asciiSafe propagation)
The PR diff vs
masterincludes all 7 commits; review may prefer to wait untilfoundation PRs land.
Benchmark methodology
bench.runRegressionswith-wi 2 -w 2 -i 5 -r 3 -f 1(2 warmup × 2s,5 measure × 3s, 1 fork). Avgt mode, ms/op, ±99.9% CI bounds shown. Δ% marked
significant only when |Δ| exceeds combined ±CI half-widths.
--warmup 2 --runs 4betweenbaseline and HEAD assembly JARs (
java -Xss100m -jar … --max-stack 100000 …).Includes JVM startup; sub-1s benchmarks are dominated by startup variance and
should be read as a sanity check, not a primary signal.
bug_suite,cpp_suite,go_suite,jdk17_suite,sjsonnet_suite.JMH Results (avgt, ms/op — lower is better; ±99.9% CI)
Legend: ✅ = significant win (|Δ| > combined CI, speedup > 1.02×) |⚠️ = significant
regression | ≈ = within noise floor
JMH summary (46 files):
lazy_array_reverse_sparse, see notes below)Hyperfine Results (mean, seconds — lower is better; end-to-end CLI)
Legend: ✅ Δ < -2% |⚠️ Δ > +2% | ≈ within ±2%
Hyperfine summary (46 files):
Test plan
./mill 'sjsonnet.jvm[3.3.7]'.test— all green./mill 'sjsonnet.jvm[3.3.7]'.compileand__.compileacross all platforms(JVM Scala 3.3.7/2.13.18/2.12.21, JS, Native, WASM) — clean
./mill __.checkFormat— passingregression files (hyperfine A/B uses both as separate commands)