You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Replace StringProtocol.contains with checking that one UInt8 array (specifically UnsafeBufferPointer) is subarray of another. Why this approach is faster? Because StringProtocol.contains spends most of its time on string indexing (which requires much logic to preserve grapheme clusters).
Replace rangeOfCharacter with checking that UInt8 array (specifically UnsafeBufferPointer) contains specific value.
Instead of calling append for each string character in reduce function - calculate needed capacity upfront and create String using String(unsafeUninitializedCapacity:) init
Implementation 👩💻👨💻
Detail in a checklist the steps that you took to implement the PR.
Replace StringProtocol.contains and rangeOfCharacters with UnsafeBufferPointer<UInt8> operations
Get rid of reduce + append. It required too much memory allocation. Instead - calculate needed capacity upfront and create String using String(unsafeUninitializedCapacity:) init
Performance
Performance Comparison
Measured with M2 Max 32 GB RAM, MacOS 26.2, release build. If I run these benchmarks many times, average deviation is about 3-4%
Previous validString implementation performance:
simple alphanumeric (fast path) 0.47 µs/call (0.047 s total, 100000 iters)
underscore+slash (valid, special) 1.30 µs/call (0.130 s total, 100000 iters)
// and ___ (needs quoting) 0.77 µs/call (0.077 s total, 100000 iters)
needs escaping (\" \n \t \\) 0.74 µs/call (0.074 s total, 100000 iters)
spaces (invalid chars, no escape) 0.39 µs/call (0.039 s total, 100000 iters)
"true" 0.01 µs/call (0.001 s total, 100000 iters)
"false" 0.01 µs/call (0.001 s total, 100000 iters)
long clean (2200 chars) 119.46 µs/call (11.946 s total, 100000 iters)
long dirty (1500 chars, escaping) 36.50 µs/call (3.650 s total, 100000 iters)
mixed batch (10 strings) 161.31 µs/call (8.065 s total, 50000 iters)
@dosubot could you review this PR? It optimized CommentedString.validString calculation by avoiding String indexing overhead by using UnsafeBufferPointer-s instead. Also avoid multiple String allocations by replacing reduce and string concatenation by calculating needed capacity upfront and create String using String(unsafeUninitializedCapacity:) init
Thanks for working on this performance optimization! The 5-20x improvement benchmarks are impressive, especially the 20x boost for long strings (119µs → 5.5µs). Here's my review:
✅ What Looks Good
Optimization strategy is sound: Replacing StringProtocol.contains and rangeOfCharacter with UnsafeBufferPointer<UInt8> operations avoids expensive grapheme cluster handling
Memory-efficient string building: Using String(unsafeUninitializedCapacity:) with upfront capacity calculation eliminates repeated allocations from reduce + append
The codebase has a good precedent for safe UnsafeBufferPointer usage in String+md5.swift - following those patterns (scoped wrappers, explicit cleanup) would be great.
Overall this looks like a valuable optimization! The complexity is justified by the significant performance gains. 👍
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Resolves #1066
Short description 📝
Solution 📦
StringProtocol.containswith checking that oneUInt8array (specificallyUnsafeBufferPointer) is subarray of another. Why this approach is faster? BecauseStringProtocol.containsspends most of its time on string indexing (which requires much logic to preserve grapheme clusters).rangeOfCharacterwith checking thatUInt8array (specificallyUnsafeBufferPointer) contains specific value.appendfor each string character inreducefunction - calculate needed capacity upfront and createStringusingString(unsafeUninitializedCapacity:)initImplementation 👩💻👨💻
StringProtocol.containsandrangeOfCharacterswithUnsafeBufferPointer<UInt8>operationsreduce+append. It required too much memory allocation. Instead - calculate needed capacity upfront and createStringusingString(unsafeUninitializedCapacity:)initPerformance
Performance Comparison
Measured with
M2 Max 32 GB RAM, MacOS 26.2, release build. If I run these benchmarks many times, average deviation is about 3-4%Previous
validStringimplementation performance:Introduced
validStringimplementation performance: