Skip to content

Comments

Optimized CommentedString.validString#1067

Open
ChrisBenua wants to merge 1 commit intotuist:mainfrom
ChrisBenua:optimisedCommentedString
Open

Optimized CommentedString.validString#1067
ChrisBenua wants to merge 1 commit intotuist:mainfrom
ChrisBenua:optimisedCommentedString

Conversation

@ChrisBenua
Copy link

@ChrisBenua ChrisBenua commented Feb 20, 2026

Resolves #1066

Short description 📝

Optimised CommendString.validString

Solution 📦

  • Replace StringProtocol.contains with checking that one UInt8 array (specifically UnsafeBufferPointer) is subarray of another. Why this approach is faster? Because StringProtocol.contains spends most of its time on string indexing (which requires much logic to preserve grapheme clusters).
  • Replace rangeOfCharacter with checking that UInt8 array (specifically UnsafeBufferPointer) contains specific value.
  • Instead of calling append for each string character in reduce function - calculate needed capacity upfront and create String using String(unsafeUninitializedCapacity:) init

Implementation 👩‍💻👨‍💻

Detail in a checklist the steps that you took to implement the PR.

  • Replace StringProtocol.contains and rangeOfCharacters with UnsafeBufferPointer<UInt8> operations
  • Get rid of reduce + append. It required too much memory allocation. Instead - calculate needed capacity upfront and create String using String(unsafeUninitializedCapacity:) init

Performance

Performance Comparison

Measured with M2 Max 32 GB RAM, MacOS 26.2, release build. If I run these benchmarks many times, average deviation is about 3-4%

Previous validString implementation performance:

simple alphanumeric (fast path)       0.47 µs/call (0.047 s total, 100000 iters)
underscore+slash (valid, special)     1.30 µs/call (0.130 s total, 100000 iters)
// and ___ (needs quoting)            0.77 µs/call (0.077 s total, 100000 iters)
needs escaping (\" \n \t \\)          0.74 µs/call (0.074 s total, 100000 iters)
spaces (invalid chars, no escape)     0.39 µs/call (0.039 s total, 100000 iters)
"true"                                0.01 µs/call (0.001 s total, 100000 iters)
"false"                               0.01 µs/call (0.001 s total, 100000 iters)
long clean (2200 chars)               119.46 µs/call (11.946 s total, 100000 iters)
long dirty (1500 chars, escaping)     36.50 µs/call (3.650 s total, 100000 iters)
mixed batch (10 strings)              161.31 µs/call (8.065 s total, 50000 iters)

Introduced validString implementation performance:

simple alphanumeric (fast path)       0.03 µs/call (0.003 s total, 100000 iters) - 15x boost
underscore+slash (valid, special)     0.06 µs/call (0.006 s total, 100000 iters) - 20x boost
// and ___ (needs quoting)            0.09 µs/call (0.009 s total, 100000 iters) - 8x boost
needs escaping (\" \n \t \\)          0.09 µs/call (0.009 s total, 100000 iters) - 8x boost
spaces (invalid chars, no escape)     0.07 µs/call (0.007 s total, 100000 iters) - 5x boost
"true"                                0.01 µs/call (0.001 s total, 100000 iters) - same
"false"                               0.00 µs/call (0.000 s total, 100000 iters) - same
long clean (2200 chars)               5.50 µs/call (0.550 s total, 100000 iters) - 20x boost
long dirty (1500 chars, escaping)     2.30 µs/call (0.230 s total, 100000 iters) - 16x boost
mixed batch (10 strings)              8.12 µs/call (0.406 s total, 50000 iters) - 20x boost

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 20, 2026
@dosubot
Copy link

dosubot bot commented Feb 20, 2026

Related Documentation

Checked 1 published document(s) in 1 knowledge base(s). No updates required.

How did I do? Any feedback?  Join Discord

@ChrisBenua
Copy link
Author

@dosubot could you review this PR? It optimized CommentedString.validString calculation by avoiding String indexing overhead by using UnsafeBufferPointer-s instead. Also avoid multiple String allocations by replacing reduce and string concatenation by calculating needed capacity upfront and create String using String(unsafeUninitializedCapacity:) init

@dosubot
Copy link

dosubot bot commented Feb 20, 2026

Thanks for working on this performance optimization! The 5-20x improvement benchmarks are impressive, especially the 20x boost for long strings (119µs → 5.5µs). Here's my review:

✅ What Looks Good

  • Optimization strategy is sound: Replacing StringProtocol.contains and rangeOfCharacter with UnsafeBufferPointer<UInt8> operations avoids expensive grapheme cluster handling
  • Memory-efficient string building: Using String(unsafeUninitializedCapacity:) with upfront capacity calculation eliminates repeated allocations from reduce + append
  • Performance impact is justified: Since validString is called during project encoding, this could save significant time on large projects

⚠️ Points to Verify

Memory Safety:

  • Ensure UnsafeBufferPointer operations stay within withContiguousStorageIfAvailable closures (no escaping pointers)
  • Handle the fallback case if withContiguousStorageIfAvailable returns nil
  • Verify exact byte count is returned from String(unsafeUninitializedCapacity:)

Correctness:

  • Pattern matching for // and ___ must handle boundary cases correctly at the byte level
  • Capacity calculation must account for worst-case escaping scenarios
  • All escape characters (\, ", \t, \n) are ASCII single-byte, so byte-level escaping is safe

Edge Cases to Test:

  • Current tests cover 51 scenarios but consider adding:
    • Long strings (1500-2200 chars matching your benchmarks)
    • Multi-byte UTF-8 characters (emoji, combining marks)
    • High escape density strings
    • Empty/single character edge cases

💡 Suggestion

The codebase has a good precedent for safe UnsafeBufferPointer usage in String+md5.swift - following those patterns (scoped wrappers, explicit cleanup) would be great.

Overall this looks like a valuable optimization! The complexity is justified by the significant performance gains. 👍

To reply, just mention @dosu.


How did I do? Good | Irrelevant | Incorrect | Verbose | Hallucination | Report 🐛 | Other

Chat with Dosu Open in Cursor Join Discord Share on X

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve CommentedString.validString performance

1 participant