feat(renderer): font ligature support via GSUB shaping + segmentation cache #129
Open
patrick-andrew-anchor wants to merge 4 commits into
6783ff5 to 40709c3
Programming ligatures (=>, ->, !=, ===, <==>, ...) now render in the
dynamic atlas for fonts that ship ligature tables (Fira Code, JetBrains
Mono, Cascadia Code, Monaspace Neon).
The dynamic atlas rasterized one grapheme per cell, so the font shaper
never saw adjacent codepoints and ligatures never formed. This adds a
ligature shaper plus N-cell glyph support end to end:
- beamterm-core (new `ligatures` feature):
  - shaper.rs: rustybuzz-based detection. Compares each shaped glyph to
    the nominal cmap glyph, so it detects the `calt` "spacer" approach
    (Fira Code et al. keep glyph count == char count) as well as classic
    GSUB ligature merges.
  - GlyphSlot::Ligature(id, cells) + cell_span(); size-classed ligature
    pools (widths 3..=8) in the glyph cache with O(1) alloc + LRU
    eviction. Two-cell ligatures continue to use the existing wide path.
  - dynamic atlas: generic split_glyph_n + N consecutive slot uploads;
    texture layers derived from the region layout so they can't drift.
  - terminal grid: 2-cell placement generalized to N cells across all
    update paths; segment_run + place_ligature helpers.
- beamterm-renderer:
  - canvas rasterizer sizes each glyph to cell_w * unicode-width, so
    ligature substrings render at their full width.
  - BeamtermRenderer.setFontBytes(Uint8Array) builds the shaper from raw
    sfnt bytes; Batch.text segments runs into ligature glyphs. Ligatures
    activate automatically when the supplied font advertises them.
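As a rough illustration of the detection idea in shaper.rs (compare each shaped glyph to the nominal cmap glyph), here is a minimal sketch. It is not the PR's code: `ligature_spans` is a hypothetical name, glyph IDs are passed in as plain slices instead of coming from rustybuzz, and it assumes one shaped glyph per character (the `calt` spacer case). It would also merge two ligatures that sit directly next to each other, which a real shaper has to tell apart.

```rust
use std::ops::Range;

/// Returns the cell ranges whose shaped glyph differs from the nominal
/// (cmap) glyph. Hypothetical helper: assumes one shaped glyph per
/// character, as in the `calt` spacer approach.
fn ligature_spans(shaped: &[u16], nominal: &[u16]) -> Vec<Range<usize>> {
    assert_eq!(shaped.len(), nominal.len(), "sketch assumes 1 glyph per char");
    let mut spans = Vec::new();
    let mut start: Option<usize> = None;
    for (i, (s, n)) in shaped.iter().zip(nominal).enumerate() {
        match (s != n, start) {
            // First substituted glyph opens a span.
            (true, None) => start = Some(i),
            // First unchanged glyph after a span closes it.
            (false, Some(begin)) => {
                spans.push(begin..i);
                start = None;
            }
            _ => {}
        }
    }
    // A span that runs to the end of the text is closed here.
    if let Some(begin) = start {
        spans.push(begin..shaped.len());
    }
    spans
}
```

With nominal glyphs `[10, 11, 12, 13]` and shaped glyphs `[10, 90, 91, 13]`, the two substituted cells in the middle are reported as one span covering indices 1 and 2.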
The shaper only detects/segments; the browser canvas still rasterizes,
preserving color emoji and font fallback. WOFF/WOFF2 must be decompressed
to sfnt before setFontBytes (documented in js/README.md).
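Because WOFF/WOFF2 input must be decompressed first, a caller may want to check the container before handing bytes over. The sketch below sniffs the 4-byte signature at the start of the file. `FontContainer` and `sniff_container` are hypothetical names, not part of beamterm; the signatures themselves (`wOFF`, `wOF2`, `OTTO`, `true`, and `0x00010000`) are the standard ones.

```rust
/// Container kinds distinguishable from the first four bytes.
#[derive(Debug, PartialEq, Eq)]
enum FontContainer {
    Sfnt,
    Woff,
    Woff2,
    Unknown,
}

/// Hypothetical pre-check: classify font bytes by their signature.
fn sniff_container(bytes: &[u8]) -> FontContainer {
    match bytes.get(..4) {
        Some(b"wOFF") => FontContainer::Woff,
        Some(b"wOF2") => FontContainer::Woff2,
        // TrueType outlines (0x00010000 or 'true') and CFF outlines ('OTTO').
        Some([0x00, 0x01, 0x00, 0x00]) | Some(b"true") | Some(b"OTTO") => FontContainer::Sfnt,
        // Too short, or a signature this sketch does not know.
        _ => FontContainer::Unknown,
    }
}
```

Anything reported as `Woff` or `Woff2` would need decompressing before being passed on as raw sfnt bytes.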
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
40709c3 to 60627d3
Shaper::segment() built a rustybuzz Face from the raw font bytes and ran the shaper for every text run on every call. The renderer re-shapes the whole screen each frame, so a static screen was re-segmenting every run ~60×/s, measured at ~38ms/frame of pure shaping on a full screen (render p50 48.8ms with ligatures on vs ~10.5ms off).

Memoize segment() results in an LRU keyed on the run text. Segmentation depends only on the characters and the font, and a font change constructs a fresh Shaper (hence a fresh cache), so no explicit invalidation is needed. A static screen now pays the shaping cost once; repeated runs are an O(len) map lookup. `lru` was already a dependency.

Adds a cache-correctness test asserting the memoized path returns segments identical to the uncached path on both miss and hit.
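The memoization pattern can be sketched as follows. This is a simplified stand-in, not the code in shaper.rs: `MemoShaper` is a hypothetical type, the shaping step is an injected closure rather than rustybuzz, segments are reduced to `(start, end)` pairs, and an unbounded `HashMap` replaces the bounded `lru` cache the commit uses.

```rust
use std::collections::HashMap;

/// Simplified segment list: (start, end) byte offsets within a run.
type Segments = Vec<(usize, usize)>;

/// Hypothetical wrapper that caches segmentation results per run text.
struct MemoShaper<F: Fn(&str) -> Segments> {
    shape: F,
    cache: HashMap<String, Segments>,
    /// Counts how often the expensive shaping closure actually ran.
    misses: usize,
}

impl<F: Fn(&str) -> Segments> MemoShaper<F> {
    fn new(shape: F) -> Self {
        Self { shape, cache: HashMap::new(), misses: 0 }
    }

    /// Returns cached segments for `run`, shaping only on a miss.
    fn segment(&mut self, run: &str) -> &Segments {
        if !self.cache.contains_key(run) {
            self.misses += 1;
            let segments = (self.shape)(run);
            self.cache.insert(run.to_owned(), segments);
        }
        &self.cache[run]
    }
}
```

Building a new `MemoShaper` whenever the font changes mirrors the invalidation argument above: the cache lives and dies with the shaper, so stale entries cannot outlive the font they were computed for.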
is_emoji() treated any pure-ASCII string with len > 1 and width >= 2 as an emoji to catch ASCII-led keycap sequences (e.g. "1️⃣"). That heuristic also matched 2-char programming ligatures like "->", "=>", "==", "<-", "&&".

Once ligature shaping landed, the 2-cell ligature substring was passed to GlyphCache::resolve_glyph_slot, so the false positive promoted these glyphs to GlyphSlot::Emoji(idx | DYNAMIC_EMOJI_FLAG). The set emoji bit (15) makes the fragment shader sample the glyph texture color directly instead of tinting with the cell foreground, rendering the white glyph mask untinted. The bug was invisible on dark themes (white ≈ light fg) but rendered the ligature white on light themes. 3+ cell ligatures use the separate Ligature slot pool, which never consults is_emoji, so they were unaffected.

Require a non-ASCII continuation code point (U+FE0F / U+20E3), which real keycap sequences always carry and ASCII ligature runs never do. Adds regression tests covering keycaps (still emoji) and the ligature substrings (not emoji).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
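The tightened check can be illustrated with a small sketch. `is_ascii_led_emoji` is a hypothetical name and covers only the ASCII-led keycap branch described above; the real is_emoji() handles more cases and may be structured differently.

```rust
/// Variation selector-16 and combining enclosing keycap.
const VS16: char = '\u{FE0F}';
const KEYCAP: char = '\u{20E3}';

/// Hypothetical check: an ASCII-led string counts as an emoji only if it
/// also carries one of the non-ASCII keycap code points.
fn is_ascii_led_emoji(s: &str) -> bool {
    let starts_ascii = s.chars().next().map_or(false, |c| c.is_ascii());
    let has_marker = s.chars().any(|c| c == VS16 || c == KEYCAP);
    starts_ascii && has_marker
}
```

A keycap such as `"1\u{FE0F}\u{20E3}"` passes, while pure-ASCII ligature substrings like `"->"` or `"&&"` do not, so they keep the foreground tint.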
Testing:
With this change and my client support, I can render ligatures.
[Screenshots: sample input text and the same text rendered with ligatures]
Follow-up: memoize ligature run segmentation
This PR also includes a perf commit on top of the feature.
`Shaper::segment()` built a rustybuzz `Face` from the raw font bytes and ran the shaper for every text run on every call. The renderer re-shapes the whole screen each frame, so a static screen was re-segmented ~60×/s, measured at ~38ms/frame of pure shaping on a full screen (render p50 48.8ms with ligatures on vs ~10.5ms off).

`segment()` is now memoized in an LRU keyed on the run text. Segmentation depends only on the characters and the font, and a font change constructs a fresh `Shaper` (hence a fresh cache), so no explicit invalidation is needed. A static screen pays the shaping cost once; repeated runs become an `O(len)` map lookup. `lru` was already a dependency.

Adds a cache-correctness test asserting the memoized path returns segments identical to the uncached path on both miss and hit (gated on `BEAMTERM_LIGATURE_TEST_FONT`). Confined to `beamterm-core/src/gl/shaper.rs`; no new dependencies.