feat(renderer): font ligature support via GSUB shaping + segmentation cache#129

Open
patrick-andrew-anchor wants to merge 4 commits into junkdog:main from patrick-andrew-anchor:feat/font-ligatures

@patrick-andrew-anchor patrick-andrew-anchor commented Jun 15, 2026

Programming ligatures (=>, ->, !=, ===, <==>, ...) now render in the dynamic atlas for fonts that ship ligature tables (Fira Code, JetBrains Mono, Cascadia Code, Monaspace Neon).

The dynamic atlas rasterized one grapheme per cell, so the font shaper never saw adjacent codepoints and ligatures never formed. This adds a ligature shaper plus N-cell glyph support end to end:

  • beamterm-core (new ligatures feature):
    • shaper.rs: rustybuzz-based detection. Compares each shaped glyph to the nominal cmap glyph, so it detects the calt "spacer" approach (Fira Code et al. keep glyph count == char count) as well as classic GSUB ligature merges.
    • GlyphSlot::Ligature(id, cells) + cell_span(); size-classed ligature pools (widths 3..=8) in the glyph cache with O(1) alloc + LRU eviction. Two-cell ligatures continue to use the existing wide path.
    • dynamic atlas: generic split_glyph_n + N consecutive slot uploads; texture layers derived from the region layout so they can't drift.
    • terminal grid: 2-cell placement generalized to N cells across all update paths; segment_run + place_ligature helpers.
  • beamterm-renderer:
    • canvas rasterizer sizes each glyph to cell_w * unicode-width, so ligature substrings render at their full width.
    • BeamtermRenderer.setFontBytes(Uint8Array) builds the shaper from raw sfnt bytes; Batch.text segments runs into ligature glyphs. Ligatures activate automatically when the supplied font advertises them.
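
The glyph-comparison idea behind the shaper can be sketched without rustybuzz. The following is a minimal illustration, not the code in shaper.rs: `Segment` and `segment_by_glyph_diff` are hypothetical names, the glyph ids are passed in as plain slices, and it assumes one shaped glyph per single-cell character (the `calt` spacer case). Classic GSUB merges, where the glyph count shrinks, would need cluster information that this sketch leaves out.

```rust
/// One piece of a segmented run (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Segment {
    /// A single character rendered in one cell.
    Plain(char),
    /// A ligature substring covering `cells` consecutive cells.
    Ligature { text: String, cells: usize },
}

/// Groups consecutive characters whose shaped glyph differs from the nominal
/// cmap glyph into ligature segments. Assumes `nominal` and `shaped` hold
/// exactly one glyph id per character of `run`.
fn segment_by_glyph_diff(run: &str, nominal: &[u16], shaped: &[u16]) -> Vec<Segment> {
    let chars: Vec<char> = run.chars().collect();
    assert_eq!(chars.len(), nominal.len());
    assert_eq!(chars.len(), shaped.len());

    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if shaped[i] == nominal[i] {
            out.push(Segment::Plain(chars[i]));
            i += 1;
            continue;
        }
        // Extend the span for as long as the shaper keeps substituting glyphs.
        let start = i;
        while i < chars.len() && shaped[i] != nominal[i] {
            i += 1;
        }
        if i - start == 1 {
            // A lone substitution is a contextual alternate, not a ligature.
            out.push(Segment::Plain(chars[start]));
        } else {
            out.push(Segment::Ligature {
                text: chars[start..i].iter().collect(),
                cells: i - start,
            });
        }
    }
    out
}
```

With made-up glyph ids, `segment_by_glyph_diff("a->b", &[10, 11, 12, 13], &[10, 900, 901, 13])` yields a plain `a`, a two-cell `->` ligature, and a plain `b`. Note that two ligatures sitting directly next to each other would merge into one span here, which a real shaper-backed implementation has to tell apart.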

The shaper only detects/segments; the browser canvas still rasterizes, preserving color emoji and font fallback. WOFF/WOFF2 must be decompressed to sfnt before setFontBytes (documented in js/README.md).
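
Because setFontBytes expects raw sfnt bytes, a caller may want to check the container format before handing the bytes over. The sketch below is an assumption about how such a check could look, not part of this PR: `FontContainer` and `sniff_font_container` are hypothetical names, and the check only inspects the four-byte signature (`wOFF`, `wOF2`, or one of the common single-font sfnt versions). Font collections (`ttcf`) are deliberately reported as unknown.

```rust
/// Container format inferred from the first four bytes of a font file
/// (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum FontContainer {
    Sfnt,
    Woff,
    Woff2,
    Unknown,
}

/// sfnt version tags for single fonts: TrueType outlines, CFF outlines,
/// and the legacy Apple TrueType tag.
const SFNT_MAGICS: [[u8; 4]; 3] = [[0x00, 0x01, 0x00, 0x00], *b"OTTO", *b"true"];

/// Classifies font bytes by signature only; it does not validate the file.
fn sniff_font_container(bytes: &[u8]) -> FontContainer {
    let Some(magic) = bytes.get(..4) else {
        return FontContainer::Unknown;
    };
    if magic == b"wOFF" {
        FontContainer::Woff
    } else if magic == b"wOF2" {
        FontContainer::Woff2
    } else if SFNT_MAGICS.iter().any(|m| magic == m) {
        FontContainer::Sfnt
    } else {
        FontContainer::Unknown
    }
}
```

A caller could decompress when the result is `Woff` or `Woff2` and pass the bytes through unchanged when it is `Sfnt`.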

Testing:
With this change and my client support, I can render ligatures:

user@pandrew-dev:~$ printf '%s\n' '-> => != == === !== <==> --> |> :: // /* */ >>= <=>'
  -> => != == === !== <==> --> |> :: // /* */ >>= <=>

renders as:
[Screenshot 2026-06-15 11 23 11 AM]


Follow-up: memoize ligature run segmentation

This PR also includes a perf commit on top of the feature.

Shaper::segment() built a rustybuzz Face from the raw font bytes and ran the shaper for every text run on every call. The renderer re-shapes the whole screen each frame, so a static screen was re-segmented ~60×/s — measured at ~38ms/frame of pure shaping on a full screen (render p50 48.8ms with ligatures on vs ~10.5ms off).

segment() is now memoized in an LRU keyed on the run text. Segmentation depends only on the characters and the font, and a font change constructs a fresh Shaper (hence a fresh cache), so no explicit invalidation is needed. A static screen pays the shaping cost once; repeated runs become an O(len) map lookup. lru was already a dependency.

Adds a cache-correctness test asserting the memoized path returns segments identical to the uncached path on both miss and hit (gated on BEAMTERM_LIGATURE_TEST_FONT). Confined to beamterm-core/src/gl/shaper.rs; no new dependencies.
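
The memoization described above can be illustrated with a std-only sketch. This is not the code in shaper.rs: the PR uses the `lru` crate, whereas `SegmentCache` here is a hypothetical stand-in built on `HashMap` with an O(n) eviction scan, and the shaping step is passed in as a closure so the example stays self-contained. Segments are represented as assumed `(start, cell_count)` pairs.

```rust
use std::collections::HashMap;

/// Ligature spans for a run, as (start, cell_count) pairs (assumed shape).
type Segments = Vec<(usize, usize)>;

/// Tiny LRU keyed on the run text; one instance per font, so a font change
/// simply drops the old cache along with its owner.
struct SegmentCache {
    capacity: usize,
    tick: u64,
    entries: HashMap<String, (Segments, u64)>,
}

impl SegmentCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self { capacity, tick: 0, entries: HashMap::new() }
    }

    /// Returns cached segments for `run`, calling `shape` only on a miss.
    fn segment(&mut self, run: &str, shape: impl FnOnce(&str) -> Segments) -> Segments {
        self.tick += 1;
        let tick = self.tick;
        if let Some((segments, last_used)) = self.entries.get_mut(run) {
            *last_used = tick; // refresh recency on a hit
            return segments.clone();
        }
        let segments = shape(run);
        if self.entries.len() >= self.capacity {
            // Evict the least recently used entry (linear scan in this sketch).
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, (_, used))| *used)
                .map(|(key, _)| key.clone());
            if let Some(key) = oldest {
                self.entries.remove(&key);
            }
        }
        self.entries.insert(run.to_owned(), (segments.clone(), tick));
        segments
    }
}
```

A cache-correctness check in the spirit of the test mentioned above would call `segment` twice with the same run and assert that both results match and that the shaping closure ran only once.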

@patrick-andrew-anchor patrick-andrew-anchor marked this pull request as ready for review June 15, 2026 18:40
@patrick-andrew-anchor patrick-andrew-anchor marked this pull request as draft June 15, 2026 18:40
@patrick-andrew-anchor patrick-andrew-anchor force-pushed the feat/font-ligatures branch 3 times, most recently from 6783ff5 to 40709c3 Compare June 15, 2026 19:06
Programming ligatures (=>, ->, !=, ===, <==>, ...) now render in the
dynamic atlas for fonts that ship ligature tables (Fira Code, JetBrains
Mono, Cascadia Code, Monaspace Neon).

The dynamic atlas rasterized one grapheme per cell, so the font shaper
never saw adjacent codepoints and ligatures never formed. This adds a
ligature shaper plus N-cell glyph support end to end:

- beamterm-core (new `ligatures` feature):
  - shaper.rs: rustybuzz-based detection. Compares each shaped glyph to
    the nominal cmap glyph, so it detects the `calt` "spacer" approach
    (Fira Code et al. keep glyph count == char count) as well as classic
    GSUB ligature merges.
  - GlyphSlot::Ligature(id, cells) + cell_span(); size-classed ligature
    pools (widths 3..=8) in the glyph cache with O(1) alloc + LRU
    eviction. Two-cell ligatures continue to use the existing wide path.
  - dynamic atlas: generic split_glyph_n + N consecutive slot uploads;
    texture layers derived from the region layout so they can't drift.
  - terminal grid: 2-cell placement generalized to N cells across all
    update paths; segment_run + place_ligature helpers.
- beamterm-renderer:
  - canvas rasterizer sizes each glyph to cell_w * unicode-width, so
    ligature substrings render at their full width.
  - BeamtermRenderer.setFontBytes(Uint8Array) builds the shaper from raw
    sfnt bytes; Batch.text segments runs into ligature glyphs. Ligatures
    activate automatically when the supplied font advertises them.

The shaper only detects/segments; the browser canvas still rasterizes,
preserving color emoji and font fallback. WOFF/WOFF2 must be decompressed
to sfnt before setFontBytes (documented in js/README.md).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@patrick-andrew-anchor patrick-andrew-anchor marked this pull request as ready for review June 15, 2026 20:16
Shaper::segment() built a rustybuzz Face from the raw font bytes and ran
the shaper for every text run on every call. The renderer re-shapes the
whole screen each frame, so a static screen re-segmented every run ~60×/s
— measured at ~38ms/frame of pure shaping on a full screen (render p50
48.8ms with ligatures on vs ~10.5ms off).

Memoize segment() results in an LRU keyed on the run text. Segmentation
depends only on the characters and the font, and a font change constructs
a fresh Shaper (hence a fresh cache), so no explicit invalidation is
needed. A static screen now pays the shaping cost once; repeated runs are
an O(len) map lookup. `lru` was already a dependency.

Adds a cache-correctness test asserting the memoized path returns segments
identical to the uncached path on both miss and hit.
@patrick-andrew-anchor patrick-andrew-anchor changed the title from "feat(renderer): font ligature support via GSUB shaping (#128)" to "feat(renderer): font ligature support via GSUB shaping + segmentation cache" Jun 17, 2026
is_emoji() treated any pure-ASCII string with len > 1 and width >= 2 as an
emoji to catch ASCII-led keycap sequences (e.g. "1️⃣"). That heuristic also
matched 2-char programming ligatures like "->", "=>", "==", "<-", "&&".

Once ligature shaping landed, 2-cell ligature substrings were passed to
GlyphCache::resolve_glyph_slot, so the false positive promoted these glyphs
to GlyphSlot::Emoji(idx | DYNAMIC_EMOJI_FLAG). The set emoji bit (15) makes
the fragment shader sample the glyph texture color directly instead of
tinting with the cell foreground — rendering the white glyph mask untinted.
The bug was invisible on dark themes (white ≈ light fg) but rendered the
ligature white on light themes. 3+ cell ligatures use the separate Ligature
slot pool, which never consults is_emoji, so they were unaffected.

Require a non-ASCII continuation codepoint (U+FE0F / U+20E3), which real
keycap sequences always carry and ASCII ligature runs never do. Adds
regression tests covering keycaps (still emoji) and the ligature substrings
(not emoji).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
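
The condition added by the commit above can be sketched in isolation. This is an illustration only: `is_ascii_led_keycap` is a hypothetical helper, and the real is_emoji() presumably performs further checks (such as the width test) that are omitted here. It shows just the new requirement that an ASCII-led string must carry U+FE0F or U+20E3 to count as a keycap emoji.

```rust
/// Returns true for ASCII-led keycap sequences such as "1\u{FE0F}\u{20E3}".
/// Pure-ASCII runs like "->" or "==" never contain U+FE0F or U+20E3, so
/// they are rejected.
fn is_ascii_led_keycap(s: &str) -> bool {
    let mut chars = s.chars();
    // The sequence must start with an ASCII base character.
    let leads_with_ascii = chars.next().map_or(false, |c| c.is_ascii());
    // Any later variation selector or combining keycap marks it as a keycap.
    leads_with_ascii && chars.any(|c| c == '\u{FE0F}' || c == '\u{20E3}')
}
```

Regression checks along the lines described in the commit would assert that keycap sequences still pass and that the 2-character ligature substrings do not.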