[BUGFIX] Fix escape handling in TextRoleRule#1308

Closed
CybotTM wants to merge 1 commit into phpDocumentor:main from netresearch:fix/textrole-escape


CybotTM (Contributor) commented Feb 24, 2026

Summary

Fixes escape handling bugs in TextRoleRule for code-type text roles (:rst:, :php:, :code:, etc.).

Resolves: TYPO3-Documentation/render-guides#1188

Bug 1: Escaped backslash stays literal in rawPart

Code-type text roles use $rawContent (via #533). When the author writes \\ to display a single backslash, $rawPart preserved the literal \\ instead of resolving it.

RST input:

:code:`a\\b`

Before (broken):

<code>a\\b</code>

After (fixed):

<code>a\b</code>

Other escapes like \T or \* are intentionally preserved raw in $rawPart — code contexts need literal backslash-letter sequences (e.g., PHP namespaces like \App\Entity).
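
To make the intended rule for $rawPart concrete, here is a minimal Python sketch. It is not the PHP code from this PR, and the function name resolve_raw_part is hypothetical. It only models the behavior described above: an escaped backslash collapses to a single backslash, and every other backslash sequence is copied verbatim.

```python
def resolve_raw_part(raw: str) -> str:
    """Collapse escaped backslashes and leave every other escape untouched.

    Hypothetical helper that models the rule from this PR description;
    it is not the actual TextRoleRule implementation.
    """
    out = []
    i = 0
    while i < len(raw):
        if raw.startswith("\\\\", i):
            # Two backslashes in a row are an escaped backslash: emit one.
            out.append("\\")
            i += 2
        else:
            # Anything else, including "\T" or "\*", is copied as-is.
            out.append(raw[i])
            i += 1
    return "".join(out)


print(resolve_raw_part(r"a\\b"))         # a\b
print(resolve_raw_part(r"\App\Entity"))  # \App\Entity
```

Scanning left to right and consuming backslashes in pairs keeps a sequence such as a PHP namespace intact, because a single backslash followed by a letter never matches the two-backslash check.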

Bug 2: Escaped backtick swallows closing delimiter

The lexer has a catchable pattern that tokenizes a 3-char sequence (backslash + two backticks) as a single ESCAPED_SIGN token. This swallows the closing backtick of the text role, so TextRoleRule never finds a BACKTICK token to close — it rolls back and the role breaks entirely.

RST input:

:code:`text\``

Before (broken): role is not recognized, rendered as literal text:

:code:`text\``

After (fixed): post-loop recovery detects the swallowed backtick. The 3-char token is split semantically:

\``  =  \`  (escaped backtick → literal `)  +  `  (closing delimiter)

The escape character is consumed, the backtick becomes content:

<code>text`</code>

The recovery is narrowed to only the 3-char token — the only lexer pattern that swallows backticks. Regular 2-char escapes at end-of-input (like \T or \\ without a closing backtick) correctly roll back as genuinely unterminated roles:

RST input (genuinely unterminated — no closing backtick):

:role:`content\T

Result: rolls back, no role node produced (correct — the role was never closed).
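
The recovery described above can be modeled with a short Python sketch. Everything in it is an assumption made for illustration: the TOKEN regex is only a stand-in for the real lexer (it tries the 3-char sequence first, as the catchable pattern does), and parse_role_content is a hypothetical name, not a method of TextRoleRule. Returning None stands for the rollback.

```python
import re
from typing import Optional

# Stand-in for the lexer (assumption): the 3-char alternative is tried first,
# so backslash + two backticks is matched as one escaped token.
TOKEN = re.compile(r"\\``|\\.|\\|`|[^\\`]+", re.DOTALL)

THREE_CHAR_ESCAPE = "\\``"  # backslash followed by two backticks


def parse_role_content(body: str) -> Optional[str]:
    """Return the role content, or None to signal a rollback.

    `body` is everything after the opening backtick of the text role.
    """
    parts = []
    last_escaped_token = None
    for match in TOKEN.finditer(body):
        token = match.group()
        if token == "`":
            # A real closing delimiter: the role is complete.
            return "".join(parts)
        # Remember the token only if it is an escape; plain text resets it.
        last_escaped_token = token if token.startswith("\\") else None
        parts.append(token)

    # Post-loop recovery: input ended without a BACKTICK token.
    if last_escaped_token == THREE_CHAR_ESCAPE:
        # Split the token: the escaped backtick becomes content, and the
        # swallowed second backtick is treated as the closing delimiter.
        parts[-1] = "`"
        return "".join(parts)

    # Any other ending is a genuinely unterminated role.
    return None


print(parse_role_content("text\\``"))    # text`
print(parse_role_content("content\\T"))  # None
```

Checking for the exact 3-char token, rather than for any trailing escape, is what keeps the unterminated cases rolling back.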

Changes

TextRoleRule.php:

  • Resolve \\ to \ in $rawPart (other escapes preserved raw)
  • Add EOF fallback for the 3-char token: escape consumed, literal backtick in content
  • Extract createTextRoleNode() helper to deduplicate role-building logic
  • Simplify escape tracking from two variables to one ($lastEscapedToken)

TextRoleRuleTest.php:

  • 3 new positive cases: escaped backslash, escaped backtick at end, only escaped backtick
  • 2 new negative cases: unterminated roles with \T and \\ at end → assert rollback

Functional tests:

  • Updated code-textrole-no-escape.html expected output
  • New code-textrole-escape/ test fixture

Test plan

  • All 8 unit tests pass (6 positive + 2 negative rollback cases)
  • All 3 textrole functional tests pass
  • No regressions in existing text role behavior

CybotTM closed this Feb 24, 2026
CybotTM reopened this Feb 24, 2026
CybotTM changed the title from "WIP: [BUGFIX] Fix escape handling in TextRoleRule" to "[BUGFIX] Fix escape handling in TextRoleRule" Feb 24, 2026
Fix two bugs causing code-type text roles (:rst:, :php:, :code:,
etc.) to mishandle escape sequences:

1. $rawPart preserved \\ instead of resolving to \. Now resolved
   for escaped-backslash; other escapes (\T, \*) stay raw for code
   contexts (e.g. PHP namespaces).

2. The lexer's \`` catchable pattern (3 chars) swallows the closing
   backtick as part of an ESCAPED_SIGN token. The role never closes
   and rolls back. Now detected post-loop: the escape is consumed
   and a literal backtick becomes content. Narrowed to only the
   3-char token; regular 2-char escapes at EOF correctly roll back.

Resolves: TYPO3-Documentation/render-guides#1188
CybotTM force-pushed the fix/textrole-escape branch from 6be902c to de35afa February 24, 2026 13:53
CybotTM marked this pull request as ready for review February 24, 2026 13:56

CybotTM (Contributor, Author) commented Feb 24, 2026

Marking this as ready for review, but I'd appreciate feedback on the approach — I'm not fully sure how this project wants to handle these edge cases.

Specific questions:

  1. rawPart resolution for \\: This PR resolves \\ to \ in $rawPart, so that:

    :code:`a\\b`

    renders as a\b. Other escapes like \T are preserved raw for code contexts (PHP namespaces etc.). Is this the right trade-off, or should $rawPart stay fully unresolved?

  2. EOF fallback for the 3-char token: The lexer's catchable pattern for backslash + two backticks swallows the closing backtick. The fix detects this post-loop and recovers, but only for that specific 3-char token. Is this recovery acceptable, or would a lexer-level fix be preferred?

  3. Consistency of mid-content vs end-of-content escapes: Mid-content escaped backticks are still preserved raw in $rawPart (existing behavior, not changed here). The EOF fallback resolves them. Should these be consistent?

Happy to adjust the approach based on your guidance.

jaapio (Member) commented Feb 26, 2026

Thank you for your work on this bugfix. Your contribution helped highlight some interesting edge cases and scenarios that needed verification, which was very valuable.

After further investigation, I discovered that the issue was more complex and required a broader fix. I’ve since implemented and merged a comprehensive solution that addresses the root cause. You might want to verify the changes in #1312.

I’ll go ahead and close this pull request, but I truly appreciate your effort and the insights your work provided. Looking forward to your future contributions!


Development

Successfully merging this pull request may close these issues.

[BUG] :rst: text role cannot display a single backslash character
