[BUGFIX] Fix escape handling in TextRoleRule #1308
CybotTM wants to merge 1 commit into phpDocumentor:main
Fix two bugs causing code-type text roles (:rst:, :php:, :code:, etc.) to mishandle escape sequences:

1. $rawPart preserved \\ instead of resolving to \. Now resolved for escaped-backslash; other escapes (\T, \*) stay raw for code contexts (e.g. PHP namespaces).
2. The lexer's \`` catchable pattern (3 chars) swallows the closing backtick as part of an ESCAPED_SIGN token. The role never closes and rolls back. Now detected post-loop: the escape is consumed and a literal backtick becomes content. Narrowed to only the 3-char token; regular 2-char escapes at EOF correctly roll back.

Resolves: TYPO3-Documentation/render-guides#1188
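The first rule in the commit message (an escaped backslash resolves to a single backslash, every other escape stays raw) can be modelled with a small character-level sketch. This is a Python illustration of the described behaviour only, not the PHP code in TextRoleRule; the function name resolve_raw_part is hypothetical, and the real rule presumably works on lexer tokens rather than characters.

```python
def resolve_raw_part(raw: str) -> str:
    """Collapse each escaped backslash (two backslashes) into one.

    Every other backslash sequence is left untouched, so code-like
    content such as PHP namespaces keeps its literal backslashes.
    Hypothetical helper: a model of the described rule, not the PHP code.
    """
    out = []
    i = 0
    while i < len(raw):
        # Two consecutive backslashes form one escaped backslash.
        if raw[i] == "\\" and i + 1 < len(raw) and raw[i + 1] == "\\":
            out.append("\\")
            i += 2
        else:
            # Anything else, including a backslash before a letter, stays raw.
            out.append(raw[i])
            i += 1
    return "".join(out)
```

Under this model, a doubled backslash between two letters yields one backslash, while \App\Entity, \T and \* pass through unchanged.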
6be902c to de35afa
Marking this as ready for review, but I'd appreciate feedback on the approach — I'm not fully sure how this project wants to handle these edge cases. Specific questions:
Happy to adjust the approach based on your guidance.
Thank you for your work on this bugfix. Your contribution helped highlight some interesting edge cases and scenarios that needed verification, which was very valuable. After further investigation, I discovered that the issue was more complex and required a broader fix. I’ve since implemented and merged a comprehensive solution that addresses the root cause. You might want to verify the changes in #1312. I’ll go ahead and close this pull request, but I truly appreciate your effort and the insights your work provided. Looking forward to your future contributions!
Summary
Fixes escape handling bugs in TextRoleRule for code-type text roles (:rst:, :php:, :code:, etc.).

Resolves: TYPO3-Documentation/render-guides#1188
Bug 1: Escaped backslash stays literal in rawPart
Code-type text roles use $rawContent (via #533). When the author writes \\ to display a single backslash, $rawPart preserved the literal \\ instead of resolving it.

RST input:
Before (broken):
After (fixed):
Other escapes like \T or \* are intentionally preserved raw in $rawPart, because code contexts need literal backslash-letter sequences (e.g., PHP namespaces like \App\Entity).

Bug 2: Escaped backtick swallows closing delimiter
The lexer has a catchable pattern that tokenizes a 3-char sequence (backslash + two backticks) as a single ESCAPED_SIGN token. This swallows the closing backtick of the text role, so TextRoleRule never finds a BACKTICK token to close the role; it rolls back and the role breaks entirely.

RST input:
Before (broken): role is not recognized, rendered as literal text:
After (fixed): post-loop recovery detects the swallowed backtick. The 3-char token is split semantically:
The escape character is consumed, the backtick becomes content:
The recovery is narrowed to only the 3-char token, which is the only lexer pattern that swallows backticks. Regular 2-char escapes at end-of-input (like \T or \\ without a closing backtick) correctly roll back as genuinely unterminated roles:

RST input (genuinely unterminated, no closing backtick):
Result: rolls back, no role node produced (correct — the role was never closed).
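The recovery described for Bug 2 can be sketched end to end with a toy tokenizer. This is a Python model under stated assumptions, not the project's PHP lexer or rule: the regex, the tokenize and parse_role_body names, and the check that the final token is the 3-char escape are all hypothetical stand-ins for the behaviour described above.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (token type, matched text)

# Hypothetical stand-in for the lexer. The 3-char alternative (backslash plus
# two backticks) is tried first, so it becomes a single ESCAPED_SIGN token.
TOKEN_RE = re.compile(
    r"(?P<ESCAPED_SIGN>\\``|\\.)|(?P<BACKTICK>`)|(?P<TEXT>[^\\`]+|\\)",
    re.DOTALL,
)


def tokenize(body: str) -> List[Token]:
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(body)]


def parse_role_body(body: str) -> Optional[str]:
    """Parse the text after a role's opening backtick.

    Returns the role content, or None to signal a rollback
    (the role was never closed).
    """
    tokens = tokenize(body)
    content = []
    for kind, value in tokens:
        if kind == "BACKTICK":
            return "".join(content)  # closed normally
        content.append(value)  # escapes stay raw inside the loop

    # Post-loop recovery: no BACKTICK token was seen. Assumption: recover only
    # when the final token is the 3-char escape that swallowed the delimiter.
    if tokens and tokens[-1] == ("ESCAPED_SIGN", "\\``"):
        content.pop()        # drop the raw 3-char token
        content.append("`")  # escape consumed, literal backtick kept
        return "".join(content)

    return None  # 2-char escape or plain text at the end: unterminated
```

In this model a body ending in backslash plus two backticks yields content ending in a literal backtick, while a body ending in \T or \\ returns None, which matches the rollback case above.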
Changes
TextRoleRule.php:
- \\ → \ in $rawPart (other escapes preserved raw)
- createTextRoleNode() helper to deduplicate role-building logic
- [...] $lastEscapedToken)

TextRoleRuleTest.php:
- \T and \\ at end → assert rollback

Functional tests:
- code-textrole-no-escape.html expected output
- code-textrole-escape/ test fixture

Test plan