docs(agents): add source-text-analysis rule (use parsers, not regex)#89
docs(agents): add source-text-analysis rule (use parsers, not regex)#89
Conversation
Two recent fixes converged on the same architectural answer for text inspection on Ricky source-code hot paths: - #86 (`auto-fix-loop.ts:hasRickyWorkflowAliasImport`) — substring match for `from 'node:fs'` was fooled by the literal text inside a HEREDOC embedded in a `.step({ command: ... })` body. Fixed by walking `ts.createSourceFile` ImportDeclaration nodes. - #88 (`spec-intake/markdown-target-files.ts:extractTargetFilesFromMarkdown`) — regex `PATH_PATTERN` matched paths inside fenced code blocks and prose noise. Fixed by walking `mdast-util-from-markdown` inlineCode nodes. Both fixes also independently rejected LLM-based detection for the same reasons: non-determinism breaks the offline eval suite, latency multiplies on every retry, and model output is a prompt-injection surface for paths that drive shell commands. Capture this as a durable rule in AGENTS.md so the next person tempted to grep-and-substitute on TS source has the prior art. Adds a new top-level "Ricky Source Code Conventions" section above the existing workflow-authoring conventions, since it applies to Ricky's own source rather than to the workflows Ricky generates. CLAUDE.md auto-follows via the existing symlink — no separate Claude rule needed (consistent with AGENTS.md:206). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| **Concrete tools available.** Both are already in `dependencies`: | ||
|
|
||
| - TypeScript / JavaScript: `import ts from 'typescript'`; `ts.createSourceFile(name, content, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS)` then walk `sourceFile.statements`. | ||
| - Markdown: `import { fromMarkdown } from 'mdast-util-from-markdown'`; walk the resulting mdast tree (`inlineCode`, `code`, `heading`, `list`, etc.). |
There was a problem hiding this comment.
🟡 AGENTS.md falsely claims mdast-util-from-markdown is already in dependencies
Line 195-198 states "Concrete tools available. Both are already in dependencies" and then lists mdast-util-from-markdown as an available markdown parser. However, mdast-util-from-markdown is not present in package.json (neither in dependencies nor devDependencies) and is not installed in node_modules. Only typescript is actually available. Since AGENTS.md is a mandatory rule file that agents follow, this false claim will cause agents to write code importing fromMarkdown from mdast-util-from-markdown expecting it to resolve, leading to build/import failures.
| **Concrete tools available.** Both are already in `dependencies`: | |
| - TypeScript / JavaScript: `import ts from 'typescript'`; `ts.createSourceFile(name, content, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS)` then walk `sourceFile.statements`. | |
| - Markdown: `import { fromMarkdown } from 'mdast-util-from-markdown'`; walk the resulting mdast tree (`inlineCode`, `code`, `heading`, `list`, etc.). | |
| **Concrete tools available.** TypeScript is already in `dependencies`; add `mdast-util-from-markdown` when a call site needs it: | |
| - TypeScript / JavaScript: `import ts from 'typescript'`; `ts.createSourceFile(name, content, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS)` then walk `sourceFile.statements`. | |
| - Markdown: `import { fromMarkdown } from 'mdast-util-from-markdown'`; walk the resulting mdast tree (`inlineCode`, `code`, `heading`, `list`, etc.). Install with `npm install mdast-util-from-markdown` before first use. |
Was this helpful? React with 👍 or 👎 to provide feedback.
Ricky Eval ReviewRun: Passed: 5 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0 Human Review CasesThese cases passed deterministic checks and include captured Ricky output for a human verdict against their REVIEW
|
Summary
Codify the architectural rule that #86 and #88 both arrived at independently: when Ricky source code needs to inspect text whose grammar Ricky already understands (TS/JS, Markdown, JSON, shell), use a parser for that grammar — not substring match, not regex, not an LLM.
Why now
Two PRs landed in the same week with the same shape of bug:
auto-fix-loop.ts:hasRickyWorkflowAliasImportmarkdown-target-files.ts:extractTargetFilesFromMarkdownfrom 'node:fs'matched HEREDOC string-literal contentPATH_PATTERNmatched paths inside fenced code blocks + prose noisets.createSourceFile+ walkImportDeclarationnodesmdast-util-from-markdown+ walkinlineCodenodesBoth PRs also independently rejected LLM-based detection with the same reasoning (non-determinism, eval-suite breakage, hot-path latency, prompt-injection surface for paths that later drive shell commands). Worth capturing as a durable rule so the next person tempted to grep-and-substitute on TS source has the prior art.
What changed
AGENTS.md— new top-level# Ricky Source Code Conventionssection above the existing# Ricky Workflow Conventions. Holds one rule today (## Source-Text Analysis: Use Grammar-Aware Parsers, Not Regex) with reasoning, the rejected LLM alternative, available tooling (typescript,mdast-util-from-markdown), and a fallback ordering for when AST is genuinely too heavy.CLAUDE.md— auto-follows via the existing symlink, consistent with AGENTS.md:206 ("CLAUDE.mdshould remain a symlink toAGENTS.md. UpdateAGENTS.mdas the shared source of truth"). No separate Claude rule needed.Scope
Documentation-only. No code, dep, or test changes. The two referenced PRs (#86, #88) ship the actual code fixes; this PR captures the pattern those fixes share so we stop relearning it.
Test plan
readlink CLAUDE.md→AGENTS.md(symlink intact)head -180 CLAUDE.md | tail -10shows the new section through the symlink🤖 Generated with Claude Code