docs(agents): add source-text-analysis rule (use parsers, not regex) by kjgbot · Pull Request #89 · AgentWorkforce/ricky

kjgbot · 2026-05-10T08:08:35Z

Summary

Codify the architectural rule that #86 and #88 both arrived at independently: when Ricky source code needs to inspect text whose grammar Ricky already understands (TS/JS, Markdown, JSON, shell), use a parser for that grammar — not substring match, not regex, not an LLM.
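
To make the rule concrete for the JSON case, here is a minimal sketch. The sample document and the function names `versionByRegex` and `versionByParser` are hypothetical and do not come from the Ricky codebase. The point is only that a pattern match can land on a nested key that a parser would never confuse with the top-level one.

```typescript
// Hypothetical input: a nested "version" key appears before the top-level one.
const packageJsonText = `{
  "config": { "version": "0.0.1-internal" },
  "name": "example-package",
  "version": "2.3.4"
}`;

// Pattern match: returns the first textual occurrence, wherever it is nested.
function versionByRegex(text: string): string | undefined {
  const match = /"version"\s*:\s*"([^"]+)"/.exec(text);
  return match?.[1];
}

// Parser: reads the top-level property only, because the structure is known.
function versionByParser(text: string): string | undefined {
  const parsed: unknown = JSON.parse(text);
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const version = (parsed as Record<string, unknown>).version;
  return typeof version === "string" ? version : undefined;
}
```

On this input the regex version returns the nested `0.0.1-internal`, while the parser version returns `2.3.4`.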

Why now

Two PRs landed in the same week with the same shape of bug:

| | #86 | #88 |
|---|-----|-----|
| Site | `auto-fix-loop.ts:hasRickyWorkflowAliasImport` | `markdown-target-files.ts:extractTargetFilesFromMarkdown` |
| Bug | substring `from 'node:fs'` matched HEREDOC string-literal content | regex `PATH_PATTERN` matched paths inside fenced code blocks + prose noise |
| Fix | `ts.createSourceFile` + walk `ImportDeclaration` nodes | `mdast-util-from-markdown` + walk `inlineCode` nodes |

Both PRs also independently rejected LLM-based detection with the same reasoning (non-determinism, eval-suite breakage, hot-path latency, prompt-injection surface for paths that later drive shell commands). Worth capturing as a durable rule so the next person tempted to grep-and-substitute on TS source has the prior art.
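
The #86 failure mode can be reproduced in a few lines. The following is a sketch under assumptions, not the code from that PR: `hasImportBySubstring`, `hasImportByAst`, the file name, and the sample inputs are all hypothetical. It uses only `ts.createSourceFile` and the `ts.isImportDeclaration` / `ts.isStringLiteral` type guards from the `typescript` package.

```typescript
import * as ts from "typescript";

// Naive check: any occurrence of the text counts, even inside a string literal.
function hasImportBySubstring(source: string, moduleName: string): boolean {
  return source.includes(`from '${moduleName}'`);
}

// Parser-based check: only real top-level ImportDeclaration nodes count.
function hasImportByAst(source: string, moduleName: string): boolean {
  const sourceFile = ts.createSourceFile(
    "workflow.ts", // hypothetical file name, used only for diagnostics
    source,
    ts.ScriptTarget.Latest,
    false,
    ts.ScriptKind.TS,
  );
  return sourceFile.statements.some(
    (statement) =>
      ts.isImportDeclaration(statement) &&
      ts.isStringLiteral(statement.moduleSpecifier) &&
      statement.moduleSpecifier.text === moduleName,
  );
}

// The import text appears only inside a template literal (a HEREDOC body).
const heredocOnly = [
  "workflow.step({",
  "  command: `cat > check.mjs <<'EOF'",
  "import { readFileSync } from 'node:fs';",
  "EOF`,",
  "});",
].join("\n");

// A genuine import statement at the top of a file.
const realImport = "import { readFileSync } from 'node:fs';\n";
```

For `heredocOnly`, the substring check reports an import while the AST check does not. For `realImport`, both agree.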

What changed

AGENTS.md — new top-level # Ricky Source Code Conventions section above the existing # Ricky Workflow Conventions. Holds one rule today (## Source-Text Analysis: Use Grammar-Aware Parsers, Not Regex) with reasoning, the rejected LLM alternative, available tooling (typescript, mdast-util-from-markdown), and a fallback ordering for when AST is genuinely too heavy.
CLAUDE.md — auto-follows via the existing symlink, consistent with AGENTS.md:206 ("CLAUDE.md should remain a symlink to AGENTS.md. Update AGENTS.md as the shared source of truth"). No separate Claude rule needed.

Scope

Documentation-only. No code, dep, or test changes. The two referenced PRs (#86, #88) ship the actual code fixes; this PR captures the pattern those fixes share so we stop relearning it.

Test plan

readlink CLAUDE.md → AGENTS.md (symlink intact)
head -180 CLAUDE.md | tail -10 shows the new section through the symlink

🤖 Generated with Claude Code

Two recent fixes converged on the same architectural answer for text inspection on Ricky source-code hot paths:

- #86 (`auto-fix-loop.ts:hasRickyWorkflowAliasImport`): substring match for `from 'node:fs'` was fooled by the literal text inside a HEREDOC embedded in a `.step({ command: ... })` body. Fixed by walking `ts.createSourceFile` ImportDeclaration nodes.
- #88 (`spec-intake/markdown-target-files.ts:extractTargetFilesFromMarkdown`): regex `PATH_PATTERN` matched paths inside fenced code blocks and prose noise. Fixed by walking `mdast-util-from-markdown` inlineCode nodes.

Both fixes also independently rejected LLM-based detection for the same reasons: non-determinism breaks the offline eval suite, latency multiplies on every retry, and model output is a prompt-injection surface for paths that drive shell commands.

Capture this as a durable rule in AGENTS.md so the next person tempted to grep-and-substitute on TS source has the prior art.

Adds a new top-level "Ricky Source Code Conventions" section above the existing workflow-authoring conventions, since it applies to Ricky's own source rather than to the workflows Ricky generates.

CLAUDE.md auto-follows via the existing symlink, so no separate Claude rule is needed (consistent with AGENTS.md:206).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-10T08:09:11Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: f1939a23-536b-4d29-96f8-bd14a7ac927b

📥 Commits

Reviewing files that changed from the base of the PR and between 6a87282 and 146f6e1.

📒 Files selected for processing (1)

AGENTS.md

📝 Walkthrough

This PR adds a "Ricky Source Code Conventions" section to AGENTS.md establishing rules for how Ricky inspects and modifies its own source code. The section prioritizes grammar-aware parsers (TypeScript AST, mdast) over regex and substring heuristics, defines a strict fallback strategy, and documents motivating bug patterns.

Changes

Ricky Source Code Conventions

| Layer / File(s) | Summary |
|-----------------|---------|
| Documentation: Source Code Analysis Conventions `AGENTS.md` | New section defining parser-first inspection strategy for Ricky's own `src/` analysis: AST walk → tokenization → regex on inert regions → no substring matching. Includes concrete examples of bugs that motivated the approach. |
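
The fallback order summarized above places tokenization one step below a full AST walk. As a rough illustration of that middle tier, the sketch below uses the TypeScript scanner (`ts.createScanner`) to ask whether a name occurs as an identifier token rather than inside a comment or string. `hasIdentifierToken` and the sample inputs are hypothetical, and the AGENTS.md wording itself is not reproduced here. A raw scanner does not rescan template-literal continuations or regular-expression literals the way the parser does, which is one reason to prefer the AST walk whenever it is affordable.

```typescript
import * as ts from "typescript";

// Returns true only when `name` appears as an Identifier token.
// Comments are skipped as trivia; string literals are separate token kinds.
function hasIdentifierToken(source: string, name: string): boolean {
  const scanner = ts.createScanner(
    ts.ScriptTarget.Latest,
    true, // skipTrivia: whitespace and comments are not returned as tokens
    ts.LanguageVariant.Standard,
    source,
  );
  let token = scanner.scan();
  while (token !== ts.SyntaxKind.EndOfFileToken) {
    if (token === ts.SyntaxKind.Identifier && scanner.getTokenText() === name) {
      return true;
    }
    token = scanner.scan();
  }
  return false;
}

// Hypothetical inputs: the name appears only in a comment and a string...
const mentionedOnly = [
  "// workflow is mentioned in a comment",
  "const label = 'workflow';",
  "run();",
].join("\n");

// ...versus the name used as a real identifier.
const actuallyCalled = "workflow({ steps: [] });";
```

A substring search would report a hit in both inputs; the token-level check reports one only for `actuallyCalled`.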

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

A scribe must read with careful eyes,
Not substring tricks or LLM guise—
But parse the trees where meanings hide,
Let grammar walk, not guess, decide. 🐰✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title directly and concisely describes the main change: documenting a source-text-analysis rule advocating for parsers over regex. |
| Description check | ✅ Passed | The description is well-structured and comprehensively explains the motivation, changes, scope, and testing for the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

devin-ai-integration

Devin Review found 1 potential issue.

View 2 additional findings in Devin Review.

devin-ai-integration · 2026-05-10T08:11:35Z

+**Concrete tools available.** Both are already in `dependencies`:
+
+- TypeScript / JavaScript: `import ts from 'typescript'`; `ts.createSourceFile(name, content, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS)` then walk `sourceFile.statements`.
+- Markdown: `import { fromMarkdown } from 'mdast-util-from-markdown'`; walk the resulting mdast tree (`inlineCode`, `code`, `heading`, `list`, etc.).


🟡 AGENTS.md falsely claims mdast-util-from-markdown is already in dependencies

Lines 195-198 state "Concrete tools available. Both are already in dependencies" and then list mdast-util-from-markdown as an available Markdown parser. However, mdast-util-from-markdown is not present in package.json (neither in dependencies nor devDependencies) and is not installed in node_modules. Only typescript is actually available. Since AGENTS.md is a mandatory rule file that agents follow, this false claim will cause agents to write code importing fromMarkdown from mdast-util-from-markdown expecting it to resolve, leading to build/import failures.

Suggested change

-**Concrete tools available.** Both are already in `dependencies`:
-
-- TypeScript / JavaScript: `import ts from 'typescript'`; `ts.createSourceFile(name, content, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS)` then walk `sourceFile.statements`.
-- Markdown: `import { fromMarkdown } from 'mdast-util-from-markdown'`; walk the resulting mdast tree (`inlineCode`, `code`, `heading`, `list`, etc.).
+**Concrete tools available.** TypeScript is already in `dependencies`; add `mdast-util-from-markdown` when a call site needs it:
+
+- TypeScript / JavaScript: `import ts from 'typescript'`; `ts.createSourceFile(name, content, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS)` then walk `sourceFile.statements`.
+- Markdown: `import { fromMarkdown } from 'mdast-util-from-markdown'`; walk the resulting mdast tree (`inlineCode`, `code`, `heading`, `list`, etc.). Install with `npm install mdast-util-from-markdown` before first use.

github-actions · 2026-05-10T08:53:33Z

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-10T08-09-05-077Z-HEAD-provider
Mode: provider
Git SHA: 931fb46

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

## Updated Ricky Documentation – “Deeper Integration with Agent Assistant”

### 1. Overview
Ricky (AgentWorkforce workflow reliability, coordination, and authoring assistant) now **leverages the Agent Assistant (AA) platform as a first‑class service** for:

| Feature | How AA is used | Benefit |
|---------|----------------|---------|
| **Deterministic verification gates** | AA runs *contract‑check* and *property‑based* tests in a sandboxed executor before any PR merge. | Guarantees that every workflow contract is satisfied before code lands. |
| **Dynamic review artifacts** | AA generates **review‑summary** markdown, diff‑highlights, and **risk‑heatmaps** automatically. | Reduces reviewer cognitive load and surfaces hidden blockers early. |
| **Scalable validation loops** | AA orchestrates **80‑to‑100 validation loops** per workflow change (unit, integration, performance, security). | Provides statistical confidence that changes won’t regress reliability. |
| **Blocker detection & reporting** | AA monitors execution logs, surface‑level metrics, and SLA drift; it raises **honest blocker tickets** in the issue tracker. | Early detection of “silent failures” that would otherwise slip 
...[truncated]

Must

Ground claims in real package imports and runtime paths.
Distinguish current implementation from target architecture.
Identify which Agent Assistant primitive is actually exercised.

Must Not

Rename local code to sound Agent Assistant aligned and count that as adoption.
Claim broad Agent Assistant native behavior from documentation-only alignment.
Blur target architecture with landed behavior.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑10)**  

| Metric | Current State | Target / Recommendation |
|--------|---------------|--------------------------|
| **Repository Coverage** | 68 % of active agents import `@agent-assistant/turn-context` (≈ 42 / 62 agents). | Push to **≥ 85 %** by Q3 2026. Prioritize high‑traffic agents (e.g., `@agent‑assistant/issue‑triage`, `@agent‑assistant/code‑review`). |
| **Version Consistency** | 3 distinct minor versions in use (v1.2.0, v1.3.1, v2.0.0). 27 % of agents still on v1.2.0. | Consolidate to **v2.0.0** (stable) within 2 sprints. Add a `peerDependency` bump in `package.json` and run `npm audit` to catch mismatches. |
| **Contract Compliance** | 81 % of agents with the context contract (`TurnContextContract`) pass the deterministic verification gate (`npm run verify:turn-context`). 19 % fail due to missing `metadata.sessionId` or mismatched `payload` schema. | Implement a **pre‑merge lint rule** (`eslint-plugin-turn-context`) to enforce contract fields. Schedule a remediation sprint for failing agents. |
| **Review Artifacts** | 54 % of PRs that modify turn‑context usage include a **Turn‑Context Review Checklist*
...[truncated]

Must

Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
Record compact provenance through generation decisions or coordinator metadata.
Keep the shared turn context internal to the adapter boundary.

Must Not

Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
Drop Ricky-specific workflow metadata during envelope assembly.
Treat turn context as a product decision engine.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move the listed capabilities into the Agent Assistant**

| Capability | Fit with Agent Assistant | Rationale | Migration Path |
|------------|--------------------------|-----------|----------------|
| **Workflow generation** | ✅ Core | The Assistant already owns the *authoring* contract (template → concrete workflow). Adding generation lets it produce the initial YAML/JSON skeleton directly from user intent, reducing hand‑offs. | 1. Define a **generation contract** (`genRequest → genSpec`).<br>2. Implement a deterministic template engine (Jinja2/Handlebars) with unit‑tested snippets.<br>3. Add a verification gate that the generated spec passes the *schema validator* before it is persisted. |
| **Workflow validation** | ✅ Core | Validation is the first deterministic gate after generation. Embedding it in the Assistant guarantees that every workflow entering the repo has passed the same schema, lint, and static‑analysis checks. | 1. Reuse the existing `validateWorkflow` service as a **validation contract** (`workflowSpec → validationReport`).<br>2. Hook the contract into the generation pipeline so the Assistant returns a pass/fail flag plus a human‑readable report. 
...[truncated]

Must

Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
Make extraction follow typed, tested, live product proof.

Must Not

Generalize workflow-specific behavior prematurely.
Adopt moving shared seams merely for architectural purity.
Lose the precise local-first staged workflow UX.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (May 15 – Aug 7 2024)*  

---

## 1. Slice Objective
Deliver a production‑ready, self‑service adoption package for the **Ricky** AgentWorkforce assistant that enables **10 pilot teams** to author, test, and deploy reliable workflows with deterministic verification gates and built‑in review artifacts.

Success is measured by:
| Metric | Target |
|--------|--------|
| **Adoption** | ≥ 10 distinct teams onboarded |
| **Reliability** | ≤ 2 % workflow failure rate in pilot (post‑gate) |
| **Cycle Time** | ≤ 48 h from authoring → production merge |
| **Feedback** | ≥ 80 % of pilot participants rate “ease of use” ≥ 4/5 |

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • Ricky v2.0 core engine (deterministic contracts, validation loops)  <br>• CLI & VS Code extension for authoring <br>• GitHub‑based PR workflow (branch, PR, review, merge) <br>• Automated gate suite (type‑check, schema validation, 80‑100 loop tests) <br>• Pilot onboarding docs, training videos, and support channel | • Enterprise‑wide rollout (beyond pilot) <br>• Custom integrations not covered by
...[truncated]

Must

Pick exactly one real shared seam to evaluate or adopt.
Define a live Ricky product path that will prove the adoption.
Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
Skip the comparison/evaluation step for mature Ricky-local seams.
Treat adoption as successful without a live product-path proof.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, contract‑driven package** that can be dropped into the Ricky monorepo (`/packages/ricky-slack-web`) and used by the existing CI/CD pipeline.  It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, 80‑100 validation loops), and the **shared authoring rules** (scoped branches, PR templates, blocker reporting).

---

### 1. High‑Level Architecture  

| Component | Responsibility | Runtime | Key Interfaces |
|-----------|----------------|---------|----------------|
| **Ricky‑Core** | Business logic, intent parsing, state machine | Node 20 (LTS) | `ricky-core` npm package |
| **SlackAdapter** | Translate Slack events ↔ Ricky intents, send messages, handle interactive components | Serverless (AWS Lambda) | Slack Events API, Interactivity API |
| **WebChatAdapter** | Web‑socket based UI, fallback UI for browsers | Node/Express + Socket.io | WebSocket, REST `/ricky/webhook` |
| **AuthGateway** | OAuth2 for Slack, JWT for web clients | Lambda | `GET /auth/slack`, `POST /auth/web` |
| **VerificationEngine** | Deterministic gate checks, contract vali
...[truncated]

Must

Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
Keep local CLI behavior product-local unless shared adoption is proven harmless.
Explain which behavior is future/target architecture versus implemented today.

Must Not

Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
Let future surface abstractions distort the current CLI contract.
Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What the command does**

Running  

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

--mode local – tells Ricky to operate in the local development environment (no remote CI/CD resources are touched).
--spec "generate a workflow for package checks" – is the natural‑language request that Ricky will translate into a concrete workflow definition (YAML/JSON) following the AgentWorkforce contract.
Absence of --run – means Ricky will only generate the workflow artifact; it will not submit, schedule, or execute any checks.

So the command will:

Parse the spec string.
Resolve the appropriate workflow template (e.g., package-checks.yml).
Fill in any required parameters (defaults are used unless you provide overrides).
Emit a workflow contract file (by default ricky-workflow.yaml in the current directory).
Print a short summary to stdout, e.g.:

✅ Workflow generated: ricky-workflow.yaml
   • 5 stages: lint, unit-test, security-scan, license‑audit, publish‑dry‑run
   • Deterministic verification gates inserted after each stage
   • Review artifact: ricky-workflow-review.md

No vali
...[truncated]


**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding Workflow**  
*(Designed for a brand‑new user on a clean workstation)*  

---  

### 1️⃣ Prerequisites Check (Verification Gate)

| Step | Command | Expected Result | Failure Action |
|------|---------|----------------|----------------|
| 1.1 | `python3 --version` | `Python 3.9+` | Abort – instruct to install Python 3.9+ |
| 1.2 | `git --version` | `git version X.Y.Z` | Abort – provide link to Git install |
| 1.3 | `pip3 --version` | `pip X.Y` | Abort – reinstall pip |
| 1.4 | `ricky --version` (if already installed) | `ricky X.Y.Z` | Skip install step, go to **2** |
| 1.5 | Network test: `curl -s https://api.ricky.ai/ping` | `pong` | Abort – ask user to verify internet / firewall |

*All checks are wrapped in the script `ricky onboard --verify`. If any gate fails, the script prints a **blocker report** and exits with status 1.*

---  

### 2️⃣ Install Ricky (Deterministic Installation)

```bash
# Create an isolated virtual environment
python3 -m venv ~/.ricky-env
source ~/.ricky-env/bin/activate

# Upgrade pip and install Ricky from the official index
pip install --upgrade pip
pip install ricky-cli==1.0.0   # pin to the latest stable release

...[truncated]
```


**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Ricky – Workflow Reliability & Coordination Assistant**

---

### 🚦 Current Situation
- **Input:** An empty specification or a missing spec file.
- **Impact:** Without a concrete spec we cannot generate a deterministic workflow contract, verification gates, or review artifacts. This blocks downstream validation loops and PR creation.

### 📋 What We Need
1. **Specification Scope** – High‑level goal (e.g., “process inbound support tickets”).
2. **Key Actors & Roles** – Who initiates, who consumes, any external services.
3. **Inputs / Outputs** – Data formats, required fields, success/failure signals.
4. **Non‑functional Requirements** – SLA, reliability targets, security constraints.
5. **Success Criteria** – Metrics that define “done” for each stage.

### 📄 Minimal Spec Template
> **Title:** *Descriptive workflow name*  
> **Owner:** *Team or individual*  
> **Goal:** *One‑sentence purpose*  
> **Actors:**  
> - *Actor A* – role description  
> - *Actor B* – role description  
> **Inputs:**  
> - *Input 1* – type, schema, source  
> - *Input 2* – …  
> **Outputs:**  
> - *Output 1* – type, schema, destination  
> - *Output 2* – …  
> **Steps (high‑level):**  
> 1. *Step 1* – br
...[truncated]
```

Must

Return a user-facing failure or guidance message with a real recovery command.
Distinguish generation failure from execution failure.
Show stack traces only when verbose diagnostic mode is requested.

Must Not

Crash with an uncaught exception in normal mode.
Suggest commands that do not exist.
Pretend a missing spec was accepted.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**

=== AgentWorkforce Status Report ===

🔧 Core Services
• Scheduler : OK
• Workflow Engine : OK
• Persistence Layer : OK

🔗 External Providers
• Google : DISCONNECTED (no successful health‑check)
• GitHub : DISCONNECTED (no successful health‑check)

🛡️ Reliability
• Last verification gate : 2026‑05‑10 08:12 UTC
• Pending verification loops : 0
• Blockers : None (providers simply unverified)

📊 Metrics (last 5 min)
• Active workflows : 12
• Queued tasks : 3
• Errors (system) : 0

📝 Notes
– Provider connections are lazy‑checked; until a successful ping or token validation occurs, they remain in the DISCONNECTED state.
– Run ricky provider check google or ricky provider check github to trigger an immediate verification.
– Once a provider reports a successful health‑check, the status will automatically update to CONNECTED.

=== End of Report ===

Must

Report unknown or not-connected provider state honestly.
Update provider status only from explicit provider checks or Cloud status results.
Give concrete setup guidance for Cloud when relevant.

Must Not

Mark Google or GitHub connected because guidance text was shown.
Invent a provider connection URL or OAuth flow.
Show empty fields with no recovery guidance when config is missing.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` (or target base) | New branch `feat/<short‑description>` | Branch exists, naming follows `feat/…` pattern |
| 2️⃣ | **Ricky (Branch‑Management)** | **Sync latest base** | `main` | Branch is up‑to‑date with `main` | `git merge-base --is-ancestor main feat/...` passes |
| 3️⃣ | **Developer (Implementation)** | **Apply primitive change** (e.g., edit a config file, add a comment, bump a version) | Branch checkout | Modified files committed | `git diff --quiet` shows expected file changes |
| 4️⃣ | **Ricky (Deterministic‑Verification)** | **Run local lint / unit test** | Commit SHA | Lint & test results | All lint rules pass; test suite returns 0 |
| 5️⃣ | **Ricky (CI‑Trigger)** | **Push branch to remote** | Local branch | Remote `origin/feat/...` | `git push` succeeds, remote branch visible |
| 6️⃣ | **Ricky (CI‑Orchestration)** | **Open Pull Request (PR)** | Remote branch, target `main` | PR #\<n\> with title & description template | PR created, requ
...[truncated]

Must

Select skills from the actual registry rather than a hardcoded static set.
Record ranked skill matches with confidence and trigger evidence.
Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

Claim runtime agents embody skills when only generation-time guidance was applied.
Fail generation solely because optional skill files are missing.
Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implement**

| Phase | Owner | Artifact | Contract / Acceptance Criteria | Verification Gate |
|-------|-------|----------|--------------------------------|--------------------|
| **1. Requirements Ingestion** | Product Owner | `spec.md` | • Clear functional description<br>• Acceptance criteria listed<br>• No ambiguous terms | Automated lint of `spec.md` (Markdown schema) – must pass 100 % |
| **2. Claude Review** | **Claude (LLM‑reviewer)** | `review-report.md` | • Detect missing edge cases, security concerns, performance gaps<br>• Provide concrete suggestions (≤ 5 items) with line‑referenced citations to `spec.md`<br>• Output must be in the prescribed JSON‑wrapped format (see below) | JSON schema validation + sanity check that at least one suggestion is produced |
| **3. Review Acceptance** | Human Gatekeeper (PM/Tech Lead) | `review‑approval.yml` | • All suggestions either **Accepted** (to be incorporated) or **Rejected** (with justification)<br>• No “pending” items remain | CI step `review‑gate` fails if any suggestion is left pending |
| **4. Codex Implementation** | **Codex (LLM‑coder)** | `src/` (codebase) + `impl‑log.md` | • Generate code t
...[truncated]

Must

Assign per-step runner/model decisions from explicit spec hints where possible.
Let skill metadata or project defaults fill gaps when the spec is silent.
Write tool-selection.json or equivalent audit metadata.

Must Not

Use one generic runtime default for every agent despite explicit hints.
Let skill defaults override explicit user/spec runner hints.
Omit the reason each tool or model was chosen.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the **deterministic post‑processing flag** that takes a freshly generated workflow (JSON/YAML) and runs a second, focused LLM pass that:

1. **Tightens the natural‑language task description** for each step (clear intent, bounded scope, measurable verbs).  
2. **Adds or sharpens acceptance‑gate predicates** (pre‑conditions, post‑conditions, invariants) so they are **machine‑verifiable** and **idempotent**.  
3. **Leaves the overall topology unchanged** – no new steps, no re‑ordering, only the *contract* of each node is refined.

Below is a concrete contract‑first workflow authoring pattern you can copy‑paste into your CI pipeline or local dev script.

---

### 1️⃣ Input Workflow (pre‑refine)

```yaml
name: data‑ingest‑pipeline
steps:
  - id: fetch_raw
    task: "Grab the latest CSV from the data lake."
    gate:
      type: post
      condition: "file exists"
  - id: validate_schema
    task: "Make sure the CSV matches our schema."
    gate:
      type: post
      condition: "no errors"
  - id: load_to_dw
    task: "Push the CSV into the warehouse."
    gate:
      type: post
      condition: "rows 
...[truncated]

Must

Keep the deterministic unrefined path as the default.
Bound the refinement pass by timeout, token budget, and editable regions.
Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

Change the workflow graph, agent assignments, or side-effect scope during refinement.
Fail the whole generation if optional refinement times out.
Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---

## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

1. Returns the current Ricky package version (semantic‑version string) on **stdout**.  
2. Exits with status **0** on success, **1** on any internal error.  
3. Is reproducible across all supported platforms (Linux, macOS, Windows).  
4. Is covered by unit‑tests, integration‑tests, and a CI verification gate.

---

## 2. Contract (Interface Definition)

| Element | Description |
|---------|-------------|
| **Command** | `ricky --version` |
| **Input** | No positional arguments; optional `--json` flag (see Extension). |
| **Output** | - **Plain mode**: `<major>.<minor>.<patch>` (e.g., `1.4.2`) followed by a newline.<br>- **JSON mode** (`--json`): `{"ricky_version":"1.4.2"}` newline. |
| **Exit Code** | `0` on success, `1` on failure (e.g., version file unreadable). |
| **Side‑effects** | None (pure read‑only). |
| **Error Message** | Printed to **stderr**: `ERROR: unable to read version` (if failure). |

*Version source*: `src/ricky/__init__.py` contains `__version__ = "X.Y.Z"
...[truncated]

Must

Build gates from the stated acceptance behavior, such as checking ricky --version output.
Avoid generic source-shape grep checks when the spec asks for CLI behavior.
Keep generated validation meaningful for the current repo shape.

Must Not

Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
Claim the workflow is proven by source syntax alone.
Ignore the package-json version resolution order in the spec.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
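
A behavior-grounded gate for this case might look like the sketch below: it checks what the CLI actually printed and how it exited, rather than grepping source. `CommandResult`, `checkVersionGate`, and the assumption that `ricky --version` prints only the version string are illustrative; `expectedVersion` stands in for whatever the spec's version resolution order yields, which is not shown in this review.

```typescript
// Observed result of running the CLI; how it is captured is up to the caller.
interface CommandResult {
  stdout: string;
  exitCode: number;
}

interface GateResult {
  passed: boolean;
  reason: string;
}

// Gate on observed behavior: exit code first, then the exact printed version.
function checkVersionGate(result: CommandResult, expectedVersion: string): GateResult {
  if (result.exitCode !== 0) {
    return { passed: false, reason: `exit code ${result.exitCode}` };
  }
  const printed = result.stdout.trim();
  if (printed !== expectedVersion) {
    return { passed: false, reason: `printed "${printed}", expected "${expectedVersion}"` };
  }
  return { passed: true, reason: "version output matches" };
}
```

In practice the `CommandResult` could be filled from Node's `child_process.spawnSync`, which reports both `status` and `stdout`.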

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, model checkpoints, config bundles) through a deterministic validation stage, automatically apply fix‑scripts where possible, and rerun validation until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation‑loop`) and is intended to be merged via a single PR that contains the workflow definition and supporting scripts.

---

## 1. Repository Layout (branch `batch‑validation‑loop`)

/.github/workflows/
    batch_artifact_loop.yml      # workflow definition
/scripts/
    validate_artifact.py         # deterministic validator (returns JSON)
    fix_artifact.py              # idempotent fixer (optional)
    report_summary.py            # aggregates per‑artifact results
/tests/
    test_validate_artifact.py    # unit tests for validator
    test_fix_artifact.py         # unit tests for fixer
/artifacts/
    *.json                       # input artifacts (any extension, treated as opaque blobs)


All scripts are **pure Python 3.11**, have **type‑hints**, and are covered by unit tests (≥ 90 % lin
...[truncated]

Must

Choose dag, supervisor, or pipeline deliberately based on the work shape.
Explain the pattern choice in artifact metadata or a rationale.
Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

Default blindly to dag for every workflow.
Collapse independent artifact work into one vague agent task.
Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every workflow must start with a **`contract`** block that lists required inputs, expected outputs, and versioned schema. | The workflow begins with a `contract { inputs: {...}, outputs: {...}, version: "1.0.0" }` stanza. This makes the interface machine‑readable, enables downstream validation, and prevents “silent‑parameter” bugs. |
| **Deterministic Verification Gates** | Insert explicit `gate` steps that run only when a predicate is true and must return a boolean. | After each major transformation (`transform`, `enrich`, `route`) a `gate verify_<step>()` is added. The gate runs a unit‑test‑style script (e.g., JSON‑schema validation) and aborts the run if it fails, guaranteeing that each stage’s contract is honored before proceeding. |
| **Review Artifacts Generation** | Every PR must produce a `review.md` artifact summarizing changes, rationale, and impact analysis. | The workflow includes a `post‑process genera
...[truncated]

Must

Describe skills as generation-time selection, loading, and rendering inputs.
Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
Say the workflow instructions were informed by selected skills.

Must Not

Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
Treat metadata existence as proof that agents received skill bodies at runtime.
Overstate current tests beyond the generation boundary.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a concise “request‑for‑clarification” template you can send to the product owner. It follows our standard workflow‑authoring contract: it asks for the missing concrete details, defines the verification gates we’ll need, and sets the scope for the eventual PR.

---

## 📄 Request for Clarification – Vague Product Spec

**Subject:** Needed details to author the **[Feature Name]** workflow  

**From:** Ricky – Workflow Reliability & Coordination Assistant  
**To:** [Product Owner / Stakeholder]  

---

### 1️⃣ What we need to know

| Area | Specific question | Why it matters (verification gate) |
|------|-------------------|------------------------------------|
| **Goal & Success Metric** | What is the exact business outcome this workflow must achieve? (e.g., “process 10 k orders/hr with <2 % error”) | Defines the **acceptance test** and SLA gate. |
| **Actors & Roles** | Who are the human or system actors (e.g., “Order‑Ingest Service”, “Finance Reviewer”, “Customer”) and what permissions do they need? | Determines **RBAC** checks and hand‑off validation. |
| **Trigger(s)** | What event(s) start the workflow? (e.g., webhook, schedule, manual UI button) | Sets the **entry‑poin
...[truncated]

Must

Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
Validate with dry-run or targeted structural checks where possible.
Return artifacts, warnings, and follow-up commands honestly.

Must Not

Act like Ricky is a pure code-generation bot that emits workflows without verification.
Stop at "code compiles" as the proof bar.
Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s post‑failure protocol**

1. **Detect & Record the failure**  
   - The orchestrator’s timeout monitor flags *Step X* as timed‑out.  
   - The liveness watchdog notices *Worker Y* still running after the timeout window and without emitting any artifact (no files in the expected output bucket, no status update on the step‑state channel).

2. **Immediate containment actions**  
   - **Terminate** Worker Y via the runtime’s kill‑signal API (e.g., `kubectl delete pod <worker‑id>` or the cloud‑function abort call).  
   - **Mark** Step X as **FAILED** in the workflow state store, attaching the timeout error code and a “worker‑orphaned” flag.  
   - **Emit** a failure event on the workflow event bus (`workflow.failed`) with payload:
     ```json
     {
       "workflow_id": "...",
       "step_id": "X",
       "error": "timeout",
       "orphaned_worker_id": "Y",
       "timestamp": "...",
       "logs_url": "<log‑store‑path>"
     }
     ```

3. **Gather diagnostics (deterministic verification gate)**  
   - Pull the **runtime logs** for Worker Y (stdout, stderr, system metrics).  
   - Retrieve the **step‑state checkpoint** (if any) from the state store.  
   - Capture the **r
...[truncated]

Must

Classify the failure before deciding whether to retry.
Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
Preserve raw evidence and uncertainty when the class is not known.

Must Not

Blindly rerun the whole workflow without a blocker classification.
Treat every failure as a broken workflow definition.
Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
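
As a rough illustration of "classify before retry", the sketch below maps observed signals onto the failure classes named in this case and keeps the raw evidence attached. The `FailureEvidence` fields and the signal-to-class mapping are assumptions made for the example; the review does not define how Ricky actually detects each class.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence fields, chosen only to make the example concrete.
interface FailureEvidence {
  stepTimedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  missingToolOrCredential: boolean;
  invalidGraph: boolean;
  gateFailedButArtifactsLookCorrect: boolean;
  rawLogExcerpt: string;
}

interface Classification {
  classes: FailureClass[];
  evidence: FailureEvidence; // preserved as-is so uncertainty stays inspectable
}

function classifyFailure(evidence: FailureEvidence): Classification {
  const classes: FailureClass[] = [];
  if (evidence.stepTimedOut) classes.push("timeout");
  if (evidence.workerAliveWithoutArtifacts) classes.push("agent-runtime-opacity");
  if (evidence.missingToolOrCredential) classes.push("environment-blocker");
  if (evidence.invalidGraph) classes.push("workflow-structure-bug");
  if (evidence.gateFailedButArtifactsLookCorrect) classes.push("validation-strategy-mismatch");
  // No signal matched: say so explicitly instead of guessing a class.
  if (classes.length === 0) classes.push("unknown");
  return { classes, evidence };
}
```

A retry decision would then take the `Classification` as input, so a rerun never happens without a recorded class.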

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Detection for Local Launches**  

| **Stage** | **Input** | **Output** | **Success Criteria** |
|-----------|-----------|------------|----------------------|
| **1. Pre‑launch Scan** | Path to project root (absolute or relative) | `scan‑report.json` | All three directories (`.agent‑relay/`, `.relay/`, `.trajectories/`) are enumerated and their timestamps recorded. |
| **2. Staleness Evaluation** | `scan‑report.json` + optional `max‑age‑hours` (default 24 h) | `stale‑flags.json` | For each directory: <br>• **Empty** → `empty:true` <br>• **Non‑empty** and **old** (oldest file > max‑age) → `stale:true` <br>• **Fresh** → `stale:false` |
| **3. Blocker Gate** | `stale‑flags.json` | `gate‑decision.json` | If any `stale:true` flag is present, the gate returns `blocked:true` with a human‑readable message; otherwise `blocked:false`. |
| **4. Reporter** | `gate‑decision.json` | Console / CI log entry | Prints a concise summary and, when blocked, a remediation hint (e.g., “run `ricky clean‑state`”). |
| **5. Launch Pass‑Through** | `gate‑decision.json` (only when `blocked:false`) | Proceeds to the normal workflow launch script | Guarantees that a clean state 
...[truncated]

Must

Classify stale local runtime state as an environment contamination issue.
Recommend quarantine or isolated-run guidance before launch.
Record the observed paths and the action taken or recommended.

Must Not

Treat stale runtime state as a workflow logic failure.
Delete or overwrite state without an explicit safe path or user intent.
Continue into execution as if the workspace were clean.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
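
A pre-launch check along these lines could be sketched as follows. It only observes and reports; it never deletes anything. `inspectRuntimeState` and the report shape are hypothetical, and treating any existing state directory as stale is a simplification, since the real staleness criteria are not shown in this review.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"] as const;

interface StaleStateReport {
  classification: "environment-contamination" | "clean";
  observedPaths: string[];
  recommendation: string;
}

function inspectRuntimeState(
  workspaceRoot: string,
  exists: (path: string) => boolean = existsSync, // injectable for tests
): StaleStateReport {
  const observedPaths = RUNTIME_STATE_DIRS
    .map((dir) => join(workspaceRoot, dir))
    .filter((path) => exists(path));
  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendation: "proceed with launch" };
  }
  // Leftover state is reported as an environment issue, not a workflow logic failure.
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendation: "quarantine the listed paths or launch in an isolated workspace before running",
  };
}
```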

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is a safety file created by the **Ricky** runtime to prevent overlapping executions. When the file is left behind (e.g., a crash, an interrupted CI job, or a stray background process), the next invocation sees it and aborts with the message you’re seeing.

Below is a deterministic, step‑by‑step repair workflow that you can run locally or in CI. Each step is a **verification gate** – if the check fails, stop and report the blocker before proceeding to the next gate.

---

## 1️⃣ Verify the workspace state

| Gate | Command | Expected outcome |
|------|---------|------------------|
| **1.1** Check for a running Ricky process | `ps aux \| grep -i ricky \| grep -v grep` | No active `ricky` or `relay` processes. |
| **1.2** Confirm you are in the correct repo root | `git rev-parse --show-toplevel` | Prints the absolute path of the repo root. |
| **1.3** Ensure you have a clean working tree | `git status --porcelain` | Output is empty (no modified/untracked files). |

*If any gate fails, stop and resolve the blocker (kill the stray process, cd to the repo root, or commit/stash chan
...[truncated]

Must

Report the active marker, run id, or status path when available.
Ask the user to inspect, wait for, or explicitly clear the active run.
Avoid launching a competing run that could corrupt evidence.

Must Not

Silently start another run.
Hide the existing run marker from the user.
Treat the conflict as a generic failure with no recovery path.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Mode**

Below is a deterministic, contract‑driven procedure you can run in a terminal (bash, zsh, PowerShell 7+, or any POSIX‑compatible shell). It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is stored in the repository under `workflows/`.

---

## 1️⃣ Preconditions (Verification Gate)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| **CLI installed** | `aw --version` | `aw version x.y.z` |
| **Repository root** | `git rev-parse --show-toplevel` | Path to repo root |
| **Workflow file exists** | `test -f workflows/<workflow‑name>.yml && echo ok` | `ok` |
| **Previous run ID available** | `echo $PREV_RUN_ID` | non‑empty string |
| **Failed step name** | `echo $FAILED_STEP` | non‑empty string |

If any gate fails, abort and report the blocker (e.g., “CLI not installed – run `brew install aw`”).

---

## 2️⃣ Contract – Input Parameters

| Parameter | Source | Type | Description |
|-----------|--------|------|-------------|
| `WORKFLOW` | User / script arg | string | Name of the workflow file (without `.yml`). |
| `FAILED_STEP` | Env var or CLI flag | string | E
...[truncated]

Must

Use a bounded retry budget and summarize every attempt.
Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
Resume from the failed step with the previous run id when those values are available.

Must Not

Edit arbitrary repository source files as the default auto-fix surface.
Keep retrying after the configured max attempts.
Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
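
The bounded loop described in this case might be shaped like the sketch below: one tracking run id for the whole loop, a hard attempt budget, a summary entry per attempt, and resume-from-failed-step on each retry. All names (`runWithAutoFix`, `AttemptInput`, the injected `runOnce` and `repairArtifact` callbacks) are hypothetical, not Ricky's actual API.

```typescript
interface AttemptInput {
  trackingRunId: string;
  startFromStep: string | null;
  previousRunId: string | null;
}

interface AttemptOutput {
  ok: boolean;
  runId: string;
  failedStep: string | null;
}

interface AttemptSummary {
  attempt: number;
  resumedFromStep: string | null;
  previousRunId: string | null;
  outcome: "passed" | "failed";
}

interface LoopResult {
  trackingRunId: string;
  attempts: AttemptSummary[];
  status: "passed" | "exhausted";
}

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runOnce: (input: AttemptInput) => AttemptOutput,
  repairArtifact: (failedStep: string | null) => void, // repairs the workflow artifact only
): LoopResult {
  const attempts: AttemptSummary[] = [];
  let startFromStep: string | null = null;
  let previousRunId: string | null = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = runOnce({ trackingRunId, startFromStep, previousRunId });
    attempts.push({
      attempt,
      resumedFromStep: startFromStep,
      previousRunId,
      outcome: output.ok ? "passed" : "failed",
    });
    if (output.ok) return { trackingRunId, attempts, status: "passed" };
    if (attempt < maxAttempts) {
      repairArtifact(output.failedStep);
      // Resume from the failed step, carrying the run id of the failed attempt.
      startFromStep = output.failedStep;
      previousRunId = output.runId;
    }
  }
  return { trackingRunId, attempts, status: "exhausted" };
}
```

Because the budget is enforced by the `for` bound, a repair that never converges ends in `exhausted` with every attempt summarized.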

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts` | Full stdout/stderr of the last run. Identify the exact error message and the stage where the failure occurred. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Runs the workflow with verbose tracing (step entry/exit, input‑output payloads, contract validation results). |
| 3️⃣  | Open the workflow file | `src/workflows/foo.ts` – verify the **contract block** at the top of the file (input, output, pre‑conditions, post‑conditions). |
| 4️⃣  | Validate contracts locally | `ricky contract validate src/workflows/foo.ts` – ensures the TypeScript types and JSON‑schema contracts are syntactically correct. |
| 5️⃣  | Run the failing step in isolation | `ricky step run <step-id> --workflow workflows/foo.ts --no-auto-fix` – isolates the problematic node. |
| 6️⃣  | Check external dependencies | If the step calls a service, run the service‑client test: `ricky test service <service‑name>` and confirm connectivity/credentials. |
| 7️⃣  | Review deterministic verification gates | Look for `verify:` blocks in the workflow. Ensure th
...[truncated]

Must

Preserve one-attempt behavior when auto-fix is disabled.
Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
Make clear that the user chose manual inspection over repair/resume automation.

Must Not

Start a repair loop despite --no-auto-fix.
Suppress the diagnosis because no repair was attempted.
Present the failure as a completed repair attempt.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

| Phase | Action | Details | Expected Outcome |
|-------|--------|---------|-------------------|
| **1️⃣ Locate & Validate Artifact** | `ricky locate <workflow‑name>` | - Resolves the *primary local path* (default: `./workflows/<workflow‑name>/`). <br>- Confirms the presence of `workflow.ts` and a sibling `workflow.contract.json`. | Path resolved and contract file found. |
| **2️⃣ Verify Contract Preconditions** | `ricky verify‑contract <path>/workflow.contract.json` | - Parses the JSON contract. <br>- Checks required fields: `inputs`, `outputs`, `version`, `gateChecks`. <br>- Fails fast if any field is missing or malformed. | Contract is syntactically valid and all required keys exist. |
| **3️⃣ Install Dependencies** | `npm ci` (run in the workflow directory) | - Uses the exact lockfile (`package-lock.json` or `pnpm-lock.yaml`). <br>- Guarantees reproducible node_modules. | Deterministic dependency tree ready for execution. |
| **4️⃣ Compile‑or‑Run** | `npx ts-node --transpile-only workflow.ts` | - `ts-node` runs the TypeScript source without emitting JS files (fast, safe for local dev). <br>- `--t
...[truncated]

Must

Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
Record the actual spawn command in execution evidence.

Must Not

Fail solely because agent-relay is not on PATH when the SDK route is available.
Hide the actual runtime command from evidence.
Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
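
One way to read the precheck and evidence rules in this case is sketched below. It assumes "the Node strip-types route" means running the workflow file with Node's `--experimental-strip-types` flag; `canResolve`, `planLocalRun`, and `SpawnPlan` are hypothetical helpers, not Ricky's runner.

```typescript
import { createRequire } from "node:module";
import { resolve } from "node:path";

interface SpawnPlan {
  command: string;
  args: string[];
  evidenceLine: string; // the exact command to record in execution evidence
}

// True when `specifier` resolves from `fromDir` using Node's own resolution.
function canResolve(specifier: string, fromDir: string): boolean {
  const req = createRequire(resolve(fromDir, "noop.js"));
  try {
    req.resolve(specifier);
    return true;
  } catch {
    return false;
  }
}

function planLocalRun(
  workflowPath: string,
  isResolvable: (specifier: string) => boolean,
): SpawnPlan {
  // Precheck the SDK rather than requiring an agent-relay binary on PATH.
  if (!isResolvable("@agent-relay/sdk")) {
    throw new Error("@agent-relay/sdk is not resolvable for this workflow");
  }
  const args = ["--experimental-strip-types", workflowPath];
  return {
    command: process.execPath,
    args,
    evidenceLine: [process.execPath, ...args].join(" "),
  };
}
```

A caller would typically pass `(s) => canResolve(s, workflowDir)` as the resolver and store `evidenceLine` with the run record.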

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Boundary‑Escalation Protocol – Persistent Structural Failure**

When a structural failure remains after an attempted fix, Ricky must treat the situation as a **hard blocker** and follow the deterministic escalation workflow defined in *AGENTS.md* and the **Workflow Reliability Standards (WRS‑01)**.

Below is the concrete contract you should enact, the verification gates you must satisfy, and the artifacts you need to produce.

---

## 1. Immediate Blocker Declaration  

| Item | Requirement |
|------|-------------|
| **Blocker Tag** | `BLOCKER:STRUCTURAL_FAILURE` |
| **Scope** | All downstream pipelines that depend on the affected component (e.g., `core‑schema`, `data‑pipeline‑v2`). |
| **Visibility** | Post to the **#workflow‑alerts** channel and add the tag to the component’s GitHub issue tracker. |
| **Deadline** | Within **5 minutes** of confirming the failure persists. |

*Action*: Run the command  

```bash
ricky blocker declare \
  --type STRUCTURAL_FAILURE \
  --component <component‑name> \
  --reason "Fix attempt <commit‑sha> did not resolve validation error X" \
  --notify "#workflow-alerts"
```

## 2. Create a Scoped Repair Branch

Branch Naming – `repair
...[truncated]


**Must**

- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.

**Must Not**

- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.analytics-from-structured-evidence</code> (runtime-recovery/manual)</summary>

**User message**

> Produce a workflow health digest from many Ricky workflow runs.

**Ricky output**

```text
**Ricky Workflow – Health‑Digest Generator**  
*Purpose:* Aggregate the latest run metadata from a set of Ricky workflows, compute health metrics, and emit a concise digest (Markdown + JSON) for stakeholders.

---

## 1. Contract Overview  

| Item | Description |
|------|-------------|
| **Input** | `workflow_runs/` – a directory (or S3 bucket) containing one JSON file per workflow run. Each file follows the **Ricky Run Record** schema (see §2). |
| **Output** | `health_digest/` containing:<br>• `digest.md` – human‑readable summary<br>• `digest.json` – machine‑readable metrics |
| **Trigger** | Manual or scheduled (e.g., nightly) via CI/CD pipeline. |
| **Success Criteria** | - All input files are parsable.<br>- At least 95 % of runs have a *completed* status.<br>- No run exceeds its SLA by > 10 % (configurable).<br>- Digest files are written and checksum‑verified. |
| **Failure Mode** | Any validation gate failure aborts the run and creates a **Blocker Report** (see §5). |

---

## 2. Expected Run Record Schema  

```json
{
  "workflow_id": "string",          // e.g. "ricky/etl/transform"
  "run_id": "string",               // UUID
  "started_at": "ISO8601",          // UTC
  "en
...[truncated]

Must

Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

Mutate evidence while analyzing it.
Return generic advice like "improve your workflow" without references.
Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

A dedicated Slack app (Ricky Debug) that lets engineers **inspect, step‑through, and repair** any Ricky‑managed workflow directly from a channel or DM. The surface is built as a **modal‑driven UI** with deterministic contracts and verification gates that keep the workflow state immutable until a successful commit.

---

### 1. High‑Level Interaction Flow

| Step | Slack UI Element | Action | Result |
|------|------------------|--------|--------|
| 1 | `/ricky debug <workflow‑id>` (slash command) | Launches a **Debug Launcher** modal. | Shows workflow metadata, latest run ID, and a **“Start Debug Session”** button. |
| 2 | **Debug Launcher** modal | Click **Start Debug Session**. | Creates a **debug session object** in the `ricky-debug` DB (immutable snapshot of the selected run). Returns a **session token** (short‑lived JWT). |
| 3 | **Session Overview** modal | Shows a **step list** (ordered by execution index) with status icons (✅, ⚠️, ❌). Each step has **“Inspect”, “Edit”, “Re‑run”** actions. | Engineers can drill into any step. |
| 4 | **Inspect Step** modal | Displays: <br>• Input payload (JSON) <br>• Output payload (JSON) <br>• Ex
...[truncated]

Must

Treat Slack as a co-equal product surface, not a wrapper around CLI.
Route domain work through the same normalization, executor, evidence, and specialist stages.
Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

Put workflow generation or diagnosis domain logic directly in the Slack handler.
Degrade Slack to a developer shortcut with weaker routing than CLI.
Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**Feature:** Add a new *Web Handoff* type – **`WebSubmitSpec`** – that allows a client to POST a workflow specification together with an execution mode (e.g., `dry‑run`, `live`).  

**Location in repo:** `src/handoffs/web/`  

---  

## 1. Contract (API & Data Model)

| Element | Type | Description | Validation |
|---------|------|-------------|------------|
| `POST /hand-off/web-submit-spec` | HTTP endpoint | Accepts a JSON payload containing a workflow spec and a mode. | 200 on success, 400 on validation error, 415 on unsupported media type. |
| Request body | `WebSubmitSpecRequest` | ```json { "spec": { … }, "mode": "dry-run" \| "live" } ``` | • `spec` must be a valid **WorkflowSpec** (see `src/specs/WorkflowSpec.ts`). <br>• `mode` must be one of the allowed enum values. |
| Response body | `WebSubmitSpecResponse` | ```json { "workflowId": "<uuid>", "status": "queued" } ``` | `workflowId` is a UUID v4 generated by the orchestrator. |
| Errors | `ErrorResponse` | ```json { "code": "INVALID_SPEC", "message": "…"} ``` | Standard error envelope used across all handoffs. |

**Schema (TypeScript)** – `src/handoffs/web/types.ts`

```ts
export type ExecutionMode = 'dry-run' | 'live';

e
...[truncated]

Must

Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
Keep auth/session validation in the web surface before normalization.
Ensure downstream code does not need to know the request came from web.

Must Not

Short-circuit directly from the web handler to executors.
Create a parallel domain model for web requests.
Make the web surface the owner of local or Cloud routing semantics.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
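
The "normalizer is the only translation boundary" rule can be illustrated with a small discriminated union. Only the names `LocalInvocationRequest` and `CloudGenerateRequest` come from the review notes; their fields, the `WebHandoff` shape, and the `local`/`cloud` mode values are assumptions made for this sketch.

```typescript
// Hypothetical field sets; only the type names appear in the review notes.
interface LocalInvocationRequest {
  kind: "local";
  spec: string;
}

interface CloudGenerateRequest {
  kind: "cloud";
  spec: string;
}

type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// Hypothetical web handoff variant carrying a spec and a mode.
interface WebHandoff {
  source: "web";
  spec: string;
  mode: "local" | "cloud";
  sessionToken: string;
}

function normalizeWebHandoff(handoff: WebHandoff): NormalizedRequest {
  // Auth/session validation is assumed to have happened in the web surface already.
  const spec = handoff.spec.trim();
  if (spec.length === 0) throw new Error("empty spec");
  // Neither `source` nor the session token survives normalization,
  // so downstream code cannot tell the request came from web.
  return handoff.mode === "local" ? { kind: "local", spec } : { kind: "cloud", spec };
}
```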

REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision:** Generate a new workflow (or a revised version) and create the associated coordination artifacts.

**Rationale**

| Factor | Observation | Impact on decision |
|--------|-------------|--------------------|
| **Input type** | You supplied a *workflow specification* together with *conversation context* (e.g., requirements, constraints, prior attempts). | Indicates a design‑time request rather than a runtime error or a need for debugging. |
| **Goal** | The request is to *determine whether to generate, debug, restart, analyze, or coordinate*. | The presence of a spec suggests the next logical step is to **author** (generate) the workflow, then set up the verification and coordination scaffolding. |
| **State of the workflow** | No mention of an existing failing run, missing artifacts, or a need for post‑mortem analysis. | No debugging or restart is required. |
| **Coordination need** | The spec is being handed over from Claude, implying a hand‑off that should be formalized (branch, PR, review). | Coordination steps will be added after generation. |
| **Deterministic verification** | The spec can be turned into a concrete contract (inputs, outputs, gates). | Enables the re
...[truncated]

Must

Normalize the tool call through the MCP/Claude handoff path.
Treat conversation context as advisory metadata.
Make the spec itself sufficient for routing whenever possible.

Must Not

Require the user to rewrite the spec manually as a workflow.
Put domain routing logic inside the MCP tool handler itself.
Ignore provided mode, workflow path, or prior decisions metadata.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)

User message

Change the Cloud Ricky generate API response to include a new field and change one existing error code name.

Ricky output

**Scope**  
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.  
- Rename error code **`INVALID_PROMPT`** to **`PROMPT_VALIDATION_ERROR`** in the error‑response schema and all references.  

All changes are limited to the **cloud‑ricky** service repository ( `services/cloud-ricky/` ). No other services are affected.

---

## 1️⃣ Contract – API Specification Update  

| Endpoint | Method | Success 200 Schema (add) | Error Schema (rename) |
|----------|--------|--------------------------|-----------------------|
| `/v1/generate` | POST | ```json { "output": "string", "requestId": "uuid‑v4" }``` | ```json { "code": "PROMPT_VALIDATION_ERROR", "message": "string" }``` |

*The `requestId` must be a RFC‑4122 UUID v4 generated at request entry and echoed back in the response.*  

**Version bump** – Increment the OpenAPI file `openapi.yaml` minor version (`v1.2.0 → v1.3.0`).  

All downstream SDKs will be regenerated from the updated spec.

---

## 2️⃣ Implementation Tasks  

| Task | File(s) | Description |
|------|---------|-------------|
| **2.1** Generate requestId | `src/handler.ts` | `const requestId = uuidv4();` store 
...[truncated]

Must

Allow additive response fields within /api/v1/ricky/....
Reject renaming or changing the meaning of an existing error code inside the same version.
Require a ne

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
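
The versioning rule from the last visible review case (additive response fields are allowed, renaming an existing error code inside the same version is rejected) could be checked mechanically along these lines. `ApiContract` and `checkSameVersionChange` are hypothetical names, and the contract is reduced to flat lists for the sake of the example.

```typescript
// Simplified, hypothetical view of one API version's public contract.
interface ApiContract {
  version: string;
  responseFields: string[];
  errorCodes: string[];
}

// Returns the list of breaking changes made within the same API version.
function checkSameVersionChange(before: ApiContract, after: ApiContract): string[] {
  const violations: string[] = [];
  // A different version is a new contract and is out of scope for this check.
  if (before.version !== after.version) return violations;
  for (const field of before.responseFields) {
    if (!after.responseFields.includes(field)) violations.push(`removed response field: ${field}`);
  }
  for (const code of before.errorCodes) {
    // A rename shows up as the old code disappearing.
    if (!after.errorCodes.includes(code)) violations.push(`removed or renamed error code: ${code}`);
  }
  return violations;
}
```

Applied to the change requested in that case, adding `requestId` passes while renaming `INVALID_PROMPT` is flagged.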

devin-ai-integration Bot reviewed May 10, 2026

View reviewed changes

kjgbot merged commit 65b53b6 into main May 10, 2026
3 checks passed

kjgbot deleted the docs/source-text-analysis-rule branch May 10, 2026 08:20

kjgbot merged 1 commit into main from docs/source-text-analysis-rule

kjgbot commented May 10, 2026


coderabbitai Bot commented May 10, 2026 (edited)

devin-ai-integration Bot left a comment


github-actions Bot commented May 10, 2026
