
docs(agents): add source-text-analysis rule (use parsers, not regex)#89

Merged
kjgbot merged 1 commit into main from docs/source-text-analysis-rule on May 10, 2026

Conversation

@kjgbot
Contributor

@kjgbot kjgbot commented May 10, 2026

Summary

Codify the architectural rule that #86 and #88 both arrived at independently: when Ricky source code needs to inspect text whose grammar Ricky already understands (TS/JS, Markdown, JSON, shell), use a parser for that grammar — not substring match, not regex, not an LLM.

Why now

Two PRs landed in the same week with the same shape of bug:

| | #86 | #88 |
|---|---|---|
| Site | auto-fix-loop.ts:hasRickyWorkflowAliasImport | markdown-target-files.ts:extractTargetFilesFromMarkdown |
| Bug | substring from 'node:fs' matched HEREDOC string-literal content | regex PATH_PATTERN matched paths inside fenced code blocks + prose noise |
| Fix | ts.createSourceFile + walk ImportDeclaration nodes | mdast-util-from-markdown + walk inlineCode nodes |

Both PRs also independently rejected LLM-based detection with the same reasoning (non-determinism, eval-suite breakage, hot-path latency, prompt-injection surface for paths that later drive shell commands). Worth capturing as a durable rule so the next person tempted to grep-and-substitute on TS source has the prior art.
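The #86 failure mode can be reproduced in a few lines. The following is a minimal sketch, not the code that shipped in #86: the helper name `hasImportFrom` and the sample workflow text are hypothetical, and it assumes the `typescript` package resolves in the current project.

```typescript
import * as ts from "typescript";

// Hypothetical helper (not the code from #86): true only when the file has a
// real top-level `import ... from "<moduleName>"` declaration.
function hasImportFrom(source: string, moduleName: string): boolean {
  const sourceFile = ts.createSourceFile(
    "workflow.ts",
    source,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ false,
    ts.ScriptKind.TS,
  );
  return sourceFile.statements.some(
    (statement) =>
      ts.isImportDeclaration(statement) &&
      ts.isStringLiteral(statement.moduleSpecifier) &&
      statement.moduleSpecifier.text === moduleName,
  );
}

// Illustrative workflow whose only mention of node:fs sits inside a heredoc
// embedded in a template string.
const heredocOnly = [
  "export const step = {",
  "  command: `cat > probe.ts <<'EOF'",
  "import { readFileSync } from 'node:fs';",
  "EOF`,",
  "};",
].join("\n");

// Illustrative file with a genuine import declaration.
const realImport = "import { readFileSync } from 'node:fs';\nreadFileSync('a');\n";

const substringHit = heredocOnly.includes("from 'node:fs'"); // naive substring check
const astHit = hasImportFrom(heredocOnly, "node:fs"); // grammar-aware check
const astHitReal = hasImportFrom(realImport, "node:fs");
console.log({ substringHit, astHit, astHitReal });
```

The substring check reports a match on the heredoc text, while the AST walk only reports the genuine declaration, because the heredoc is the contents of a template literal and not an `ImportDeclaration` node.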

What changed

  • AGENTS.md — new top-level # Ricky Source Code Conventions section above the existing # Ricky Workflow Conventions. Holds one rule today (## Source-Text Analysis: Use Grammar-Aware Parsers, Not Regex) with reasoning, the rejected LLM alternative, available tooling (typescript, mdast-util-from-markdown), and a fallback ordering for when AST is genuinely too heavy.
  • CLAUDE.md — auto-follows via the existing symlink, consistent with AGENTS.md:206 ("CLAUDE.md should remain a symlink to AGENTS.md. Update AGENTS.md as the shared source of truth"). No separate Claude rule needed.

Scope

Documentation-only. No code, dep, or test changes. The two referenced PRs (#86, #88) ship the actual code fixes; this PR captures the pattern those fixes share so we stop relearning it.

Test plan

  • readlink CLAUDE.md → AGENTS.md (symlink intact)
  • head -180 CLAUDE.md | tail -10 shows the new section through the symlink

🤖 Generated with Claude Code

Two recent fixes converged on the same architectural answer for text
inspection on Ricky source-code hot paths:

- #86 (`auto-fix-loop.ts:hasRickyWorkflowAliasImport`) — substring
  match for `from 'node:fs'` was fooled by the literal text inside a
  HEREDOC embedded in a `.step({ command: ... })` body. Fixed by
  walking `ts.createSourceFile` ImportDeclaration nodes.
- #88 (`spec-intake/markdown-target-files.ts:extractTargetFilesFromMarkdown`)
  — regex `PATH_PATTERN` matched paths inside fenced code blocks and
  prose noise. Fixed by walking `mdast-util-from-markdown` inlineCode
  nodes.
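The #88 fix can be illustrated the same way. This is a sketch under stated assumptions, not the code from #88: the tree below is hand-built in mdast shape so the example runs without extra dependencies (in real use it would presumably come from `fromMarkdown(markdown)`), and `collectInlineCode`, `MdNode`, the sample paths, and the regex are all hypothetical stand-ins. The real `PATH_PATTERN` is not shown in this PR.

```typescript
// Minimal structural type covering the mdast node fields this sketch touches.
type MdNode = {
  type: string;
  value?: string;
  lang?: string | null;
  children?: MdNode[];
};

// Hypothetical helper: collect the text of every inlineCode node, depth first.
// Fenced blocks are `code` nodes, so their contents are never collected.
function collectInlineCode(node: MdNode, out: string[] = []): string[] {
  if (node.type === "inlineCode" && node.value !== undefined) out.push(node.value);
  for (const child of node.children ?? []) collectInlineCode(child, out);
  return out;
}

// Illustrative spec: one inline-code path in prose, one path inside a fence.
const raw = "Update `src/cli/main.ts`.\n\n~~~sh\nrm -rf dist/old.js\n~~~\n";

// Hand-built mdast-shaped tree for the spec above (assumption: this mirrors
// what a Markdown parser would produce for `raw`).
const tree: MdNode = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Update " },
        { type: "inlineCode", value: "src/cli/main.ts" },
        { type: "text", value: "." },
      ],
    },
    { type: "code", lang: "sh", value: "rm -rf dist/old.js" },
  ],
};

// Stand-in regex for comparison only; it cannot tell prose from fenced code.
const regexHits = raw.match(/[\w./-]+\.(?:ts|js)\b/g) ?? [];
const targets = collectInlineCode(tree);
console.log({ regexHits, targets });
```

The regex picks up the path inside the fenced shell block as well, whereas the tree walk returns only the inline-code path, since the fence is a separate `code` node.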

Both fixes also independently rejected LLM-based detection for the
same reasons: non-determinism breaks the offline eval suite, latency
multiplies on every retry, and model output is a prompt-injection
surface for paths that drive shell commands.

Capture this as a durable rule in AGENTS.md so the next person
tempted to grep-and-substitute on TS source has the prior art. Adds
a new top-level "Ricky Source Code Conventions" section above the
existing workflow-authoring conventions, since it applies to Ricky's
own source rather than to the workflows Ricky generates. CLAUDE.md
auto-follows via the existing symlink — no separate Claude rule
needed (consistent with AGENTS.md:206).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 10, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: f1939a23-536b-4d29-96f8-bd14a7ac927b

📥 Commits

Reviewing files that changed from the base of the PR and between 6a87282 and 146f6e1.

📒 Files selected for processing (1)
  • AGENTS.md

📝 Walkthrough


This PR adds a "Ricky Source Code Conventions" section to AGENTS.md establishing rules for how Ricky inspects and modifies its own source code. The section prioritizes grammar-aware parsers (TypeScript AST, mdast) over regex and substring heuristics, defines a strict fallback strategy, and documents motivating bug patterns.

Changes

Ricky Source Code Conventions

| Layer / File(s) | Summary |
|---|---|
| Documentation: Source Code Analysis Conventions (AGENTS.md) | New section defining parser-first inspection strategy for Ricky's own src/ analysis: AST walk → tokenization → regex on inert regions → no substring matching. Includes concrete examples of bugs that motivated the approach. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

A scribe must read with careful eyes,
Not substring tricks or LLM guise—
But parse the trees where meanings hide,
Let grammar walk, not guess, decide. 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and concisely describes the main change: documenting a source-text-analysis rule advocating for parsers over regex. |
| Description check | ✅ Passed | The description is well-structured and comprehensively explains the motivation, changes, scope, and testing for the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 potential issue.

View 2 additional findings in Devin Review.


Comment thread AGENTS.md
Comment on lines +195 to +198
**Concrete tools available.** Both are already in `dependencies`:

- TypeScript / JavaScript: `import ts from 'typescript'`; `ts.createSourceFile(name, content, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS)` then walk `sourceFile.statements`.
- Markdown: `import { fromMarkdown } from 'mdast-util-from-markdown'`; walk the resulting mdast tree (`inlineCode`, `code`, `heading`, `list`, etc.).

🟡 AGENTS.md falsely claims mdast-util-from-markdown is already in dependencies

Lines 195-198 state "Concrete tools available. Both are already in dependencies" and then list mdast-util-from-markdown as an available Markdown parser. However, mdast-util-from-markdown is not present in package.json (neither in dependencies nor devDependencies) and is not installed in node_modules. Only typescript is actually available. Since AGENTS.md is a mandatory rule file that agents follow, this false claim will cause agents to write code importing fromMarkdown from mdast-util-from-markdown expecting it to resolve, leading to build/import failures.

Suggested change
**Concrete tools available.** Both are already in `dependencies`:
- TypeScript / JavaScript: `import ts from 'typescript'`; `ts.createSourceFile(name, content, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS)` then walk `sourceFile.statements`.
- Markdown: `import { fromMarkdown } from 'mdast-util-from-markdown'`; walk the resulting mdast tree (`inlineCode`, `code`, `heading`, `list`, etc.).
**Concrete tools available.** TypeScript is already in `dependencies`; add `mdast-util-from-markdown` when a call site needs it:
- TypeScript / JavaScript: `import ts from 'typescript'`; `ts.createSourceFile(name, content, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS)` then walk `sourceFile.statements`.
- Markdown: `import { fromMarkdown } from 'mdast-util-from-markdown'`; walk the resulting mdast tree (`inlineCode`, `code`, `heading`, `list`, etc.). Install with `npm install mdast-util-from-markdown` before first use.
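A claim like "already in `dependencies`" can itself be checked with a parser in place of a text search, in the spirit of the rule this PR adds. The sketch below is illustrative only: `isDeclaredDependency` is a hypothetical helper, and the manifest is a made-up example, not the repository's real package.json.

```typescript
// Hypothetical helper: is `name` declared in dependencies or devDependencies?
function isDeclaredDependency(packageJsonText: string, name: string): boolean {
  const manifest = JSON.parse(packageJsonText) as {
    dependencies?: Record<string, string>;
    devDependencies?: Record<string, string>;
  };
  return (
    Object.prototype.hasOwnProperty.call(manifest.dependencies ?? {}, name) ||
    Object.prototype.hasOwnProperty.call(manifest.devDependencies ?? {}, name)
  );
}

// Illustrative manifest (assumption): the package name appears only in prose.
const manifestText = JSON.stringify({
  name: "example",
  description: "mentions mdast-util-from-markdown in prose only",
  dependencies: { typescript: "^5.0.0" },
});

const grepHit = manifestText.includes("mdast-util-from-markdown"); // text search
const declared = isDeclaredDependency(manifestText, "mdast-util-from-markdown");
const tsDeclared = isDeclaredDependency(manifestText, "typescript");
console.log({ grepHit, declared, tsDeclared });
```

Here the text search matches the description field, while the parsed lookup reports the package as undeclared and `typescript` as declared.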

@kjgbot kjgbot merged commit 65b53b6 into main May 10, 2026
3 checks passed
@kjgbot kjgbot deleted the docs/source-text-analysis-rule branch May 10, 2026 08:20
@github-actions
Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-10T08-09-05-077Z-HEAD-provider
Mode: provider
Git SHA: 931fb46

Passed: 5 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

## Updated Ricky Documentation – “Deeper Integration with Agent Assistant”

### 1. Overview
Ricky (AgentWorkforce workflow reliability, coordination, and authoring assistant) now **leverages the Agent Assistant (AA) platform as a first‑class service** for:

| Feature | How AA is used | Benefit |
|---------|----------------|---------|
| **Deterministic verification gates** | AA runs *contract‑check* and *property‑based* tests in a sandboxed executor before any PR merge. | Guarantees that every workflow contract is satisfied before code lands. |
| **Dynamic review artifacts** | AA generates **review‑summary** markdown, diff‑highlights, and **risk‑heatmaps** automatically. | Reduces reviewer cognitive load and surfaces hidden blockers early. |
| **Scalable validation loops** | AA orchestrates **80‑to‑100 validation loops** per workflow change (unit, integration, performance, security). | Provides statistical confidence that changes won’t regress reliability. |
| **Blocker detection & reporting** | AA monitors execution logs, surface‑level metrics, and SLA drift; it raises **honest blocker tickets** in the issue tracker. | Early detection of “silent failures” that would otherwise slip 
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑10)**  

| Metric | Current State | Target / Recommendation |
|--------|---------------|--------------------------|
| **Repository Coverage** | 68 % of active agents import `@agent-assistant/turn-context` (≈ 42 / 62 agents). | Push to **≥ 85 %** by Q3 2026. Prioritize high‑traffic agents (e.g., `@agent‑assistant/issue‑triage`, `@agent‑assistant/code‑review`). |
| **Version Consistency** | 3 distinct minor versions in use (v1.2.0, v1.3.1, v2.0.0). 27 % of agents still on v1.2.0. | Consolidate to **v2.0.0** (stable) within 2 sprints. Add a `peerDependency` bump in `package.json` and run `npm audit` to catch mismatches. |
| **Contract Compliance** | 81 % of agents with the context contract (`TurnContextContract`) pass the deterministic verification gate (`npm run verify:turn-context`). 19 % fail due to missing `metadata.sessionId` or mismatched `payload` schema. | Implement a **pre‑merge lint rule** (`eslint-plugin-turn-context`) to enforce contract fields. Schedule a remediation sprint for failing agents. |
| **Review Artifacts** | 54 % of PRs that modify turn‑context usage include a **Turn‑Context Review Checklist*
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move the listed capabilities into the Agent Assistant**

| Capability | Fit with Agent Assistant | Rationale | Migration Path |
|------------|--------------------------|-----------|----------------|
| **Workflow generation** | ✅ Core | The Assistant already owns the *authoring* contract (template → concrete workflow). Adding generation lets it produce the initial YAML/JSON skeleton directly from user intent, reducing hand‑offs. | 1. Define a **generation contract** (`genRequest → genSpec`).<br>2. Implement a deterministic template engine (Jinja2/Handlebars) with unit‑tested snippets.<br>3. Add a verification gate that the generated spec passes the *schema validator* before it is persisted. |
| **Workflow validation** | ✅ Core | Validation is the first deterministic gate after generation. Embedding it in the Assistant guarantees that every workflow entering the repo has passed the same schema, lint, and static‑analysis checks. | 1. Reuse the existing `validateWorkflow` service as a **validation contract** (`workflowSpec → validationReport`).<br>2. Hook the contract into the generation pipeline so the Assistant returns a pass/fail flag plus a human‑readable report. 
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (May 15 – Aug 7 2024)*  

---

## 1. Slice Objective
Deliver a production‑ready, self‑service adoption package for the **Ricky** AgentWorkforce assistant that enables **10 pilot teams** to author, test, and deploy reliable workflows with deterministic verification gates and built‑in review artifacts.

Success is measured by:
| Metric | Target |
|--------|--------|
| **Adoption** | ≥ 10 distinct teams onboarded |
| **Reliability** | ≤ 2 % workflow failure rate in pilot (post‑gate) |
| **Cycle Time** | ≤ 48 h from authoring → production merge |
| **Feedback** | ≥ 80 % of pilot participants rate “ease of use” ≥ 4/5 |

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • Ricky v2.0 core engine (deterministic contracts, validation loops)  <br>• CLI & VS Code extension for authoring <br>• GitHub‑based PR workflow (branch, PR, review, merge) <br>• Automated gate suite (type‑check, schema validation, 80‑100 loop tests) <br>• Pilot onboarding docs, training videos, and support channel | • Enterprise‑wide rollout (beyond pilot) <br>• Custom integrations not covered by
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, contract‑driven package** that can be dropped into the Ricky monorepo (`/packages/ricky-slack-web`) and used by the existing CI/CD pipeline.  It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, 80‑100 validation loops), and the **shared authoring rules** (scoped branches, PR templates, blocker reporting).

---

### 1. High‑Level Architecture  

| Component | Responsibility | Runtime | Key Interfaces |
|-----------|----------------|---------|----------------|
| **Ricky‑Core** | Business logic, intent parsing, state machine | Node 20 (LTS) | `ricky-core` npm package |
| **SlackAdapter** | Translate Slack events ↔ Ricky intents, send messages, handle interactive components | Serverless (AWS Lambda) | Slack Events API, Interactivity API |
| **WebChatAdapter** | Web‑socket based UI, fallback UI for browsers | Node/Express + Socket.io | WebSocket, REST `/ricky/webhook` |
| **AuthGateway** | OAuth2 for Slack, JWT for web clients | Lambda | `GET /auth/slack`, `POST /auth/web` |
| **VerificationEngine** | Deterministic gate checks, contract vali
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What the command does**

Running  

```bash
ricky --mode local --spec "generate a workflow for package checks"
```
  • --mode local – tells Ricky to operate in the local development environment (no remote CI/CD resources are touched).
  • --spec "generate a workflow for package checks" – is the natural‑language request that Ricky will translate into a concrete workflow definition (YAML/JSON) following the AgentWorkforce contract.
  • Absence of --run – means Ricky will only generate the workflow artifact; it will not submit, schedule, or execute any checks.

So the command will:

  1. Parse the spec string.
  2. Resolve the appropriate workflow template (e.g., package-checks.yml).
  3. Fill in any required parameters (defaults are used unless you provide overrides).
  4. Emit a workflow contract file (by default ricky-workflow.yaml in the current directory).
  5. Print a short summary to stdout, e.g.:
✅ Workflow generated: ricky-workflow.yaml
   • 5 stages: lint, unit-test, security-scan, license‑audit, publish‑dry‑run
   • Deterministic verification gates inserted after each stage
   • Review artifact: ricky-workflow-review.md

No vali
...[truncated]


**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding Workflow**  
*(Designed for a brand‑new user on a clean workstation)*  

---  

### 1️⃣ Prerequisites Check (Verification Gate)

| Step | Command | Expected Result | Failure Action |
|------|---------|----------------|----------------|
| 1.1 | `python3 --version` | `Python 3.9+` | Abort – instruct to install Python 3.9+ |
| 1.2 | `git --version` | `git version X.Y.Z` | Abort – provide link to Git install |
| 1.3 | `pip3 --version` | `pip X.Y` | Abort – reinstall pip |
| 1.4 | `ricky --version` (if already installed) | `ricky X.Y.Z` | Skip install step, go to **2** |
| 1.5 | Network test: `curl -s https://api.ricky.ai/ping` | `pong` | Abort – ask user to verify internet / firewall |

*All checks are wrapped in the script `ricky onboard --verify`. If any gate fails, the script prints a **blocker report** and exits with status 1.*

---  

### 2️⃣ Install Ricky (Deterministic Installation)

```bash
# Create an isolated virtual environment
python3 -m venv ~/.ricky-env
source ~/.ricky-env/bin/activate

# Upgrade pip and install Ricky from the official index
pip install --upgrade pip
pip install ricky-cli==1.0.0   # pin to the latest stable release

...[truncated]
```


**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Ricky – Workflow Reliability & Coordination Assistant**

---

### 🚦 Current Situation
- **Input:** An empty specification or a missing spec file.
- **Impact:** Without a concrete spec we cannot generate a deterministic workflow contract, verification gates, or review artifacts. This blocks downstream validation loops and PR creation.

### 📋 What We Need
1. **Specification Scope** – High‑level goal (e.g., “process inbound support tickets”).
2. **Key Actors & Roles** – Who initiates, who consumes, any external services.
3. **Inputs / Outputs** – Data formats, required fields, success/failure signals.
4. **Non‑functional Requirements** – SLA, reliability targets, security constraints.
5. **Success Criteria** – Metrics that define “done” for each stage.

### 📄 Minimal Spec Template
> **Title:** *Descriptive workflow name*  
> **Owner:** *Team or individual*  
> **Goal:** *One‑sentence purpose*  
> **Actors:**  
> - *Actor A* – role description  
> - *Actor B* – role description  
> **Inputs:**  
> - *Input 1* – type, schema, source  
> - *Input 2* – …  
> **Outputs:**  
> - *Output 1* – type, schema, destination  
> - *Output 2* – …  
> **Steps (high‑level):**  
> 1. *Step 1* – br
...[truncated]
```

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🔧 Core Services
• Scheduler : OK
• Workflow Engine : OK
• Persistence Layer : OK

🔗 External Providers
• Google : DISCONNECTED (no successful health‑check)
• GitHub : DISCONNECTED (no successful health‑check)

🛡️ Reliability
• Last verification gate : 2026‑05‑10 08:12 UTC
• Pending verification loops : 0
• Blockers : None (providers simply unverified)

📊 Metrics (last 5 min)
• Active workflows : 12
• Queued tasks : 3
• Errors (system) : 0

📝 Notes
– Provider connections are lazy‑checked; until a successful ping or token validation occurs, they remain in the DISCONNECTED state.
– Run ricky provider check google or ricky provider check github to trigger an immediate verification.
– Once a provider reports a successful health‑check, the status will automatically update to CONNECTED.

=== End of Report ===

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` (or target base) | New branch `feat/<short‑description>` | Branch exists, naming follows `feat/…` pattern |
| 2️⃣ | **Ricky (Branch‑Management)** | **Sync latest base** | `main` | Branch is up‑to‑date with `main` | `git merge-base --is-ancestor main feat/...` passes |
| 3️⃣ | **Developer (Implementation)** | **Apply primitive change** (e.g., edit a config file, add a comment, bump a version) | Branch checkout | Modified files committed | `git diff --quiet` shows expected file changes |
| 4️⃣ | **Ricky (Deterministic‑Verification)** | **Run local lint / unit test** | Commit SHA | Lint & test results | All lint rules pass; test suite returns 0 |
| 5️⃣ | **Ricky (CI‑Trigger)** | **Push branch to remote** | Local branch | Remote `origin/feat/...` | `git push` succeeds, remote branch visible |
| 6️⃣ | **Ricky (CI‑Orchestration)** | **Open Pull Request (PR)** | Remote branch, target `main` | PR #\<n\> with title & description template | PR created, requ
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implement**

| Phase | Owner | Artifact | Contract / Acceptance Criteria | Verification Gate |
|-------|-------|----------|--------------------------------|--------------------|
| **1. Requirements Ingestion** | Product Owner | `spec.md` | • Clear functional description<br>• Acceptance criteria listed<br>• No ambiguous terms | Automated lint of `spec.md` (Markdown schema) – must pass 100 % |
| **2. Claude Review** | **Claude (LLM‑reviewer)** | `review-report.md` | • Detect missing edge cases, security concerns, performance gaps<br>• Provide concrete suggestions (≤ 5 items) with line‑referenced citations to `spec.md`<br>• Output must be in the prescribed JSON‑wrapped format (see below) | JSON schema validation + sanity check that at least one suggestion is produced |
| **3. Review Acceptance** | Human Gatekeeper (PM/Tech Lead) | `review‑approval.yml` | • All suggestions either **Accepted** (to be incorporated) or **Rejected** (with justification)<br>• No “pending” items remain | CI step `review‑gate` fails if any suggestion is left pending |
| **4. Codex Implementation** | **Codex (LLM‑coder)** | `src/` (codebase) + `impl‑log.md` | • Generate code t
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the **deterministic post‑processing flag** that takes a freshly generated workflow (JSON/YAML) and runs a second, focused LLM pass that:

1. **Tightens the natural‑language task description** for each step (clear intent, bounded scope, measurable verbs).  
2. **Adds or sharpens acceptance‑gate predicates** (pre‑conditions, post‑conditions, invariants) so they are **machine‑verifiable** and **idempotent**.  
3. **Leaves the overall topology unchanged** – no new steps, no re‑ordering, only the *contract* of each node is refined.

Below is a concrete contract‑first workflow authoring pattern you can copy‑paste into your CI pipeline or local dev script.

---

### 1️⃣ Input Workflow (pre‑refine)

```yaml
name: data‑ingest‑pipeline
steps:
  - id: fetch_raw
    task: "Grab the latest CSV from the data lake."
    gate:
      type: post
      condition: "file exists"
  - id: validate_schema
    task: "Make sure the CSV matches our schema."
    gate:
      type: post
      condition: "no errors"
  - id: load_to_dw
    task: "Push the CSV into the warehouse."
    gate:
      type: post
      condition: "rows 
...[truncated]
```

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.
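
One way to read the rubric above is as an acceptance check that runs after the optional refinement pass. The sketch below is an assumption-laden illustration, not Ricky's code: `Workflow`, `Step`, `sameTopology`, and `acceptRefinement` are hypothetical names, a timeout or provider failure is modeled as a `null` candidate, and side-effect scope would need its own check that is not shown here.

```typescript
// Hypothetical sketch: type and function names are illustrative, not Ricky's actual API.
interface Step {
  id: string;
  agent: string;
  dependsOn: string[];
  task: string; // editable by refinement
  gate: string; // editable by refinement
}

interface Workflow {
  steps: Step[];
}

interface RefineOutcome {
  workflow: Workflow;
  refined: boolean;
  warnings: string[];
}

// Refinement may only touch task/gate text; ids, agents, and edges must be identical.
function sameTopology(a: Workflow, b: Workflow): boolean {
  if (a.steps.length !== b.steps.length) return false;
  return a.steps.every((s, i) => {
    const t = b.steps[i];
    return (
      t !== undefined &&
      s.id === t.id &&
      s.agent === t.agent &&
      JSON.stringify(s.dependsOn) === JSON.stringify(t.dependsOn)
    );
  });
}

// `candidate` is null when the optional pass timed out or the provider failed.
function acceptRefinement(deterministic: Workflow, candidate: Workflow | null): RefineOutcome {
  if (candidate === null) {
    return {
      workflow: deterministic,
      refined: false,
      warnings: ["refinement unavailable (timeout or provider failure); deterministic artifact returned"],
    };
  }
  if (!sameTopology(deterministic, candidate)) {
    return {
      workflow: deterministic,
      refined: false,
      warnings: ["refinement changed the graph or agent assignments; deterministic artifact returned"],
    };
  }
  return { workflow: candidate, refined: true, warnings: [] };
}
```

Both fallback branches return a warning rather than failing, which matches the rule that an optional refinement must not fail the whole generation.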

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---

## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

1. Returns the current Ricky package version (semantic‑version string) on **stdout**.  
2. Exits with status **0** on success, **1** on any internal error.  
3. Is reproducible across all supported platforms (Linux, macOS, Windows).  
4. Is covered by unit‑tests, integration‑tests, and a CI verification gate.

---

## 2. Contract (Interface Definition)

| Element | Description |
|---------|-------------|
| **Command** | `ricky --version` |
| **Input** | No positional arguments; optional `--json` flag (see Extension). |
| **Output** | - **Plain mode**: `<major>.<minor>.<patch>` (e.g., `1.4.2`) followed by a newline.<br>- **JSON mode** (`--json`): `{"ricky_version":"1.4.2"}` newline. |
| **Exit Code** | `0` on success, `1` on failure (e.g., version file unreadable). |
| **Side‑effects** | None (pure read‑only). |
| **Error Message** | Printed to **stderr**: `ERROR: unable to read version` (if failure). |

*Version source*: `src/ricky/__init__.py` contains `__version__ = "X.Y.Z"
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow\(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.
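
A behavior-grounded gate, as opposed to a source-shape grep, compares what the CLI actually printed with the version the repo declares. The following is a minimal sketch under assumptions: `versionGate` and `GateResult` are hypothetical names, the caller is assumed to have already run `ricky --version` and read the expected version from the package manifest, and the assumption that the command prints only the version string is not confirmed by the source.

```typescript
// Hypothetical sketch: a gate over observed CLI behavior, not Ricky's actual gate code.
interface GateResult {
  pass: boolean;
  detail: string;
}

// `stdout` and `exitCode` come from actually running the CLI;
// `expectedVersion` is assumed to be read from the package manifest by the caller.
function versionGate(stdout: string, exitCode: number, expectedVersion: string): GateResult {
  if (exitCode !== 0) {
    return { pass: false, detail: `exit code ${exitCode}, expected 0` };
  }
  const observed = stdout.trim();
  if (observed !== expectedVersion) {
    return { pass: false, detail: `printed "${observed}", expected "${expectedVersion}"` };
  }
  return { pass: true, detail: `printed ${observed}` };
}
```

The gate fails when the built artifact exists but prints the wrong version, which is exactly the case a grep over `dist/ricky.js` cannot detect.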

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, model checkpoints, config bundles) through a deterministic validation stage, automatically apply fix‑scripts where possible, and rerun validation until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation‑loop`) and is intended to be merged via a single PR that contains the workflow definition and supporting scripts.

---

## 1. Repository Layout (branch `batch‑validation‑loop`)

    /.github/workflows/
        batch_artifact_loop.yml      # workflow definition
    /scripts/
        validate_artifact.py         # deterministic validator (returns JSON)
        fix_artifact.py              # idempotent fixer (optional)
        report_summary.py            # aggregates per‑artifact results
    /tests/
        test_validate_artifact.py    # unit tests for validator
        test_fix_artifact.py         # unit tests for fixer
    /artifacts/
        *.json                       # input artifacts (any extension, treated as opaque blobs)


All scripts are **pure Python 3.11**, have **type‑hints**, and are covered by unit tests (≥ 90 % lin
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every workflow must start with a **`contract`** block that lists required inputs, expected outputs, and versioned schema. | The workflow begins with a `contract { inputs: {...}, outputs: {...}, version: "1.0.0" }` stanza. This makes the interface machine‑readable, enables downstream validation, and prevents “silent‑parameter” bugs. |
| **Deterministic Verification Gates** | Insert explicit `gate` steps that run only when a predicate is true and must return a boolean. | After each major transformation (`transform`, `enrich`, `route`) a `gate verify_<step>()` is added. The gate runs a unit‑test‑style script (e.g., JSON‑schema validation) and aborts the run if it fails, guaranteeing that each stage’s contract is honored before proceeding. |
| **Review Artifacts Generation** | Every PR must produce a `review.md` artifact summarizing changes, rationale, and impact analysis. | The workflow includes a `post‑process genera
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a concise “request‑for‑clarification” template you can send to the product owner. It follows our standard workflow‑authoring contract: it asks for the missing concrete details, defines the verification gates we’ll need, and sets the scope for the eventual PR.

---

## 📄 Request for Clarification – Vague Product Spec

**Subject:** Needed details to author the **[Feature Name]** workflow  

**From:** Ricky – Workflow Reliability & Coordination Assistant  
**To:** [Product Owner / Stakeholder]  

---

### 1️⃣ What we need to know

| Area | Specific question | Why it matters (verification gate) |
|------|-------------------|------------------------------------|
| **Goal & Success Metric** | What is the exact business outcome this workflow must achieve? (e.g., “process 10 k orders/hr with <2 % error”) | Defines the **acceptance test** and SLA gate. |
| **Actors & Roles** | Who are the human or system actors (e.g., “Order‑Ingest Service”, “Finance Reviewer”, “Customer”) and what permissions do they need? | Determines **RBAC** checks and hand‑off validation. |
| **Trigger(s)** | What event(s) start the workflow? (e.g., webhook, schedule, manual UI button) | Sets the **entry‑poin
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s post‑failure protocol**

1. **Detect & Record the failure**  
   - The orchestrator’s timeout monitor flags *Step X* as timed‑out.  
   - The liveness watchdog notices *Worker Y* still running after the timeout window and without emitting any artifact (no files in the expected output bucket, no status update on the step‑state channel).

2. **Immediate containment actions**  
   - **Terminate** Worker Y via the runtime’s kill‑signal API (e.g., `kubectl delete pod <worker‑id>` or the cloud‑function abort call).  
   - **Mark** Step X as **FAILED** in the workflow state store, attaching the timeout error code and a “worker‑orphaned” flag.  
   - **Emit** a failure event on the workflow event bus (`workflow.failed`) with payload:
     ```json
     {
       "workflow_id": "...",
       "step_id": "X",
       "error": "timeout",
       "orphaned_worker_id": "Y",
       "timestamp": "...",
       "logs_url": "<log‑store‑path>"
     }
     ```

3. **Gather diagnostics (deterministic verification gate)**  
   - Pull the **runtime logs** for Worker Y (stdout, stderr, system metrics).  
   - Retrieve the **step‑state checkpoint** (if any) from the state store.  
   - Capture the **r
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
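
The "classify first, then decide" rule could look roughly like the sketch below. The failure class names are taken from the rubric; everything else is an assumption made for illustration: `StepEvidence`, `classifyStep`, and `mayRetry` are hypothetical, the mapping from signals to classes is a guess, and only two of the classes are detected here.

```typescript
// Hypothetical sketch: signal names and the signal-to-class mapping are assumptions.
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

interface StepEvidence {
  stepId: string;
  timedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  rawLog: string;
}

interface Classification {
  stepId: string;
  failureClass: FailureClass;
  rawLog: string; // preserved verbatim so uncertainty is not papered over
}

function classifyStep(e: StepEvidence): Classification {
  let failureClass: FailureClass = "unknown";
  if (e.workerAliveWithoutArtifacts) {
    failureClass = "agent-runtime-opacity";
  } else if (e.timedOut) {
    failureClass = "timeout";
  }
  return { stepId: e.stepId, failureClass, rawLog: e.rawLog };
}

// A retry is only considered once every failed step has a known class.
function mayRetry(classifications: Classification[]): boolean {
  return classifications.length > 0 && classifications.every((c) => c.failureClass !== "unknown");
}
```

For the scenario in the user message, the timed-out step and the silent worker would be classified separately, so the two problems are not merged into one generic "workflow failed" result.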

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Detection for Local Launches**  

| **Stage** | **Input** | **Output** | **Success Criteria** |
|-----------|-----------|------------|----------------------|
| **1. Pre‑launch Scan** | Path to project root (absolute or relative) | `scan‑report.json` | All three directories (`.agent‑relay/`, `.relay/`, `.trajectories/`) are enumerated and their timestamps recorded. |
| **2. Staleness Evaluation** | `scan‑report.json` + optional `max‑age‑hours` (default 24 h) | `stale‑flags.json` | For each directory: <br>• **Empty** → `empty:true` <br>• **Non‑empty** and **old** (oldest file > max‑age) → `stale:true` <br>• **Fresh** → `stale:false` |
| **3. Blocker Gate** | `stale‑flags.json` | `gate‑decision.json` | If any `stale:true` flag is present, the gate returns `blocked:true` with a human‑readable message; otherwise `blocked:false`. |
| **4. Reporter** | `gate‑decision.json` | Console / CI log entry | Prints a concise summary and, when blocked, a remediation hint (e.g., “run `ricky clean‑state`”). |
| **5. Launch Pass‑Through** | `gate‑decision.json` (only when `blocked:false`) | Proceeds to the normal workflow launch script | Guarantees that a clean state 
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.
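
A pre-launch check consistent with this rubric might be sketched as follows. The three directory names come from the user message; `preflight`, `PreflightReport`, and the injected `isStale` predicate are hypothetical, and how staleness is actually decided is left to the caller because the source does not say.

```typescript
// Hypothetical sketch: the report shape and function name are illustrative.
const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface PreflightReport {
  blocker: "environment-contamination" | null;
  observedPaths: string[];
  recommendation: string;
  launch: boolean;
}

// `isStale` is injected so the sketch stays free of filesystem access;
// nothing here deletes or overwrites state.
function preflight(isStale: (path: string) => boolean): PreflightReport {
  const observedPaths = RUNTIME_STATE_DIRS.filter(isStale);
  if (observedPaths.length === 0) {
    return { blocker: null, observedPaths, recommendation: "none", launch: true };
  }
  return {
    blocker: "environment-contamination",
    observedPaths,
    recommendation: "quarantine the listed paths or use an isolated run before launching",
    launch: false,
  };
}
```

The report classifies the problem as environment contamination rather than a workflow failure, records the observed paths, and only recommends an action instead of taking a destructive one.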

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is a safety file created by the **Ricky** runtime to prevent overlapping executions. When the file is left behind (e.g., a crash, an interrupted CI job, or a stray background process), the next invocation sees it and aborts with the message you’re seeing.

Below is a deterministic, step‑by‑step repair workflow that you can run locally or in CI. Each step is a **verification gate** – if the check fails, stop and report the blocker before proceeding to the next gate.

---

## 1️⃣ Verify the workspace state

| Gate | Command | Expected outcome |
|------|---------|------------------|
| **1.1** Check for a running Ricky process | `ps aux \| grep -i ricky \| grep -v grep` | No active `ricky` or `relay` processes. |
| **1.2** Confirm you are in the correct repo root | `git rev-parse --show-toplevel` | Prints the absolute path of the repo root. |
| **1.3** Ensure you have a clean working tree | `git status --porcelain` | Output is empty (no modified/untracked files). |

*If any gate fails, stop and resolve the blocker (kill the stray process, cd to the repo root, or commit/stash chan
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Mode**

Below is a deterministic, contract‑driven procedure you can run in a terminal (bash, zsh, PowerShell 7+, or any POSIX‑compatible shell). It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is stored in the repository under `workflows/`.

---

## 1️⃣ Preconditions (Verification Gate)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| **CLI installed** | `aw --version` | `aw version x.y.z` |
| **Repository root** | `git rev-parse --show-toplevel` | Path to repo root |
| **Workflow file exists** | `test -f workflows/<workflow‑name>.yml && echo ok` | `ok` |
| **Previous run ID available** | `echo $PREV_RUN_ID` | non‑empty string |
| **Failed step name** | `echo $FAILED_STEP` | non‑empty string |

If any gate fails, abort and report the blocker (e.g., “CLI not installed – run `brew install aw`”).

---

## 2️⃣ Contract – Input Parameters

| Parameter | Source | Type | Description |
|-----------|--------|------|-------------|
| `WORKFLOW` | User / script arg | string | Name of the workflow file (without `.yml`). |
| `FAILED_STEP` | Env var or CLI flag | string | E
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
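
The bounded repair/resume loop described above can be illustrated with a small sketch. It is synchronous for brevity and rests on assumptions: `runWithAutoFix`, `RunFn`, and `RepairFn` are hypothetical names, the run and repair steps are injected callbacks, and the real repair step (delegated to the Workforce workflow persona per the rubric) is not modeled.

```typescript
// Hypothetical sketch: callback shapes and names are assumptions, not Ricky's API.
interface AttemptResult {
  ok: boolean;
  runId: string;
  failedStep?: string;
}

interface AttemptSummary {
  attempt: number;
  runId: string;
  ok: boolean;
  resumedFrom?: string;
  previousRunId?: string;
}

interface LoopOutcome {
  trackingRunId: string; // one id for the whole repair/resume sequence
  ok: boolean;
  attempts: AttemptSummary[];
}

type RunFn = (opts: { startFrom?: string; previousRunId?: string }) => AttemptResult;
type RepairFn = (failedStep: string | undefined) => boolean; // true if the artifact was repaired

function runWithAutoFix(trackingRunId: string, maxAttempts: number, run: RunFn, repair: RepairFn): LoopOutcome {
  const attempts: AttemptSummary[] = [];
  let startFrom: string | undefined;
  let previousRunId: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = run({ startFrom, previousRunId });
    attempts.push({ attempt, runId: result.runId, ok: result.ok, resumedFrom: startFrom, previousRunId });
    if (result.ok) return { trackingRunId, ok: true, attempts };
    // Stop when the budget is spent or the artifact cannot be repaired.
    if (attempt === maxAttempts || !repair(result.failedStep)) break;
    startFrom = result.failedStep;
    previousRunId = result.runId;
  }
  return { trackingRunId, ok: false, attempts };
}
```

With `maxAttempts` set to 1 the same function degenerates to a single attempt with no repair call, which is the behavior expected when auto-fix is disabled.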

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts` | Full stdout/stderr of the last run. Identify the exact error message and the stage where the failure occurred. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Runs the workflow with verbose tracing (step entry/exit, input‑output payloads, contract validation results). |
| 3️⃣  | Open the workflow file | `src/workflows/foo.ts` – verify the **contract block** at the top of the file (input, output, pre‑conditions, post‑conditions). |
| 4️⃣  | Validate contracts locally | `ricky contract validate src/workflows/foo.ts` – ensures the TypeScript types and JSON‑schema contracts are syntactically correct. |
| 5️⃣  | Run the failing step in isolation | `ricky step run <step-id> --workflow workflows/foo.ts --no-auto-fix` – isolates the problematic node. |
| 6️⃣  | Check external dependencies | If the step calls a service, run the service‑client test: `ricky test service <service‑name>` and confirm connectivity/credentials. |
| 7️⃣  | Review deterministic verification gates | Look for `verify:` blocks in the workflow. Ensure th
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

| Phase | Action | Details | Expected Outcome |
|-------|--------|---------|-------------------|
| **1️⃣ Locate & Validate Artifact** | `ricky locate <workflow‑name>` | - Resolves the *primary local path* (default: `./workflows/<workflow‑name>/`). <br>- Confirms the presence of `workflow.ts` and a sibling `workflow.contract.json`. | Path resolved and contract file found. |
| **2️⃣ Verify Contract Preconditions** | `ricky verify‑contract <path>/workflow.contract.json` | - Parses the JSON contract. <br>- Checks required fields: `inputs`, `outputs`, `version`, `gateChecks`. <br>- Fails fast if any field is missing or malformed. | Contract is syntactically valid and all required keys exist. |
| **3️⃣ Install Dependencies** | `npm ci` (run in the workflow directory) | - Uses the exact lockfile (`package-lock.json` or `pnpm-lock.yaml`). <br>- Guarantees reproducible node_modules. | Deterministic dependency tree ready for execution. |
| **4️⃣ Compile‑or‑Run** | `npx ts-node --transpile-only workflow.ts` | - `ts-node` runs the TypeScript source without emitting JS files (fast, safe for local dev). <br>- `--t
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.
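
As a rough illustration of the precheck-then-record pattern in this rubric, the sketch below builds a spawn plan without running anything. `planLocalRun`, `Precheck`, and `SpawnPlan` are hypothetical names. `--experimental-strip-types` is Node's type-stripping flag (available from Node 22.6), but whether Ricky spawns exactly this command line is an assumption.

```typescript
// Hypothetical sketch: the plan shape is illustrative; only the Node flag is a real option.
interface Precheck {
  nodeAvailable: boolean; // result of resolving the Node executable
  sdkResolvable: boolean; // result of resolving @agent-relay/sdk for the workflow
}

interface SpawnPlan {
  ok: boolean;
  spawnCommand: string[]; // recorded verbatim in execution evidence
  blockers: string[];
}

// Note that the agent-relay binary on PATH is deliberately not part of the precheck.
function planLocalRun(workflowPath: string, pre: Precheck): SpawnPlan {
  const blockers: string[] = [];
  if (!pre.nodeAvailable) blockers.push("node is not resolvable");
  if (!pre.sdkResolvable) blockers.push("@agent-relay/sdk is not resolvable for this workflow");
  const spawnCommand = ["node", "--experimental-strip-types", workflowPath];
  return { ok: blockers.length === 0, spawnCommand, blockers };
}
```

Returning the command as data, separate from any user-facing reproduction command, keeps the internal spawn route visible in evidence.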

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Boundary‑Escalation Protocol – Persistent Structural Failure**

When a structural failure remains after an attempted fix, Ricky must treat the situation as a **hard blocker** and follow the deterministic escalation workflow defined in *AGENTS.md* and the **Workflow Reliability Standards (WRS‑01)**.

Below is the concrete contract you should enact, the verification gates you must satisfy, and the artifacts you need to produce.

---

## 1. Immediate Blocker Declaration  

| Item | Requirement |
|------|-------------|
| **Blocker Tag** | `BLOCKER:STRUCTURAL_FAILURE` |
| **Scope** | All downstream pipelines that depend on the affected component (e.g., `core‑schema`, `data‑pipeline‑v2`). |
| **Visibility** | Post to the **#workflow‑alerts** channel and add the tag to the component’s GitHub issue tracker. |
| **Deadline** | Within **5 minutes** of confirming the failure persists. |

*Action*: Run the command  

```bash
ricky blocker declare \
  --type STRUCTURAL_FAILURE \
  --component <component‑name> \
  --reason "Fix attempt <commit‑sha> did not resolve validation error X" \
  --notify "#workflow-alerts"
```

## 2. Create a Scoped Repair Branch

  1. Branch Naming – `repair
    ...[truncated]

**Must**

- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.

**Must Not**

- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.analytics-from-structured-evidence</code> (runtime-recovery/manual)</summary>

**User message**

> Produce a workflow health digest from many Ricky workflow runs.

**Ricky output**

```text
**Ricky Workflow – Health‑Digest Generator**  
*Purpose:* Aggregate the latest run metadata from a set of Ricky workflows, compute health metrics, and emit a concise digest (Markdown + JSON) for stakeholders.

---

## 1. Contract Overview  

| Item | Description |
|------|-------------|
| **Input** | `workflow_runs/` – a directory (or S3 bucket) containing one JSON file per workflow run. Each file follows the **Ricky Run Record** schema (see §2). |
| **Output** | `health_digest/` containing:<br>• `digest.md` – human‑readable summary<br>• `digest.json` – machine‑readable metrics |
| **Trigger** | Manual or scheduled (e.g., nightly) via CI/CD pipeline. |
| **Success Criteria** | - All input files are parsable.<br>- At least 95 % of runs have a *completed* status.<br>- No run exceeds its SLA by > 10 % (configurable).<br>- Digest files are written and checksum‑verified. |
| **Failure Mode** | Any validation gate failure aborts the run and creates a **Blocker Report** (see §5). |

---

## 2. Expected Run Record Schema  

```json
{
  "workflow_id": "string",          // e.g. "ricky/etl/transform"
  "run_id": "string",               // UUID
  "started_at": "ISO8601",          // UTC
  "en
...[truncated]
```

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
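
A digest that ties recommendations to specific workflows could start from a grouping step like the one sketched here. `WorkflowRunEvidence` is the name used in the rubric, but its fields below are assumptions, as are `recurringFailures` and `FailurePattern`.

```typescript
// Hypothetical sketch: the fields of WorkflowRunEvidence are assumed for illustration.
interface WorkflowRunEvidence {
  workflowId: string;
  runId: string;
  failureClass: string | null; // null for successful runs
  durationMs: number;
}

interface FailurePattern {
  workflowId: string;
  failureClass: string;
  occurrences: number;
  runIds: string[];
}

// Groups failures by (workflow, class) so environment and logic failures stay separate.
// The input is read-only; evidence records are never modified.
function recurringFailures(evidence: readonly WorkflowRunEvidence[], minOccurrences: number): FailurePattern[] {
  const groups = new Map<string, FailurePattern>();
  for (const e of evidence) {
    if (e.failureClass === null) continue;
    const key = JSON.stringify([e.workflowId, e.failureClass]);
    const group = groups.get(key) ?? { workflowId: e.workflowId, failureClass: e.failureClass, occurrences: 0, runIds: [] };
    group.occurrences += 1;
    group.runIds.push(e.runId);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter((g) => g.occurrences >= minOccurrences);
}
```

Each returned pattern carries the run ids it was derived from, so a recommendation built on it can cite concrete runs instead of generic advice.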

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

A dedicated Slack app (Ricky Debug) that lets engineers **inspect, step‑through, and repair** any Ricky‑managed workflow directly from a channel or DM. The surface is built as a **modal‑driven UI** with deterministic contracts and verification gates that keep the workflow state immutable until a successful commit.

---

### 1. High‑Level Interaction Flow

| Step | Slack UI Element | Action | Result |
|------|------------------|--------|--------|
| 1 | `/ricky debug <workflow‑id>` (slash command) | Launches a **Debug Launcher** modal. | Shows workflow metadata, latest run ID, and a **“Start Debug Session”** button. |
| 2 | **Debug Launcher** modal | Click **Start Debug Session**. | Creates a **debug session object** in the `ricky-debug` DB (immutable snapshot of the selected run). Returns a **session token** (short‑lived JWT). |
| 3 | **Session Overview** modal | Shows a **step list** (ordered by execution index) with status icons (✅, ⚠️, ❌). Each step has **“Inspect”, “Edit”, “Re‑run”** actions. | Engineers can drill into any step. |
| 4 | **Inspect Step** modal | Displays: <br>• Input payload (JSON) <br>• Output payload (JSON) <br>• Ex
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**Feature:** Add a new *Web Handoff* type – **`WebSubmitSpec`** – that allows a client to POST a workflow specification together with an execution mode (e.g., `dry‑run`, `live`).  

**Location in repo:** `src/handoffs/web/`  

---  

## 1. Contract (API & Data Model)

| Element | Type | Description | Validation |
|---------|------|-------------|------------|
| `POST /hand-off/web-submit-spec` | HTTP endpoint | Accepts a JSON payload containing a workflow spec and a mode. | 200 on success, 400 on validation error, 415 on unsupported media type. |
| Request body | `WebSubmitSpecRequest` | ```json { "spec": { … }, "mode": "dry-run" \| "live" } ``` | • `spec` must be a valid **WorkflowSpec** (see `src/specs/WorkflowSpec.ts`). <br>• `mode` must be one of the allowed enum values. |
| Response body | `WebSubmitSpecResponse` | ```json { "workflowId": "<uuid>", "status": "queued" } ``` | `workflowId` is a UUID v4 generated by the orchestrator. |
| Errors | `ErrorResponse` | ```json { "code": "INVALID_SPEC", "message": "…"} ``` | Standard error envelope used across all handoffs. |

**Schema (TypeScript)** – `src/handoffs/web/types.ts`

```ts
export type ExecutionMode = 'dry-run' | 'live';

e
...[truncated]
```

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
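
The boundary this rubric describes (validate in the surface, translate in the normalizer, dispatch on the shared model) might be sketched as below. `LocalInvocationRequest` and `CloudGenerateRequest` are named in the rubric, but their fields here are assumptions; `WebHandoff`, its `"local" | "cloud"` mode values, `normalizeWebHandoff`, and `handleWebSubmit` are hypothetical.

```typescript
// Hypothetical sketch: request fields and mode values are assumptions for illustration.
interface WebHandoff {
  spec: string;
  mode: "local" | "cloud";
}

interface LocalInvocationRequest {
  kind: "local";
  spec: string;
}

interface CloudGenerateRequest {
  kind: "cloud";
  spec: string;
}

type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// The normalizer is the only translation step; its output carries no trace of the web origin.
function normalizeWebHandoff(handoff: WebHandoff): NormalizedRequest {
  return handoff.mode === "local"
    ? { kind: "local", spec: handoff.spec }
    : { kind: "cloud", spec: handoff.spec };
}

// The web surface validates the session, then hands off; it never calls executors itself.
function handleWebSubmit(
  handoff: WebHandoff,
  sessionValid: boolean,
  dispatch: (request: NormalizedRequest) => string,
): string {
  if (!sessionValid) throw new Error("invalid session");
  return dispatch(normalizeWebHandoff(handoff));
}
```

Because `dispatch` only ever sees a `NormalizedRequest`, downstream routing stays identical for web, CLI, or any other surface.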

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision:** Generate a new workflow (or a revised version) and create the associated coordination artifacts.

**Rationale**

| Factor | Observation | Impact on decision |
|--------|-------------|--------------------|
| **Input type** | You supplied a *workflow specification* together with *conversation context* (e.g., requirements, constraints, prior attempts). | Indicates a design‑time request rather than a runtime error or a need for debugging. |
| **Goal** | The request is to *determine whether to generate, debug, restart, analyze, or coordinate*. | The presence of a spec suggests the next logical step is to **author** (generate) the workflow, then set up the verification and coordination scaffolding. |
| **State of the workflow** | No mention of an existing failing run, missing artifacts, or a need for post‑mortem analysis. | No debugging or restart is required. |
| **Coordination need** | The spec is being handed over from Claude, implying a hand‑off that should be formalized (branch, PR, review). | Coordination steps will be added after generation. |
| **Deterministic verification** | The spec can be turned into a concrete contract (inputs, outputs, gates). | Enables the re
...[truncated]

Must

  • Normalize the tool call through the MCP/Claude handoff path.
  • Treat conversation context as advisory metadata.
  • Make the spec itself sufficient for routing whenever possible.

Must Not

  • Require the user to rewrite the spec manually as a workflow.
  • Put domain routing logic inside the MCP tool handler itself.
  • Ignore provided mode, workflow path, or prior decisions metadata.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)

User message

Change the Cloud Ricky generate API response to include a new field and change one existing error code name.

Ricky output

**Scope**  
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.  
- Rename error code **`INVALID_PROMPT`** to **`PROMPT_VALIDATION_ERROR`** in the error‑response schema and all references.  

All changes are limited to the **cloud‑ricky** service repository ( `services/cloud-ricky/` ). No other services are affected.

---

## 1️⃣ Contract – API Specification Update  

| Endpoint | Method | Success 200 Schema (add) | Error Schema (rename) |
|----------|--------|--------------------------|-----------------------|
| `/v1/generate` | POST | ```json { "output": "string", "requestId": "uuid‑v4" }``` | ```json { "code": "PROMPT_VALIDATION_ERROR", "message": "string" }``` |

*The `requestId` must be a RFC‑4122 UUID v4 generated at request entry and echoed back in the response.*  

**Version bump** – Increment the OpenAPI file `openapi.yaml` minor version (`v1.2.0 → v1.3.0`).  

All downstream SDKs will be regenerated from the updated spec.

---

## 2️⃣ Implementation Tasks  

| Task | File(s) | Description |
|------|---------|-------------|
| **2.1** Generate requestId | `src/handler.ts` | `const requestId = uuidv4();` store 
...[truncated]

Must

  • Allow additive response fields within /api/v1/ricky/....
  • Reject renaming or changing the meaning of an existing error code inside the same version.
  • Require a ne

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
