fix(env-loader): assertRickyWorkflowEnv honors START_FROM resume signal #93

Merged
kjgbot merged 1 commit into main from fix/env-assert-honors-start-from
May 10, 2026
Conversation

Contributor

@kjgbot kjgbot commented May 10, 2026

Summary

assertRickyWorkflowEnv(['<VAR>', ...]) is injected at the top of main() and runs at module-load before the SDK has a chance to honor --start-from. That made ricky run <workflow> --start-from <step> impossible whenever the missing env var wasn't actually used by the resumed step — the upstream-only step needed it, but it was being skipped.

Repro

A workflow generated against the proactive-pr-remediation spec asserted NANGO_SECRET_KEY at module load. The user's first attempt had it set; a subsequent resume did not:

$ ricky run workflows/generated/ricky-…ts \
    --start-from final-hard-validation \
    --previous-run-id e761ea6d086413ef88e2c8ff
Error: MISSING_ENV_VAR: NANGO_SECRET_KEY. Add missing values to .env.local or export them before rerunning.
    at assertRickyWorkflowEnv (workflows/generated/ricky-…ts:30:11)
    at main (workflows/generated/ricky-…ts:47:3)

The resumed step (final-hard-validation) is a deterministic shell step that does not use NANGO_SECRET_KEY at all (just runs typecheck + scoped npm test + git diff + a marker grep). But the assert fires before the SDK can route execution to that step, blocking the resume entirely.

Fix

When process.env.START_FROM is set — the SDK exports this for --start-from resumes (see src/local/entrypoint.ts:799) — the helper warns-and-continues instead of throwing. Resumed steps that genuinely need a missing env var will still fail with their own signal at the point of use; the check just stops blocking resumes where it has no information about whether the missing var matters.

First-run invocations (no START_FROM) still throw fast — the original fail-fast contract for fresh runs is preserved.
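
The gate described above can be sketched roughly as follows. This is a minimal illustration, not the injected template itself: the name `assertWorkflowEnvSketch`, the explicit `env` and `warn` parameters, and the warning wording are assumptions made so the example stays self-contained. Per the description, the real `assertRickyWorkflowEnv` reads `process.env` directly. The thrown message mirrors the error shown in the repro.

```typescript
// Hypothetical sketch of the warn-and-continue gate; not the injected source.
type EnvMap = Record<string, string | undefined>;

function assertWorkflowEnvSketch(
  required: string[],
  env: EnvMap, // assumption: passed in here; the real helper reads process.env
  warn: (message: string) => void = console.warn,
): void {
  const missing = required.filter((name) => !env[name]);
  if (missing.length === 0) return; // nothing missing: no warning, no throw

  const startFrom = env['START_FROM'];
  if (startFrom) {
    // Resume: the skipped upstream step may be the only consumer, so only warn.
    warn(
      `MISSING_ENV_VAR: ${missing.join(', ')} (continuing because --start-from ${startFrom} is active)`,
    );
    return;
  }

  // Fresh run: keep the original fail-fast contract.
  throw new Error(
    `MISSING_ENV_VAR: ${missing.join(', ')}. Add missing values to .env.local or export them before rerunning.`,
  );
}
```

Called with `{ START_FROM: 'final-hard-validation' }` and no `NANGO_SECRET_KEY`, the sketch warns once and returns; with `START_FROM` unset it throws the same message as in the repro.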

Both injected variants are updated:

  • rickyWorkflowEnvLoaderSource — full helper bundle, used when env loading is being added to a workflow that has neither the loader nor the assert.
  • rickyWorkflowEnvAssertSource — assert-only helper, used when the loader is already present but the assert is missing.

Test plan

  • npm run typecheck — clean
  • npm test — 1066 / 1066 (was 1064)
  • New regression in auto-fix-loop.test.ts: injects an env-assert helper that honors START_FROM for --start-from resumes. Asserts the injected source contains:
    • process.env.START_FROM check
    • console.warn(...) message acknowledging --start-from active
    • Preserved throw new Error(`MISSING_ENV_VAR: ...`) path for non-resume invocations
  • Existing 29 auto-fix-loop tests still pass.
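
The three string-level assertions listed above amount to a containment check on the injected source. A rough sketch, assuming a stand-in template string: `sampleInjectedSource`, `resumeMarkers`, and `missingResumeMarkers` are hypothetical names, and the actual test in auto-fix-loop.test.ts inspects the source produced by the repair step rather than a literal.

```typescript
// Hypothetical stand-in for the injected helper source (not the real template).
const sampleInjectedSource = [
  'function assertRickyWorkflowEnv(names) {',
  '  const missing = names.filter((name) => !process.env[name]);',
  '  if (missing.length === 0) return;',
  '  if (process.env.START_FROM) {',
  "    console.warn('--start-from active; continuing with missing env: ' + missing.join(', '));",
  '    return;',
  '  }',
  '  throw new Error(`MISSING_ENV_VAR: ${missing.join(", ")}`);',
  '}',
].join('\n');

// Markers the regression is described as checking for.
const resumeMarkers = [
  'process.env.START_FROM',
  'console.warn(',
  'throw new Error(`MISSING_ENV_VAR:',
];

// Returns the markers that are absent from the given source text.
function missingResumeMarkers(source: string): string[] {
  return resumeMarkers.filter((marker) => source.indexOf(marker) === -1);
}
```

An empty result means all three markers are present. Running the same check against both injected variants would also cover the assert-only path.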

Out of scope (worth a follow-up)

The proper long-term fix is per-step env declaration so the renderer can scope assertions to the steps that actually consume each variable. That requires:

  • A way to declare usesEnv: ['NANGO_SECRET_KEY'] per step (or detect via static analysis of the step's command)
  • The renderer emits per-step assertions inside the step body instead of at module-load
  • The SDK can then short-circuit per-step assertions for skipped steps

That's a meaningful design change beyond this PR. The warn-and-continue gate is the pragmatic unblock for the common case (resuming past an assertion that doesn't apply to the resumed surface).
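
To make the follow-up concrete, here is one possible shape for per-step declarations. Everything in it is hypothetical: the PR names only the `usesEnv` idea, so the `StepSketch` type, the `missingEnvForResume` function, the upstream step name `fetch-pr-context`, and the assumption that a resume runs the start step plus everything after it are illustrative, not an existing SDK or renderer API.

```typescript
// Hypothetical design sketch; no such API exists in this PR.
interface StepSketch {
  name: string;
  usesEnv: string[]; // env vars this step actually consumes
}

// Collects missing env vars only for steps that will run: the startFrom step
// and everything after it, or all steps when startFrom is not given.
function missingEnvForResume(
  steps: StepSketch[],
  env: Record<string, string | undefined>,
  startFrom?: string,
): string[] {
  let firstIndex = 0;
  if (startFrom !== undefined) {
    firstIndex = steps.map((step) => step.name).indexOf(startFrom);
    if (firstIndex === -1) throw new Error(`Unknown step: ${startFrom}`);
  }

  const missing: string[] = [];
  for (const step of steps.slice(firstIndex)) {
    for (const name of step.usesEnv) {
      if (!env[name] && missing.indexOf(name) === -1) missing.push(name);
    }
  }
  return missing;
}

// Illustrative workflow: only the upstream step needs the secret.
const sketchSteps: StepSketch[] = [
  { name: 'fetch-pr-context', usesEnv: ['NANGO_SECRET_KEY'] },
  { name: 'final-hard-validation', usesEnv: [] },
];
```

Under these assumptions, resuming at `final-hard-validation` with an empty environment reports nothing missing, while a fresh run reports `NANGO_SECRET_KEY`, which is the scoping the warn-and-continue gate approximates.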

Relationship to other PRs

Fifth in a stack from this thread that together unblock ricky --mode local --spec-file <md> --run against monorepo specs and their resumes:

  1. fix(generation): fall back to deterministic renderer when persona writer throws #84 — persona-writer fallback (artifact actually gets written)
  2. fix(auto-fix): detect rickyWorkflow* alias imports via TypeScript AST #86 — env-loader alias detection via TS AST (artifact actually loads under Node)
  3. fix(master-renderer): use workspace-aware typecheck command #90 — workspace-aware typecheck (typecheck succeeds in monorepos)
  4. fix(master-renderer): scope final-hard-validation tests to spec workspaces #91 — workspace-aware test scoping (tests don't fail on unrelated packages)
  5. this PR — env-assert honors START_FROM (resumes don't get blocked by upstream-only env vars)

Each is independently shippable.

🤖 Generated with Claude Code

`assertRickyWorkflowEnv(['NANGO_SECRET_KEY', ...])` is injected at the
top of `main()` and runs at module-load before the SDK has a chance
to honor `--start-from`. That made `ricky run <workflow> --start-from
<step>` impossible whenever the missing env var wasn't actually used
by the resumed step — the upstream-only step needed it, but it was
being skipped.

Concrete repro from this thread: a workflow generated against the
proactive-pr-remediation spec asserted NANGO_SECRET_KEY at module
load. The user's first attempt had it set; a subsequent
`--start-from final-hard-validation --previous-run-id <id>` resume
did not. The resumed step (deterministic shell: tsc + scoped npm
test + git diff + grep) does not use NANGO_SECRET_KEY at all, but
the assert fired before the SDK could route execution to that step.

Fix: when process.env.START_FROM is set (the SDK exports this for
--start-from resumes — see src/local/entrypoint.ts:799), the helper
warns-and-continues instead of throwing. Resumed steps that
genuinely need a missing env var will still fail with their own
signal at the point of use; the check just stops blocking resumes
where it has no information about whether the missing var matters.

First-run invocations (no START_FROM) still throw fast — the
original fail-fast contract for fresh runs is preserved.

Both injected variants are updated:
- rickyWorkflowEnvLoaderSource (full helper bundle, used when env
  loading is being added to a workflow that has neither the loader
  nor the assert)
- rickyWorkflowEnvAssertSource (assert-only helper, used when the
  loader is already present but the assert is missing)

Tests:
- New regression case in auto-fix-loop.test.ts:
  "injects an env-assert helper that honors START_FROM for
  --start-from resumes" — asserts the injected source contains the
  process.env.START_FROM check, the console.warn message
  acknowledging --start-from, and that the throw path is preserved
  for non-resume invocations.
- Existing 29 auto-fix-loop tests still pass; full ricky suite at
  1066 / 1066 (was 1064).

Out of scope (worth a follow-up):
- The proper long-term fix is per-step env declaration so the renderer
  can scope assertions to the steps that actually consume each
  variable. That requires SDK changes beyond this PR. The warn-and-
  continue gate is the pragmatic unblock for the common case (resuming
  past an assertion that doesn't apply to the resumed surface).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 10, 2026

📝 Walkthrough

The PR makes the injected environment assertion helper resume-aware: when process.env.START_FROM is set (indicating a --start-from resume), missing environment variables trigger a warning and return instead of throwing, allowing workflow resumption to proceed. A regression test verifies both the resume and non-resume paths.

Changes

Resume-Aware Environment Assertion

| Layer / File(s) | Summary |
|-----------------|---------|
| Core Implementation<br>src/local/auto-fix-loop.ts | Both rickyWorkflowEnvLoaderSource and rickyWorkflowEnvAssertSource templates are updated: the embedded assertRickyWorkflowEnv now returns early if no env vars are missing, and returns (with console.warn) instead of throwing when process.env.START_FROM is set. |
| Regression Test<br>src/local/auto-fix-loop.test.ts | New test case verifies that the injected helper respects process.env.START_FROM, uses warn-and-continue for resume mode, and preserves fast-throw for non-resume execution with missing env vars. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • AgentWorkforce/ricky#49: Both PRs modify the injected assertRickyWorkflowEnv behavior in src/local/auto-fix-loop.ts and tests—the main PR adds resume-awareness via process.env.START_FROM on top of the missing-env injection infrastructure.
  • AgentWorkforce/ricky#47: Both PRs address MISSING_ENV_VAR handling in src/local/auto-fix-loop.ts—the main PR makes the injected helper resume-aware while the retrieved PR escalates setup blockers.
  • AgentWorkforce/ricky#86: Both PRs modify deterministic auto-fix injection logic in src/local/auto-fix-loop.*—resume-awareness injection versus env-loader and alias import detection.

Poem

🐰 A resume that won't break today,
When envars go missing, we warn-warn-way,
START_FROM whispers, "Please continue,"
No throw, just a skip—and we're through, too!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately describes the main change: updating assertRickyWorkflowEnv to honor START_FROM for resume workflows. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the problem, fix, test plan, and relationship to other PRs. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/local/auto-fix-loop.test.ts (1)

620-638: ⚡ Quick win

Add an explicit regression assertion for the assert-only helper path.

This test validates the loader-injection variant well, but not the branch where RICKY_WORKFLOW_ENV_LOADER already exists and only assertRickyWorkflowEnv is injected. Adding a focused assertion for that path would lock in START_FROM behavior for both variants.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/local/auto-fix-loop.test.ts` around lines 620 - 638, Add a focused
assertion verifying the "assert-only" helper path when RICKY_WORKFLOW_ENV_LOADER
is already present: in the test 'injects an env-assert helper that honors
START_FROM for --start-from resumes' (which calls
repairWorkflowDeterministically), simulate the scenario where
RICKY_WORKFLOW_ENV_LOADER is present and ensure the returned repair.content
contains the injected assertRickyWorkflowEnv helper usage that honors
process.env.START_FROM (e.g., contains 'assertRickyWorkflowEnv' and
'process.env.START_FROM' and the console.warn / throw behaviors), so both the
loader-injection and assert-only branches are locked in for START_FROM behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 61d29f7d-a742-49d8-8da4-8b4bdb5aa403

📥 Commits

Reviewing files that changed from the base of the PR and between 1f08dec and 1f4acd1.

📒 Files selected for processing (2)
  • src/local/auto-fix-loop.test.ts
  • src/local/auto-fix-loop.ts

@kjgbot kjgbot merged commit b97a41e into main May 10, 2026
2 of 3 checks passed
@kjgbot kjgbot deleted the fix/env-assert-honors-start-from branch May 10, 2026 18:42
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.

@github-actions
Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-10T18-40-49-533Z-HEAD-provider
Mode: provider
Git SHA: 95e37de

Passed: 8 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Summary**  
We will extend the Ricky repository so that the README, module doc‑strings, and the `ricky/__init__.py` metadata all state that *Ricky now leverages the Agent Assistant (AA) library for deeper orchestration, validation, and self‑repair*. The change is purely informational – no functional code changes are required – but we will add a small “integration shim” that imports the AA core and exposes a `ricky.agent_assistant` namespace for downstream users. This keeps the contract explicit and future‑proofs the code base.

---

## 1. Scope & Branch Strategy  

| Item | Detail |
|------|--------|
| **Branch name** | `feature/aa‑integration‑doc‑update` |
| **PR target** | `main` |
| **Files touched** | `README.md`, `docs/architecture.md`, `ricky/__init__.py`, `ricky/agent_assistant/__init__.py`, `ricky/__all__.py` |
| **Tests** | No new tests required; existing CI will run lint and doc‑build checks. |
| **Reviewers** | `@team-lead`, `@doc-maintainer`, `@core-dev` |
| **Verification gate** | CI pipeline must pass `flake8`, `mypy`, and `sphinx-build`. The PR must contain a **Documentation Review Checklist** (see §4). |

---

## 2. Concrete Changes  

### 2.1 `README.md` (Top‑lev
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(as of the latest CI run on 2026‑04‑28)*  

| Metric | Current Value | Target / Trend | Status |
|--------|---------------|----------------|--------|
| **Workflow contracts defined** | 7 of 12 possible contracts (≈58 %) | ≥90 % | ⚠️ |
| **Deterministic verification gates** | 4/7 contracts have automated gate checks | 100 % | ⚠️ |
| **Review artifacts (design docs, test plans)** | 9 artifacts attached to PRs | ≥1 per contract | ✅ |
| **Validation loops (80‑100 % pass)** | Avg. pass = 73 % over 112 loops | ≥80 % | ⚠️ |
| **Blocker reports** | 3 open blockers (missing gate, flaky test, missing doc) | 0 open blockers | ❌ |
| **Branch/PR scoping** | 5 of 12 PRs exceed scoped boundary (mixing unrelated changes) | 0 | ❌ |
| **Adoption velocity** | 2 new contracts added / month (down from 4/mo) | ≥3/mo | ⚠️ |
| **Team compliance** | 62 % of contributors use the `@agent-assistant/turn-context` template | ≥80 % | ⚠️ |

---

### Key Findings

1. **Partial Contract Coverage**  
   - Only 7 of the 12 identified turn‑context contracts are fully authored. Missing contracts: *State‑Sync*, *Error‑Recovery*, *Rate‑Limit‑Enforcement*, *Teleme
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|--------------------------------------|
| **Workflow generation** | Agent Assistant already owns the *authoring contract* (input schema → workflow definition). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for workflow shape; downstream agents can rely on a deterministic contract. |
| **Validation** | Validation gates (schema, deterministic‑run, idempotency) are the first line of defense before a workflow is persisted. Agent Assistant is the natural gatekeeper. | Early failure detection reduces downstream “broken‑pipeline” noise and keeps the 80‑100 validation loop tight. |
| **Debugging** | Debug utilities (trace‑injection, sandbox execution, diff‑reporting) require intimate knowledge of the generation output. Housing them in Age
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (May 15 – Aug 7 2024)*  

---

## 1. Slice Objective
Enable **90 % of active AgentWorkforce teams** to adopt the new “Ricky v2.0” workflow‑reliability engine for **continuous coordination, deterministic verification, and automated review** while maintaining zero‑downtime for existing pipelines.

---

## 2. Success Metrics (KPIs)

| Metric | Target | Measurement |
|--------|--------|--------------|
| **Adoption rate** | ≥ 90 % of teams onboarded | Teams‑registered in the Adoption Dashboard |
| **Mean Time to Verify (MTTV)** | ≤ 2 min per workflow contract | Automated gate logs |
| **Defect leakage** | ≤ 1 % of post‑release bugs attributable to workflow gaps | Post‑release defect triage |
| **Reviewer satisfaction** | ≥ 4.5/5 (survey) | End‑of‑slice survey |
| **Blocker resolution time** | ≤ 24 h for critical blockers | Issue tracker SLA |

---

## 3. Scope & Boundaries

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • Migration of **core CI/CD pipelines** (build, test, deploy) to Ricky contracts | • Legacy batch jobs not under CI/CD |
| • Definition of **deterministic verification gates*
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, production‑ready package spec** that lets Ricky (the reliability‑coordination assistant) surface its capabilities inside Slack and a generic web‑chat widget. The design follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic contracts, verification gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, blocker reporting).

---

### 1. High‑Level Architecture  

| Component | Responsibility | Runtime | Communication |
|-----------|----------------|---------|----------------|
| **Ricky Core** | Workflow orchestration, reliability checks, coordination logic | Python 3.11 (container) | Internal RPC (gRPC) |
| **Slack Adapter** | Translate Slack events ↔ Ricky RPC, handle OAuth, message formatting | Node 18 (serverless) | Slack Events API, `chat.postMessage` |
| **Web‑Chat Adapter** | WebSocket bridge for the embedded widget, UI rendering, auth | Node 18 (serverless) | WebSocket, REST for token exchange |
| **Agent‑Assistant Package (AAP)** | Declarative contract bundle (`manifest.yaml`, `schema.json`, 
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

without the --run flag?


1. Mode = local

  • Ricky operates against the local repository copy (no remote fetches, no CI‑only resources).
  • All file reads/writes, linting, and validation are performed against the current checkout.

2. --spec argument

  • The string after --spec is interpreted as a workflow‑generation request.
  • Ricky parses the natural‑language spec, maps it to the workflow contract defined in workflows/specs/, and creates a draft workflow file (e.g., workflows/package-checks.yml) in a staging area.

3. Absence of --run

  • No execution: Ricky stops after the draft is written.
  • It does not:
    • Trigger any workflow steps (lint, test, build, deploy).
    • Open a PR or push changes.
    • Invoke any verification gates (deterministic checks, unit‑test suites, etc.).

4. What you’ll see

  1. Console output – a concise summary:
    [INFO] Parsed spec: generate a workflow for package checks
    [INFO] Draft workflow written to workflows/package-checks.yml
    [INFO] Run with '--run' to validat
    

...[truncated]


Must

  • Say generation is the default and execution was not requested.
  • Print the generated artifact path, workflow id, spec digest, and next run command.
  • Avoid showing execution evidence for a generation-only request.

Must Not

  • Imply the workflow ran automatically.
  • Present a generation-only result as execution success.
  • Hide the opt-in commands for running the artifact.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each command is deterministic and idempotent, and every stage includes a verification gate that will abort the flow with a clear error message if a blocker is encountered.

---

## 1️⃣ Prerequisites (Verification Gate)

```bash
# Verify Node.js (>=18) and npm (>=9) are installed
node -v | grep -E '^v(1[8-9]|[2-9][0-9])' || { echo "❌ Node.js >=18 required"; exit 1; }
npm -v | grep -E '^[9-9]'          || { echo "❌ npm >=9 required"; exit 1; }

# Verify Git is installed
git --version || { echo "❌ Git is required"; exit 1; }
```

If any of the above checks fail, install the missing tool before proceeding.


2️⃣ Install the Ricky CLI (Deterministic Contract)

# Install globally from the official npm registry
npm i -g @ricky/agentworkforce-cli@latest

# Verify installation
ricky --version || { echo "❌ Ricky CLI installation failed"; exit 1; }

The CLI is version‑locked to the latest stable release at install time.


3️⃣ Initialise a New Workspace (Scoped Branch)

# Create a
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky’s Standard Response to an Empty or Missing Specification**

---

### 1. Immediate Detection & Blocker Report
| Step | Action | Output |
|------|--------|--------|
| **1.1** | Verify the presence of the spec file (`*.spec.yaml` or `*.spec.json`). | If the file is not found, raise a **Blocker**: “❗ Spec file missing – cannot proceed.” |
| **1.2** | If the file exists but is empty or contains only comments/whitespace, raise a **Blocker**: “❗ Spec file empty – no actionable content.” |
| **1.3** | Log the blocker in the workflow run (e.g., `ricky-workflow.log`) and attach a **GitHub Issue** template (see § 2). | Issue ID recorded for traceability. |

---

### 2. Automated Issue Creation (Deterministic Gate)

**Issue Template – `RICKY‑MISSING‑SPEC`**

Title: [BLOCKER] Missing or empty spec for <component‑name>
Labels: blocker, spec‑missing
Assignees: @

Description

The workflow for <component‑name> cannot be generated because the required specification file is missing or empty.

Required Action

  1. Provide a valid <component‑name>.spec.yaml (or .json) in the spec/ directory.
  2. Ensure the spec follows the AgentWorkforce Spec Schema v2.1 (see docs
    ...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Configuration
• Google OAuth client ID :
• GitHub App ID :
• Credential store : OK

⚙️ Runtime
• Agent version : v2.4.1
• Last health check : 2026‑05‑10 08:12:43 UTC
• Active workflows : 12
• Pending jobs : 0

🚨 Alerts
• Provider verification pending for Google and GitHub.
• Some features that depend on these providers may be disabled until connectivity is confirmed.

📝 Next steps

  1. Run provider checks:
    ricky provider check google
    ricky provider check github
  2. Review the detailed logs if checks fail:
    ricky logs --provider google
    ricky logs --provider github
  3. Re‑authenticate or update credentials as needed.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

## GitHub Primitive‑Change Workflow  
**Purpose** – Safely introduce a single‑file edit (e.g., a typo fix, comment update, or small config tweak) into a protected repository while guaranteeing traceability, automated verification, and clear hand‑off points for reviewers.

---

### 1. Workflow Contract (YAML‑style summary)

```yaml
name: primitive-change
description: One‑file edit on a protected repo
trigger:
  - manual (GitHub UI “Run workflow”)
inputs:
  repo:
    type: string
    description: "owner/repo to modify"
  branch:
    type: string
    default: "main"
  file_path:
    type: string
    description: "Relative path of the file to edit"
  edit_mode:
    type: enum
    enum: [replace_line, append, delete_line]
    description: "Kind of primitive edit"
  new_content:
    type: string
    description: "Content to write (or line to append)"
  pr_title:
    type: string
    default: "Primitive change: {{file_path}}"
  pr_body:
    type: string
    default: "Automated primitive change generated by Ricky."
outputs:
  pr_number:
    type: integer
    description: "Pull‑request number created"
  status:
    type: enum
    enum: [success, blocked, failed]
    description: "Final work
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
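
One way to read the skill-matcher rules above is as a ranked match with recorded evidence and a soft fallback. The sketch below assumes a hypothetical registry shape (`SkillEntry` with keyword triggers) and an arbitrary default threshold of 0.5; the real registry format and scoring are not described in this report.

```typescript
// Hypothetical registry and match shapes; the scoring rule is an assumption.
interface SkillEntry {
  name: string;
  triggers: string[]; // keywords that suggest the skill applies
}

interface SkillMatch {
  name: string;
  confidence: number; // fraction of triggers found in the spec, 0..1
  evidence: string[]; // the triggers that actually matched
}

interface SkillSelection {
  matches: SkillMatch[]; // ranked, highest confidence first
  warnings: string[];
}

function selectSkills(
  spec: string,
  registry: SkillEntry[] | undefined,
  threshold = 0.5,
): SkillSelection {
  // A missing registry is a warning, not a generation failure.
  if (!registry || registry.length === 0) {
    return { matches: [], warnings: ["skill registry missing or empty; continuing without skills"] };
  }
  const text = spec.toLowerCase();
  const matches: SkillMatch[] = registry
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => text.indexOf(t.toLowerCase()) !== -1);
      const confidence = skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { name: skill.name, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  const warnings = matches.length === 0 ? ["no skill cleared the confidence threshold"] : [];
  return { matches, warnings };
}
```

Serializing the returned `SkillSelection` into the artifact metadata would keep the selection evidence visible rather than hidden.
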
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| **Section** | **Content** |
|-------------|-------------|
| **Goal** | Produce production‑ready code from a high‑level feature description, using Claude for design/logic review and Codex for code generation. |
| **Scope** | • Feature description (Markdown) <br>• Design review (Claude) <br>• Code generation (Codex) <br>• Automated tests & static analysis <br>• Human sign‑off |
| **Branch/PR Model** | *feature/claude‑review‑codex‑impl* → PR → *main* (protected). All artifacts (review notes, generated code, test results) attached to the PR. |
| **Roles** | • **Product Owner (PO)** – supplies feature spec. <br>• **Claude Reviewer** – LLM (Claude) invoked via `claude-review` tool. <br>• **Codex Generator** – LLM (Codex) invoked via `codex-gen` tool. <br>• **CI Bot** – runs verification gates. <br>• **Human Engineer** – final sign‑off. |
| **Artifacts** | 1. `spec.md` – original feature description. <br>2. `design_review.md` – Claude output (design, edge‑case analysis, API contract). <br>3. `generated_code/` – Codex‑produced source files. <br>4. `test_suite/` – auto‑generated unit tests. <br>5. `ci_report.json` – deterministic verific
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
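
The precedence described above (explicit spec hints first, then skill metadata, then project defaults) can be sketched as follows. The function and field names are hypothetical, and the step-to-runner maps stand in for whatever the spec parser and skill loader actually produce.

```typescript
// Hypothetical decision record; it could be serialized as tool-selection.json.
type DecisionSource = "spec_hint" | "skill_default" | "project_default";

interface ToolDecision {
  step: string;
  runner: string;
  source: DecisionSource;
  reason: string;
}

function chooseRunner(
  step: string,
  specHints: Record<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): ToolDecision {
  const hint = specHints[step];
  if (hint !== undefined) {
    // An explicit hint always wins over skill or project defaults.
    return { step, runner: hint, source: "spec_hint", reason: `spec explicitly names ${hint} for ${step}` };
  }
  const skillDefault = skillDefaults[step];
  if (skillDefault !== undefined) {
    return { step, runner: skillDefault, source: "skill_default", reason: `spec is silent; skill metadata suggests ${skillDefault}` };
  }
  return { step, runner: projectDefault, source: "project_default", reason: "no hint or skill default; using project default" };
}
```

For the spec in this case ("use Claude to review and Codex to implement"), the hints map would carry `review` and `implement`, and each decision keeps the reason it was made.
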
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the **deterministic post‑processing flag** that takes a freshly‑generated workflow (JSON/YAML) and runs a second, focused generation pass that:

1. **Tightens the natural‑language task description** for each step.  
2. **Adds or tightens acceptance‑gate predicates** (pre‑/post‑conditions, success criteria, timeout limits).  
3. **Leaves the overall topology unchanged** (no new steps, no re‑ordering).  

Below is the end‑to‑end contract, verification gates, and a minimal example you can copy‑paste into your CI pipeline.

---

### 1. Contract for `--refine`

| Input | Required fields | Format |
|-------|----------------|--------|
| `workflow` | `steps[]` with `id`, `task`, `gate` (optional) | JSON or YAML |
| `refine‑spec` *(optional)* | `task‑style`, `gate‑style`, `max‑tokens` | JSON |

**Output** – a workflow identical in shape to the input but with every `steps[i].task` and `steps[i].gate` replaced by refined strings that satisfy the style constraints.

**Determinism guarantee** – The same input + same `refine‑spec` → identical output (seeded RNG).  

**Failure mode** – If any step cannot be refin
...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
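
The fallback behaviour required above can be sketched as an acceptance check that runs after the optional refinement pass. The `Workflow` and `Step` shapes here are hypothetical. The sketch assumes that a timed-out or failed refinement is represented by `undefined`, and that only `task` and `gate` text are editable.

```typescript
// Hypothetical workflow shape; only task and gate are treated as editable.
interface Step {
  id: string;
  agent: string;
  dependsOn: string[];
  task: string;
  gate?: string;
}

interface Workflow {
  steps: Step[];
}

interface RefineResult {
  workflow: Workflow;
  refined: boolean;
  warnings: string[];
}

// True when step ids, agent assignments, and dependencies are all unchanged.
function sameGraph(a: Workflow, b: Workflow): boolean {
  if (a.steps.length !== b.steps.length) return false;
  return a.steps.every((step, i) => {
    const other = b.steps[i];
    return (
      other !== undefined &&
      step.id === other.id &&
      step.agent === other.agent &&
      step.dependsOn.join(",") === other.dependsOn.join(",")
    );
  });
}

function acceptRefinement(original: Workflow, candidate: Workflow | undefined): RefineResult {
  if (candidate === undefined) {
    // Timeout or provider failure: keep the deterministic artifact and say so.
    return { workflow: original, refined: false, warnings: ["refinement unavailable; returning deterministic artifact"] };
  }
  if (!sameGraph(original, candidate)) {
    return { workflow: original, refined: false, warnings: ["refinement changed the workflow graph; returning deterministic artifact"] };
  }
  return { workflow: candidate, refined: true, warnings: [] };
}
```

In both fallback branches the warning is returned alongside the unchanged artifact, so the caller can surface it instead of silently dropping the refinement.
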
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---  

### 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

* Returns the current Ricky version string (e.g., `ricky 0.9.3`).
* Exits with status 0 on success, non‑zero on unexpected failure.
* Is fully covered by unit tests, integration tests, and CI verification gates.
* Is delivered via a scoped branch/PR with complete review artifacts.

---  

### 2. Contract (Interface Definition)

| Element | Description |
|---------|-------------|
| **Command** | `ricky --version` |
| **Output** | `ricky <semver>` printed to **stdout** followed by a newline. |
| **Exit Code** | `0` on success, `>0` on any internal error (e.g., version file missing). |
| **Side‑effects** | None (pure read‑only). |
| **Dependencies** | Reads version from `src/__init__.py` (`__version__` variable) or `pyproject.toml` if the former is unavailable. |
| **Platform** | Linux/macOS/Windows (Python 3.9+). |
| **Error Message** | If version cannot be resolved: `Error: unable to determine Ricky version` → **stderr**, exit 1. |

*Contract file:* `contracts/ricky_version
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
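
A behaviour-grounded gate for this spec checks what the command printed and how it exited, rather than grepping source files. The sketch below is a pure check over an already-captured result. `CommandResult`, `checkVersionGate`, and the idea of passing in the expected version (for example, one read from `package.json`) are assumptions for illustration.

```typescript
// Hypothetical shapes: the captured result of running `ricky --version`.
interface CommandResult {
  exitCode: number;
  stdout: string;
}

interface GateResult {
  passed: boolean;
  reason: string;
}

function checkVersionGate(result: CommandResult, expectedVersion: string): GateResult {
  if (result.exitCode !== 0) {
    return { passed: false, reason: `exit code ${result.exitCode}, expected 0` };
  }
  // Compare whole whitespace-separated tokens so "10.9.3" does not satisfy "0.9.3".
  const tokens = result.stdout.trim().split(/\s+/);
  const printed = tokens.some((token) => token === expectedVersion);
  if (!printed) {
    return { passed: false, reason: `stdout did not contain version ${expectedVersion}` };
  }
  return { passed: true, reason: `stdout contained version ${expectedVersion} and exit code was 0` };
}
```

The gate fails on a wrong version or a non-zero exit even when the source file looks syntactically plausible, which is the distinction the rubric asks for.
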
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow Name:** `multi‑artifact‑validation‑fix‑rerun`

**Purpose**  
Process a batch of independent artifacts (e.g., data files, code modules, config bundles) through a deterministic **Validate → Fix → Rerun** loop until every artifact meets its contract or a hard blocker is raised.

**Scope**  
- One PR/branch per batch (max 200 artifacts).  
- Each artifact is self‑contained; no cross‑artifact dependencies.  
- Loop limit: **80–100 iterations** per artifact (configurable).  
- Failure to converge after the limit → **Blocker**.

---

## 1. Contract Definition (per artifact)

| Field | Type | Requirement |
|-------|------|-------------|
| `id` | string | Unique within batch |
| `type` | enum(`code`,`data`,`config`) | Determines validator/fixer |
| `source_path` | string | Relative path in repo |
| `checksum` | string (SHA‑256) | Must match after each successful run |
| `metadata` | object | Optional free‑form |
| `status` | enum(`pending`,`valid`,`fixed`,`failed`) | Updated by workflow |

**Validator Interface** (`validate_<type>(artifact) → ValidationResult`)  

```ts
type ValidationResult = {
  ok: boolean,
  errors: string[],          // empty if ok
  warnings?: string[]
}
``
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Does | Direct Effect on the Generated Workflow | Why It Matters for Reliability & Coordination |
|------|----------------------|----------------------------------------|-----------------------------------------------|
| **Deterministic Contract Definition** | Declares explicit input‑output schemas, versioned contracts, and required data‑type constraints. | The workflow begins with a **`contract`** block that validates the incoming payload against a JSON‑Schema (`v1.2`). If the payload fails, the workflow aborts before any side‑effects run. | Guarantees that downstream agents only see well‑formed data, preventing silent corruption and making the contract a single source of truth for all reviewers. |
| **Verification Gate Insertion** | Inserts automated checks (unit tests, schema validation, policy compliance) at logical breakpoints. | After each major step (e.g., *fetch‑source*, *transform*, *publish*), a **`gate`** node runs a lightweight test suite (`npm test --filter stepX`). Failure triggers a **`blocker`** flag and routes the run to a **`review`** branch. | Early detection o
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

To turn a product specification into a concrete, test‑able workflow we need a few key details. Below is a short checklist; please fill in the items that apply to your product. Once we have this information I can generate a complete workflow contract (including stages, inputs/outputs, verification gates, and review artifacts) that follows our AgentWorkforce standards.

| # | Required Detail | Why It Matters |
|---|----------------|----------------|
| 1 | **High‑level goal** – What is the end‑user outcome the workflow must deliver? | Defines the final “success” state and drives acceptance criteria. |
| 2 | **Primary actors & roles** – Who (or which agents) will invoke the workflow, and who are the downstream reviewers/approvers? | Determines authentication, permission scopes, and hand‑off points. |
| 3 | **Core stages** – List the major functional steps (e.g., “Validate Input → Enrich Data → Persist → Notify”). | Forms the backbone of the workflow DAG. |
| 4 | **Inputs & outputs per stage** – Data schema (JSON, protobuf, etc.) for each transition. | Enables deterministic verification gates and contract generation. |
| 5 | **Verification gate
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s post‑failure playbook**

1. **Immediate detection & logging**  
   - The orchestrator (Ricky‑Orchestrator) receives the *step‑timeout* event and the *worker‑alive‑no‑artifact* heartbeat.  
   - It writes a structured log entry (JSON) with:
     ```json
     {
       "workflow_id": "<id>",
       "failed_step": "Step‑3‑Transform",
       "error_type": "timeout",
       "alive_worker_id": "worker‑7a9c",
       "artifact_status": "missing",
       "timestamp": "2026‑05‑10T14:32:07Z"
     }
     ```

2. **Mark the run as BLOCKED and generate a Blocker Report**  
   - Status transition: `RUNNING → BLOCKED`.  
   - The report includes:
     * step name, timeout threshold, actual duration, last heartbeat, expected artifact schema, and any partial logs.  
     * A **risk rating** (e.g., *high* because downstream steps depend on the missing artifact).  
   - The report is stored under `artifacts/blockers/<workflow_id>/<run_id>.md` and automatically linked to the workflow’s GitHub issue (or internal ticket).

3. **Create a deterministic verification gate**  
   - A *Gate* named `Gate‑Transform‑Artifact‑Exists` is inserted into the workflow graph **after** the failed step.  
 
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
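
The classify-before-retry rule above can be sketched with the failure classes the rubric names. The evidence fields and the mapping from evidence to class are assumptions for illustration; a real classifier would use richer signals. The point shown is that a retry is only considered after classification, and raw evidence is carried through unchanged.

```typescript
// Classes come from the rubric; "unknown" preserves uncertainty.
type FailureClass =
  | "agent_runtime_opacity"
  | "timeout"
  | "environment_blocker"
  | "workflow_structure_bug"
  | "validation_strategy_mismatch"
  | "unknown";

// Hypothetical evidence shape.
interface FailureEvidence {
  stepTimedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  missingEnvVars: string[];
  rawLog: string; // kept verbatim for later inspection
}

interface Classification {
  classes: FailureClass[];
  evidence: FailureEvidence;
  retryAllowed: boolean;
}

function classifyFailure(evidence: FailureEvidence): Classification {
  const classes: FailureClass[] = [];
  if (evidence.missingEnvVars.length > 0) classes.push("environment_blocker");
  if (evidence.stepTimedOut) classes.push("timeout");
  if (evidence.workerAliveWithoutArtifacts) classes.push("agent_runtime_opacity");

  const classified = classes.length > 0;
  if (!classified) classes.push("unknown");

  // Assumption: no retry while the class is unknown or the environment is blocking.
  const retryAllowed = classified && evidence.missingEnvVars.length === 0;
  return { classes, evidence, retryAllowed };
}
```

For the scenario in this case (one step timed out, another worker alive without artifacts) the sketch yields two classes rather than one generic failure.
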
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow: Detect and Guard Against Stale Relay State Before a Local Run**  

| **Stage** | **Purpose** | **Inputs** | **Outputs** | **Deterministic Gate** | **Owner** |
|-----------|-------------|------------|-------------|------------------------|-----------|
| **1️⃣ Scan Workspace** | Locate any stale state directories. | `repo root` | List of paths that exist: `.agent‑relay/`, `.relay/`, `.trajectories/` | ‑ Exists‑Check (fs.existsSync) | `ricky‑scanner` |
| **2️⃣ Timestamp Validation** | Ensure the directories are from the current session. | Directory stats (mtime) | Boolean `isFresh` per dir | ‑ Age‑Check (now – mtime ≤ 5 min) | `ricky‑validator` |
| **3️⃣ Content Hash Check** | Detect leftover files that differ from the clean baseline. | File hashes inside each dir | Map `{path → hashMismatch: true/false}` | ‑ Hash‑Check (sha256) against baseline manifest (`.agent‑relay/.manifest.json`) | `ricky‑hasher` |
| **4️⃣ Blocker Decision** | Decide whether to abort or continue. | Results of 2 & 3 | `blocker: true/false` + `reason` | ‑ Block‑Rule: `if any isFresh===false OR any hashMismatch===true → blocker=true` | `ricky‑gatekeeper` |
| **5️⃣ Reporting** | Emit a concise, machine‑r
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
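
The pre-launch check above can be sketched as a read-only inspection that reports what it saw and never deletes anything. The directory names come from the case itself. The report shape and the injected `isStale` predicate are hypothetical, since this report does not say how staleness is decided.

```typescript
// Directory names are from the case; everything else is illustrative.
const RELAY_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface StaleStateReport {
  classification: "environment_contamination" | "clean";
  observedPaths: string[];
  recommendedAction: string;
  proceed: boolean;
}

// `isStale` is injected so the staleness rule stays outside this sketch.
function inspectRelayState(isStale: (path: string) => boolean): StaleStateReport {
  const observedPaths = RELAY_STATE_DIRS.filter((dir) => isStale(dir));
  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendedAction: "none", proceed: true };
  }
  // Nothing is deleted here; the report only recommends a safe next step.
  return {
    classification: "environment_contamination",
    observedPaths,
    recommendedAction: "quarantine the listed directories or use an isolated run before launching",
    proceed: false,
  };
}
```

A caller in Node could build `isStale` on top of `existsSync` from `node:fs` plus its own age or run-marker rule. Keeping the function free of filesystem writes matches the requirement not to delete or overwrite state without explicit intent.
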
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky/Relay run is already active in this workspace”**

The marker is a lock file that prevents two concurrent executions from stepping on each other. When it’s left behind (e.g., a crash, a killed terminal, or a stray background process) you’ll see the error you described. Follow the deterministic, low‑risk repair flow below.

---

## 1️⃣ Verify the current state

```bash
# Show the lock file (if any) and its contents
cat .ricky/run.lock || echo "No lock file found"
# List any Ricky/Relay processes still attached to this workspace
ps aux | grep -iE 'ricky|relay' | grep -v grep
```

- If the lock file is missing → the error is coming from a stale in‑memory lock; skip to step 3.
- If a process is listed → you have a live run; consider whether it should finish or be terminated.

## 2️⃣ Graceful termination (preferred)

1. Identify the PID from the ps output.
2. Ask the run to stop (if it’s a long‑running job you can let it finish):
       ricky stop --pid <PID>
3. Wait until the process exits and the lock file disappears (max 30 s).

If the process does not respond, proceed to step 3.

## 3️⃣ Forced cleanup (deterministic

...[truncated]


**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.env-loader-injection-runtime-loadable</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

**Ricky output**

```text
**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that fails because it references a `MISSING_ENV_VAR`. The fix injects the **Ricky environment loader** (`loadRickyWorkflowEnv`) and the **optional guard** (`assertRickyWorkflowEnv`) **once**, at the top of the generated file, and then re‑executes the workflow. The patch works for both plain JavaScript artifacts and for master‑rendered workflows whose step bodies contain HEREDOC‑style `node --input-type=module` snippets.

---

## 1️⃣ Patch contract

| Item | Description |
|------|-------------|
| **Target** | Any `.js` (or `.ts`) workflow artifact produced by the AgentWorkforce compiler. |
| **Pre‑condition** | The artifact contains a reference to `process.env.MISSING_ENV_VAR` (or any `process.env.*` that is undefined at load time). |
| **Post‑condition** | The artifact loads `.env.local` / `.env` before any step runs, asserts the presence of the required variable, and the workflow completes without a missing‑env error. |
| **Verification gate** | After patch, run `npm test --silent && node <artifact>`; the exit code must be `0`. |
| **Rollba
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
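
The import-detection rule above can be sketched with a line-anchored match instead of a substring search. This is a heuristic illustration, not the actual auto-fix code. The alias and module names come from the rubric, while `hasAliasImport` and `ensureAliasImports` are hypothetical helpers. A HEREDOC line that starts at column 0 with a full `import * as ...` statement would still be matched, so a parser-based check would be more robust.

```typescript
interface AliasImport {
  alias: string;
  module: string;
}

// Aliases the injected loader body refers to, per the rubric above.
const REQUIRED_ALIASES: AliasImport[] = [
  { alias: "rickyWorkflowFs", module: "node:fs" },
  { alias: "rickyWorkflowPath", module: "node:path" },
];

// True only if some line is, in full, `import * as <alias> from '<module>'`.
function hasAliasImport(source: string, wanted: AliasImport): boolean {
  const pattern = /^import\s+\*\s+as\s+([A-Za-z_$][\w$]*)\s+from\s+['"]([^'"]+)['"]\s*;?\s*$/;
  return source.split("\n").some((line) => {
    const match = pattern.exec(line);
    return match !== null && match[1] === wanted.alias && match[2] === wanted.module;
  });
}

// Prepends any missing alias imports and leaves the rest of the file untouched.
function ensureAliasImports(source: string): string {
  const missing = REQUIRED_ALIASES.filter((wanted) => !hasAliasImport(source, wanted));
  if (missing.length === 0) return source;
  const header = missing.map((m) => `import * as ${m.alias} from '${m.module}';`).join("\n");
  return `${header}\n${source}`;
}
```

Because the check keys on the alias as well as the module, a literal `from 'node:fs'` inside a step command no longer suppresses the import, and running the repair twice adds nothing the second time.
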
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1. Prerequisites (once)

```bash
# Verify CLI version (must be >= 2.4.0 for auto‑fix)
aw --version

# Ensure you are in the repository root where .aw/ config lives
git status   # clean working tree recommended
```

## 2. Input Parameters (replace placeholders)

| Parameter | Description | Example |
|-----------|-------------|---------|
| `WORKFLOW_NAME` | Name of the workflow to execute | `data‑ingest‑pipeline` |
| `FAILED_STEP` | Exact step identifier that failed on the first run | `transform‑csv` |
| `PREV_RUN_ID` | Run‑ID of the previous execution (used for artifact lookup) | `run‑20240528‑a1b2c3` |
| `AUTO_FIX` | Flag to enable auto‑fix (true) | `true` |

    WORKFLOW_NAME="data-ingest-pipeline"
    FAILED_STEP="transform-csv"
    PREV_RUN_ID="run-20240528-a1b2c3"
    AUTO_FIX=true

...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
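
The bounded loop described above can be sketched as follows. All names are hypothetical, and the `execute` and `repairArtifact` callbacks stand in for the real runner and the Workforce persona repair. The sketch keeps one tracking run id across attempts, resumes from the failed step with the previous run id, and records every attempt.

```typescript
// Hypothetical shapes for one attempt and its outcome.
interface AttemptInput {
  trackingRunId: string; // stays the same across repair/resume attempts
  attempt: number;
  startFrom: string | undefined;
  previousRunId: string | undefined;
}

interface AttemptOutcome {
  ok: boolean;
  runId: string;
  failedStep?: string;
}

interface AttemptSummary extends AttemptInput {
  ok: boolean;
}

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  execute: (input: AttemptInput) => AttemptOutcome,
  repairArtifact: (failedStep: string | undefined) => boolean,
): { ok: boolean; attempts: AttemptSummary[] } {
  const attempts: AttemptSummary[] = [];
  let startFrom: string | undefined;
  let previousRunId: string | undefined;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const input: AttemptInput = { trackingRunId, attempt, startFrom, previousRunId };
    const outcome = execute(input);
    attempts.push({ ...input, ok: outcome.ok });
    if (outcome.ok) return { ok: true, attempts };

    // Stop when the budget is spent or the artifact cannot be repaired.
    if (attempt === maxAttempts || !repairArtifact(outcome.failedStep)) break;

    // Resume the next attempt from the failed step of this run.
    startFrom = outcome.failedStep;
    previousRunId = outcome.runId;
  }
  return { ok: false, attempts };
}
```

With a budget of 1 the loop makes a single attempt and never calls the repair callback, so the same function also covers the case where no repair is wanted.
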
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Troubleshooting Checklist**

| Step | Action | Expected outcome |
|------|--------|-------------------|
| 1️⃣ | **Collect the run log**<br>`ricky logs workflows/foo.ts --last` | A full JSON/NDJSON log file (`foo.run.log`) appears in `./ricky/logs/`. |
| 2️⃣ | **Identify the failing gate**<br>Open the log and look for the first entry with `"status":"FAIL"` or `"error":`. | You’ll see the exact **gate name**, **payload**, and **error message** (e.g., `ValidateSchema`, `MissingRequiredField`). |
| 3️⃣ | **Run the failing gate in isolation**<br>`ricky gate <gate‑name> --payload <payload‑file>` | The gate runs deterministically and prints a concise error stack. |
| 4️⃣ | **Check contract compliance**<br>Open `workflows/foo.ts` and locate the **contract block** for the failing gate (usually a `contract { … }` object). | Verify that the contract’s `inputSchema`, `outputSchema`, and any `preconditions` match the payload you supplied. |
| 5️⃣ | **Run the workflow with auto‑fix disabled** (already done) – now **enable auto‑fix** to see if the issue is auto‑repairable: <br>`ricky run workflows/foo.ts` | If the run now passes, the failure was a known, auto‑fixable drift (e.g., mi
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky – Executing a Local TypeScript Workflow Artifact**  
*(primary local path = `./workflows/`)*  

---

## 1. Preconditions (Contract v1.0)

| Item | Requirement | Source |
|------|-------------|--------|
| **Node runtime** | `>=18.0.0` (LTS) | `package.json` `engines.node` |
| **TypeScript compiler** | `tsc` available (installed locally) | `devDependencies` |
| **Workflow entry point** | A single exported async function `run()` in the artifact file | `./workflows/<artifact>.ts` |
| **Environment** | All required env vars declared in `.env.local` and loaded via `dotenv` | `README.md` |
| **Dependencies** | `npm ci` must have succeeded, lockfile present | `package-lock.json` |
| **Git state** | Clean working tree (no uncommitted changes) | `git status --porcelain` |

If any contract item fails, **Ricky reports a blocker** and aborts execution (see § 7).

---

## 2. Execution Blueprint (Deterministic Steps)

| Step | Command | Purpose | Verification Gate |
|------|---------|---------|--------------------|
| 1️⃣ | `git rev-parse --is-inside-work-tree` | Confirm we are inside the repo | Must return `true` |
| 2️⃣ | `npm ci` | Install exact deps, guarantee reproducibility | Exit‑co
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Escalation Protocol for Persistent Structural Failures**  

When a structural failure remains after an initial remediation attempt, the workflow must transition to a *Boundary‑Escalation* state. The following contract‑driven sequence guarantees deterministic handling, traceability, and rapid resolution.

---

## 1. Trigger Condition
| Condition | Source |
|----------|--------|
| `failure.persisted == true` **AND** `fix.attempted == true` **AND** `verification.passed == false` | `StructuralFailureMonitor` (service) |

*The monitor emits a `BoundaryEscalationEvent` that starts the protocol.*

---

## 2. Escalation Contract (YAML)

```yaml
name: structural-failure-escalation
version: 1.0.0
trigger: BoundaryEscalationEvent
steps:
  - id: log_boundary
    type: audit
    description: Record boundary breach in immutable audit log.
  - id: create_blocker
    type: issue
    title: "[BLOCKER] Persistent structural failure – <failure-id>"
    assignee: reliability-lead
    labels: [blocker, structural-failure, escalation]
    fields:
      severity: critical
      impact: >-
        Production pipeline halted; downstream services degraded.
  - id: branch_fork
    type: gi
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Compiled from Recent Runs**  
*(Generated 2026‑05‑10)*  

---

## 1. Overview  

| Metric | Value | Trend (vs. prior period) |
|--------|-------|--------------------------|
| Total runs processed | **1 842** | ↗ +7 % |
| Successful completions | **1 617** | ↗ +5 % |
| Failed runs (blocked) | **225** | ↘ ‑12 % |
| Runs with **≥ 80 %** validation loops | **1 403** | ↗ +9 % |
| Runs requiring **manual review** | **112** | ↘ ‑18 % |
| Average run duration | **3 m 27 s** | ↘ ‑4 % |
| Median validation loops per run | **84** | ↔ stable |

> **Health status:** **🟢 Good** – Success rate 88 %, validation loop compliance 76 % (target ≥ 80 %). The primary focus is to lift validation‑loop compliance to the 80 % threshold and eliminate the remaining blockers.

---

## 2. Success‑Rate Breakdown by Workflow Type  

| Workflow | Runs | Success % | Avg. Loops | Avg. Duration |
|----------|------|-----------|------------|----------------|
| **Contract‑Creation** | 462 | 92 % | 87 | 3 m 12 s |
| **Release‑Gate** | 389 | 85 % | 81 | 4 m 05 s |
| **Data‑Sync** | 274 | 90 % | 92 | 2 m 48 s |
| **Model‑Deploy** | 217 | 78 % | 76 | 5 m 21 s |
| **Audit‑Report** | 200 | 94
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
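
As a rough illustration of the criteria above, the following sketch aggregates failures from read-only evidence records and keeps environment failures apart from workflow-logic failures. The shape of `WorkflowRunEvidence`, the `FailureKind` values, and the `buildDigest` helper are assumptions made for this example; the actual Ricky types may differ.

```typescript
// Hypothetical evidence shape; not taken from the Ricky codebase.
type FailureKind = "environment" | "workflow-logic";

interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly step: string;
  readonly durationMs: number;
  readonly failure?: { readonly kind: FailureKind; readonly code: string };
}

interface DigestEntry {
  workflow: string;
  step: string;
  code: string;
  count: number;
}

interface HealthDigest {
  totalRuns: number;
  environmentFailures: DigestEntry[];
  workflowLogicFailures: DigestEntry[];
}

// Groups failures by workflow/step/code, one bucket per failure kind.
// The input is only read, never written to.
function buildDigest(runs: readonly WorkflowRunEvidence[]): HealthDigest {
  const buckets: Record<FailureKind, Record<string, DigestEntry>> = {
    environment: {},
    "workflow-logic": {},
  };
  for (const run of runs) {
    if (!run.failure) continue;
    const key = `${run.workflow}/${run.step}/${run.failure.code}`;
    const bucket = buckets[run.failure.kind];
    const existing = bucket[key];
    if (existing) {
      existing.count += 1;
    } else {
      bucket[key] = {
        workflow: run.workflow,
        step: run.step,
        code: run.failure.code,
        count: 1,
      };
    }
  }
  // Most frequent failure classes first, so recommendations can cite them.
  const ranked = (b: Record<string, DigestEntry>): DigestEntry[] =>
    Object.keys(b)
      .map((k) => b[k])
      .sort((a, c) => c.count - a.count);
  return {
    totalRuns: runs.length,
    environmentFailures: ranked(buckets.environment),
    workflowLogicFailures: ranked(buckets["workflow-logic"]),
  };
}
```

Each digest entry carries the workflow, step, and failure code, so a recommendation built from it can reference something specific rather than offer generic advice.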

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Draft*  

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers and product owners a deterministic, low‑latency way to inspect, step‑through, and repair Ricky‑managed workflows directly from a Slack channel. It follows the Ricky repository conventions:

| Aspect | Specification |
|--------|----------------|
| **Contract** | Typed JSON request/response schema (see §2). |
| **Verification Gate** | Schema validation → sandboxed execution → deterministic diff check. |
| **Review Artifacts** | Auto‑generated OpenAPI spec, unit‑test matrix, and a “debug‑session” log artifact. |
| **Validation Loops** | 80‑100 loops per PR (schema → sandbox → diff → commit). |
| **Blocker Reporting** | Immediate Slack “⚠️ Blocker” message with error code and reproducible payload. |
| **Branch/PR Scope** | `feature/slack-debugger` → PR must touch only `slack/debugger/` and `core/contracts/`. |

---

## 2. Public Contract (OpenAPI 3.1)

```yaml
openapi: 3.1.0
info:
  title: Ricky Slack Workflow Debugger
  version: 0.1.0
paths:
  /debug:
    post:
      summary: Start or query a debugging session
      operationId: st
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.
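
One way to read the criteria above is sketched below: the Slack handler owns only verification, dedup, thread handling, and formatting, and hands domain work to a shared pipeline. Every name here (`SlackEvent`, `NormalizedDebugRequest`, `makeSlackHandler`, the `signatureValid` flag) is hypothetical. In particular, real signature verification would be done by a dedicated verifier, which this sketch assumes has already run.

```typescript
// Hypothetical shapes; not taken from the Ricky codebase or the Slack SDK.
interface SlackEvent {
  eventId: string;
  threadTs: string;
  text: string;
  // Assumed to be set by a real signature verifier before the handler runs.
  signatureValid: boolean;
}

// Surface-neutral request: the surface name is metadata, not a routing input.
interface NormalizedDebugRequest {
  intent: "debug";
  spec: string;
  origin: { surface: string; threadTs: string };
}

interface SlackReply {
  threadTs: string;
  text: string;
}

// Stand-in for the shared normalization/executor/evidence/specialist stages.
type SharedPipeline = (req: NormalizedDebugRequest) => string;

function makeSlackHandler(pipeline: SharedPipeline) {
  const seenEventIds: Record<string, true> = {};
  return function handle(event: SlackEvent): SlackReply | null {
    // Surface concern 1: reject unverified payloads.
    if (!event.signatureValid) throw new Error("invalid Slack signature");
    // Surface concern 2: drop redelivered events.
    if (seenEventIds[event.eventId]) return null;
    seenEventIds[event.eventId] = true;
    // Domain work goes through the same pipeline any other surface would use.
    const result = pipeline({
      intent: "debug",
      spec: event.text,
      origin: { surface: "slack", threadTs: event.threadTs },
    });
    // Surface concern 3: Slack-specific formatting and thread placement.
    return { threadTs: event.threadTs, text: `*Result:* ${result}` };
  };
}
```

The handler contains no diagnosis or generation logic, so a CLI surface could call the same `pipeline` function and get identical routing.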

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

| Element | Definition |
|---------|------------|
| **Purpose** | Allows a web client (e.g., a UI form or API gateway) to submit a complete workflow specification together with the desired execution *mode* (`sync` | `async` | `test`). The system validates the spec, stores it, and either starts the workflow immediately (sync/async) or returns a validation‑only response (test). |
| **Location in Repository** | `src/handoffs/web_submit_spec/` |
| **Branch** | `feature/web-submit-spec` (open PR #‑) |
| **Owner** | `@ricky-workflow` |
| **Dependencies** | `workflow-core`, `validation-engine`, `mode‑dispatcher` |

---

## 1. Contract (OpenAPI 3.1)

```yaml
openapi: 3.1.0
info:
  title: Web Submit Spec Hand‑off
  version: 1.0.0
paths:
  /handoff/web-submit-spec:
    post:
      summary: Submit a workflow spec and execution mode
      operationId: submitSpec
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SubmitSpecRequest'
      responses:
        '200':
          description: Successful processing
          content:
            application/json:
 
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
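
A minimal sketch of what these criteria ask for might look like the following. `LocalInvocationRequest` and `CloudGenerateRequest` are named in the criteria, but their fields here are assumptions, as are the `WebSubmitHandoff` variant, the `"local" | "cloud"` mode values, and both function names.

```typescript
// Field layouts are hypothetical; only the two type names come from the criteria.
interface LocalInvocationRequest {
  kind: "local";
  spec: string;
}

interface CloudGenerateRequest {
  kind: "cloud";
  spec: string;
}

type NormalizedInvocation = LocalInvocationRequest | CloudGenerateRequest;

// Hypothetical handoff variant carrying a workflow spec and a mode.
interface WebSubmitHandoff {
  source: "web";
  spec: string;
  mode: "local" | "cloud";
}

// The normalizer is the only translation boundary: it emits an existing
// request type and drops the web-specific envelope.
function normalizeWebHandoff(handoff: WebSubmitHandoff): NormalizedInvocation {
  return handoff.mode === "local"
    ? { kind: "local", spec: handoff.spec }
    : { kind: "cloud", spec: handoff.spec };
}

// Web surface: validates the session, then hands off. It never calls an
// executor directly and never decides how local or Cloud requests are routed.
function handleWebSubmit(
  body: { spec: string; mode: "local" | "cloud"; sessionToken: string },
  isValidSession: (token: string) => boolean,
): NormalizedInvocation {
  if (!isValidSession(body.sessionToken)) throw new Error("unauthorized");
  return normalizeWebHandoff({ source: "web", spec: body.spec, mode: body.mode });
}
```

The normalized value has no `source` field and no session token, so downstream stages cannot tell that the request originated from the web surface.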

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision Matrix for the Claude‑to‑Ricky Hand‑off**

| Claude Input | What Ricky Should Do |
|-------------

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._
