Skip to content

Add --dry-run flag to agentworkforce agent#57

Merged
willwashburn merged 1 commit intomainfrom
worktree-persona-validation
May 8, 2026
Merged

Add --dry-run flag to agentworkforce agent#57
willwashburn merged 1 commit intomainfrom
worktree-persona-validation

Conversation

@willwashburn
Copy link
Copy Markdown
Member

Summary

  • New agentworkforce agent <persona>@<tier> --dry-run flag validates a persona without spawning the harness or burning tier-model tokens. Three checks, fast-fail in order: (1) sidecar resolution; (2) buildInteractiveSpec shape; (3) real skill install in a temp dir.
  • Persona-maker's agentsMdContent now requires running the dry-run before handoff. The output contract gained a line for the dry-run outcome.
  • Closes the gap that lets a persona ship with a hallucinated skill name (source repo exists but the named skill is not in it). Today that bricks every launch with … failed (exit 1). Aborting.; the author has no way to catch it short of attempting a real launch.

Behavior

$ agentworkforce agent capability-discoverer@minimum --dry-run
→ capability-discoverer [minimum] via opencode (opencode/minimax-m2.5-free) [DRY-RUN]
✓ sidecar: (none)
✓ harness spec: opencode (2 args)
• temp dir: /var/folders/.../agentworkforce-dryrun-capability-discoverer-XXXX
• installing 2 skill(s)
• skill.sh/find-skills (skill.sh) → https://github.com/vercel-labs/skills#find-skills
  ✓ skill.sh/find-skills
• prpm/self-improving (prpm) → @prpm/self-improving
  ✓ prpm/self-improving
✓ dry-run ok: 2 skill(s) installed cleanly

On failure, exit is non-zero, the temp dir is kept and printed for inspection, and each failing skill is listed by id + exit code. On success the temp dir is removed.

What's deliberately out of scope (Tier 2/3 — not in this PR)

  • Harness binary check (claude / codex / opencode on PATH) — environment-dependent; author's machine doesn't prove the consumer's. Could land behind a --check-harness flag later.
  • MCP env-ref strict moderesolveMcpServersLenient already drops servers whose ${VAR} refs are unset and warns. Same author/consumer-environment caveat. Could land behind --strict.
  • Mount setup / live MCP connectivity / actual harness boot — runtime concerns, not authoring. Skipped.

Test plan

  • parseAgentArgs accepts --dry-run (new tests; 126/126 CLI tests pass)
  • corepack pnpm run check green across the workspace
  • Smoke: agent debugger@minimum --dry-run✓ dry-run ok (no skills)
  • Smoke: agent capability-discoverer@minimum --dry-run → installs skill.sh + prpm skills cleanly in a temp dir, exits 0
  • Smoke: synthetic persona with hallucinated nextjs-anti-patterns skill → ✗ dry-run failed, exit 1, temp dir preserved
  • Manual: confirm persona-maker agent runs the dry-run during a real authoring session and surfaces failures back to the user

Related

  • Filed AgentWorkforce/relay#824 to track moving skills install handling into @agent-relay/sdk — orthogonal to this PR but the same authoring-time-validation goal.

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6362f370-87d0-4f2d-86d5-867c32eee3a2

📥 Commits

Reviewing files that changed from the base of the PR and between ca4c7aa and dfef3a7.

⛔ Files ignored due to path filters (1)
  • packages/workload-router/src/generated/personas.ts is excluded by !**/generated/**
📒 Files selected for processing (3)
  • packages/cli/src/cli.test.ts
  • packages/cli/src/cli.ts
  • personas/persona-maker.json
✅ Files skipped from review due to trivial changes (1)
  • packages/cli/src/cli.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • personas/persona-maker.json

📝 Walkthrough

Walkthrough

This PR adds a new --dry-run flag to the CLI that validates a selected persona without launching the interactive harness. The feature resolves persona inputs, checks sidecar readability, builds the harness spec, tests skill installers in a temporary directory, and reports per-skill outcomes with optional cleanup.

Changes

Dry-run CLI validation

Layer / File(s) Summary
Data contract
packages/cli/src/cli.ts
AgentFlags interface extended with dryRun: boolean field.
Argument parsing
packages/cli/src/cli.ts
Adds mkdtempSync import; initializes dryRun: false defaults in parseAgentArgs and parseCreateArgs; extends both parsers to recognize and set --dry-run flag.
Dry-run validation
packages/cli/src/cli.ts
New runDryRun(selection) function resolves persona inputs and sidecars, validates sidecar readability, builds interactive harness spec, executes each skill installer in a fresh temp directory, reports per-skill pass/fail, and deletes temp dir on success or retains on failure.
CLI integration
packages/cli/src/cli.ts
Updates agent help text to document --dry-run checks and temp-dir behavior; adds early branch in runAgentSelector to call runDryRun when flags.dryRun is set and exit with its status code; updates a create-path call site to pass dryRun: false.
Test coverage
packages/cli/src/cli.test.ts
Updates existing "no flags" test to assert dryRun: false default; adds new test for --dry-run flag setting and positional arg preservation; adds new test for --dry-run composing with --install-in-repo.
Specification
personas/persona-maker.json
Updates agentsMdContent anti-goals and output contract to document dry-run command invocation and outcome reporting requirements, including failing skill identifiers.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 A dry run before the leap so high,
Skills spin up in a tempdir nest,
Each installer proves its little try,
Successes cleaned, failures rest,
A careful hop before the quest.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add --dry-run flag to agentworkforce agent' directly and accurately describes the main change: adding a new CLI flag feature.
Description check ✅ Passed The description is comprehensively related to the changeset, detailing the dry-run feature implementation, behavior, test coverage, and scope boundaries.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-persona-validation

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli/src/cli.ts`:
- Around line 925-945: The current dry-run uses one shared tempDir for all
installs (tempDir, loop over plan.installs), causing state from one skill to
affect others; change the loop in which you destructure inst.installCommand and
call spawnSync/subprocessExitCode so that for each inst you create a unique
subdirectory (e.g., join(tempDir, inst.skillId) or a per-install mkdtemp) and
run spawnSync with cwd set to that per-skill dir, ensuring you still keep the
top-level preserved dry-run root but isolate each installer to avoid
cross-contamination and preserve per-skill artifacts for inspection.

In `@personas/persona-maker.json`:
- Line 29: Update the dry-run validation rules so authors must run
agentworkforce agent <id>@<tier> --dry-run from a resolvable source directory
(either TARGET_DIR that matches the persona cascade or by adding the persona to
the cascade first) and must run the dry-run for every tier that has any
tier-specific runtime config (not just differing skills[])—specifically detect
per-tier overrides of claudeMd/agentsMd sidecars, permissions, mcpServers, or
harnessSettings and require a dry-run for each such tier; if CREATE_MODE is
built-in, require the author to append the intent to PERSONA_INTENTS and add the
persona to the workspace before running the tiered dry-runs so the agent CLI can
resolve <id>@<tier>.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 812bc13c-d4ec-4519-b0cf-8154611e575a

📥 Commits

Reviewing files that changed from the base of the PR and between 970787d and ca4c7aa.

⛔ Files ignored due to path filters (1)
  • packages/workload-router/src/generated/personas.ts is excluded by !**/generated/**
📒 Files selected for processing (3)
  • packages/cli/src/cli.test.ts
  • packages/cli/src/cli.ts
  • personas/persona-maker.json

Comment thread packages/cli/src/cli.ts
Comment on lines +925 to +945
const tempDir = mkdtempSync(join(tmpdir(), `agentworkforce-dryrun-${personaId}-`));
process.stderr.write(`• temp dir: ${tempDir}\n`);
process.stderr.write(`• installing ${plan.installs.length} skill(s)\n`);

const failures: { skillId: string; exitCode: number }[] = [];
for (const inst of plan.installs) {
const [bin, ...args] = inst.installCommand;
if (!bin) {
process.stderr.write(` skipped ${inst.skillId}: empty install command\n`);
continue;
}
process.stderr.write(`• ${inst.skillId} (${inst.sourceKind}) → ${inst.packageRef}\n`);
const res = spawnSync(bin, args, { stdio: 'inherit', shell: false, cwd: tempDir });
const code = subprocessExitCode(res);
if (code === 0) {
process.stderr.write(` ✓ ${inst.skillId}\n`);
} else {
process.stderr.write(` ✗ ${inst.skillId} (exit ${code})\n`);
failures.push({ skillId: inst.skillId, exitCode: code });
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate each skill install instead of reusing one temp workspace.

All installers run in the same tempDir, so a partial failure can leave behind skills-lock.json, .skills/, .agents/, or other state that poisons later installs. That makes the per-skill verdicts unreliable: one bad skill can cascade into multiple false failures.

A fresh subdir per skill under the preserved dry-run root avoids cross-contamination while still leaving artifacts behind for inspection.

Suggested change
   const tempDir = mkdtempSync(join(tmpdir(), `agentworkforce-dryrun-${personaId}-`));
   process.stderr.write(`• temp dir: ${tempDir}\n`);
   process.stderr.write(`• installing ${plan.installs.length} skill(s)\n`);

   const failures: { skillId: string; exitCode: number }[] = [];
   for (const inst of plan.installs) {
+    const skillDir = mkdtempSync(
+      join(tempDir, `${inst.skillId.replace(/[^a-zA-Z0-9._-]/g, '_')}-`)
+    );
     const [bin, ...args] = inst.installCommand;
     if (!bin) {
       process.stderr.write(`  skipped ${inst.skillId}: empty install command\n`);
       continue;
     }
     process.stderr.write(`• ${inst.skillId} (${inst.sourceKind}) → ${inst.packageRef}\n`);
-    const res = spawnSync(bin, args, { stdio: 'inherit', shell: false, cwd: tempDir });
+    const res = spawnSync(bin, args, { stdio: 'inherit', shell: false, cwd: skillDir });
     const code = subprocessExitCode(res);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli/src/cli.ts` around lines 925 - 945, The current dry-run uses one
shared tempDir for all installs (tempDir, loop over plan.installs), causing
state from one skill to affect others; change the loop in which you destructure
inst.installCommand and call spawnSync/subprocessExitCode so that for each inst
you create a unique subdirectory (e.g., join(tempDir, inst.skillId) or a
per-install mkdtemp) and run spawnSync with cwd set to that per-skill dir,
ensuring you still keep the top-level preserved dry-run root but isolate each
installer to avoid cross-contamination and preserve per-skill artifacts for
inspection.

Comment thread personas/persona-maker.json Outdated
Validates a persona without spawning the harness or burning tier-model
tokens. Three checks, fast-fail in order:

1. Sidecar resolution — `claudeMd` / `agentsMd` filename refs are
   readable.
2. Harness-spec build — `buildInteractiveSpec` rejects malformed
   `permissions` patterns, `mcpServers` shape errors, and missing
   required harness fields.
3. Skill install — runs every `skills[].source` through its real
   installer (`npx -y skills add` / `npx -y prpm install`) inside a
   fresh temp dir, with per-skill pass/fail. Temp dir is removed on
   success and kept on failure for inspection.

Motivation: a persona authored with a hallucinated skill name (source
repo exists but the named skill is not in it) bricks every launch with
"failed (exit 1). Aborting." The author has no way to catch this short
of attempting a launch. Dry-run gives them a deterministic signal.

Persona-maker's `agentsMdContent` now requires running the dry-run
before handoff and lists its outcome in the output contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@willwashburn willwashburn force-pushed the worktree-persona-validation branch from ca4c7aa to dfef3a7 Compare May 8, 2026 14:41
@willwashburn willwashburn merged commit 8238cc5 into main May 8, 2026
2 checks passed
@willwashburn willwashburn deleted the worktree-persona-validation branch May 8, 2026 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant