pb-spec is a set of Agent Skills Specification compliant AI Coding assistant workflow skill packages. It provides a structured process — plan → build — that turns natural-language requirements into well-architected, BDD-driven, tested code.
pb-spec implements the Plan-Build pattern: a Planner Agent generates design specs, and a Builder Agent executes code with verification. The core innovation is making .feature files the absolute source of truth — all design and tasks derive FROM scenarios, not the other way around.
| Principle | Description |
|---|---|
| BDD-First | Feature scenarios are the source of truth. Design and tasks derive FROM scenarios. |
| RFC 2119 Constraints | Architectural constraints use MUST/SHOULD/MAY keywords — binding for the Builder. |
| DAG Execution | Tasks include DependsOn metadata for parallel execution of independent tasks. |
| Adaptive Steering | Tasks with Complexity=High route to reasoning models; Low to fast models. |
| Escalation Protocol | Repeated failures auto-escalate to stronger models for root-cause analysis. |
| Generator/Evaluator Isolation | Generator builds; Evaluator judges with fresh context — never inherits Generator state. |
design.md artifacts conform to industry-standard specification formats:
| Standard | Purpose | pb-spec Application |
|---|---|---|
| EARS Notation | Eliminate ambiguous requirements with 5 sentence patterns | Every requirement uses EARS syntax with [REQ-XX] IDs |
| C4 Model + Mermaid | Architecture topology in parseable text | Architecture sections use ```mermaid blocks |
| DBML / Prisma Schema | Structured data models with strict types | Data model sections use DBML or Prisma Schema DSL |
| MADR (ADR Records) | Architecture decision records | Every AD has [Context], [Decision], [Consequences] |
| RFC 2119 Constraints | Binding behavioral constraints for agents | §Architectural Constraints with MUST/SHOULD/MAY |
| Behavior Traceability Matrix | Every component maps to a Feature scenario | No scenario = remove from design |
| Source | Core Idea | How pb-spec Applies It |
|---|---|---|
| RPI Strategy | Separate research, planning, and implementation | /pb-init + /pb-plan precede /pb-build |
| Plan-and-Solve Prompting | Plan first to reduce missing-step errors | design.md + tasks.md are mandatory artifacts |
| ReAct | Interleave reasoning and actions with environment feedback | /pb-build executes task-by-task with test/tool feedback loops |
| Reflexion | Learn from failure signals via iterative retries | Escalation protocol + DCR flow in pb-build |
| Harness Engineering (OpenAI, 2026-02-11) | Treat runtime signals and checklists as first-class harness inputs | pb-plan requires runtime verification hooks; pb-build validates logs/health evidence |
| openai/symphony | Long-running agents need explicit observability and deterministic escalation | pb-build enforces bounded retries and emits standardized DCR packets |
| Effective Harnesses for Long-Running Agents | Grounding, context hygiene, recovery, observability | State checks, minimal context handoff, task-local rollback |
| Building Effective Agents | Prefer simple composable workflows over framework complexity | Small adapter-based CLI + explicit workflow prompts |
| Harness Design for Long-Running Application Development | Generator/Evaluator separation; adversarial evaluation | pb-build dual-persona with adaptive evaluation by task complexity |
| shadcn/improve | Audit codebase, write self-contained plans for cheaper executors | /pb-improve surveys codebase, produces prioritized specs |
| Agent-SOP | RFC 2119 constraints, DAG tasks, adaptive model routing | RFC 2119 in design.md, DAG metadata in tasks.md, Escalation protocol |
| Superpowers | Composable skills, evidence-based claims, systematic debugging | using-pb-spec bootstrap; supporting skills ecosystem |
No manual file configuration needed. As long as your AI assistant supports the standard Agent Skills specification (Claude Code, Cursor, GitHub Copilot, OpenCode, etc.), you can install with one command.
Run in your project root:
# Install all pb-spec workflow skills at once
npx skills add longcipher/pb-spec
# Or install only the ones you need
npx skills add longcipher/pb-spec --skill pb-init
npx skills add longcipher/pb-spec --skill pb-plan
npx skills add longcipher/pb-spec --skill pb-build
npx skills add longcipher/pb-spec --skill pb-improve(After installation, skills will be placed in .agents/skills/ or the compatible local directory for your environment, and automatically indexed by your AI.)
Invoke these skills directly in your IDE / Agent terminal:
- /pb-init: Audit the current project and non-destructively create or update the
AGENTS.mdarchitecture snapshot. - /pb-plan "requirement description": For example
/pb-plan Add WebSocket auth, AI will generatedesign.md,tasks.md, and.featuretest specs with Architecture Decisions and RFC 2119 constraints. - /pb-build "feature-name": Automatically reads
tasks.md, starts the outer BDD and inner TDD dual-loop with Subagents, closing and verifying each task one by one. - /pb-refine "feature-name": (Optional) When encountering design blocks (Build Blocked) or architectural flaws, iteratively modify specs based on feedback.
- /pb-improve: Audit the codebase and generate prioritized implementation specs for other agents to execute.
| Skill | Trigger | Output | Description |
|---|---|---|---|
pb-init |
/pb-init |
AGENTS.md |
Audit repo and safely update/append a managed snapshot block without rewriting user-authored constraints |
pb-plan |
/pb-plan <requirement> |
specs/<spec-dir>/design.md + tasks.md + features/*.feature |
Design proposal + Gherkin scenarios + ordered task breakdown with RFC 2119 constraints and DAG metadata |
pb-refine |
/pb-refine <feature> |
Revised spec files | Apply feedback or Design Change Requests |
pb-build |
/pb-build <feature-name> |
Code + tests | BDD+TDD via Generator (builds) + Evaluator (adversarial review) with Escalation protocol |
pb-improve |
/pb-improve |
specs/<spec-dir>/ + specs/context.md |
Codebase audit → prioritized findings → pb-plan-compatible specs |
| Skill | Trigger | Description |
|---|---|---|
using-pb-spec |
Session start | Bootstrap: establishes skill invocation discipline |
pb-brainstorming |
Before creative work | Explores intent, requirements, and design before implementation |
pb-test-driven-development |
Before writing code | Red → Green → Refactor cycle, non-negotiable TDD |
pb-systematic-debugging |
Any bug or failure | Find root cause before attempting fixes |
pb-verification-before-completion |
Before claiming done | Evidence before assertions, always |
pb-requesting-code-review |
Before merge | Dispatch independent reviewer subagent |
pb-receiving-code-review |
Processing feedback | Technical evaluation, not performative agreement |
pb-dispatching-parallel-agents |
Multiple independent tasks | One agent per problem domain, concurrent execution |
pb-subagent-driven-development |
Executing plans | Fresh context per task, two-stage review |
pb-finishing-a-development-branch |
Work complete | Verify tests → Present options → Execute choice |
pb-writing-skills |
Creating/editing skills | Skills are code, not prose — test them |
This skill library uses the standard SKILL.md specification. Fully compatible with:
- Cursor
- Claude Code
- GitHub Copilot / GitHub Spark
- OpenCode
- Gemini CLI & Codex
- Any tool that supports
agentskills.iospecification or reads.agents/skills/.
Five agent skills that chain together:
/pb-init → /pb-plan → [/pb-refine] → /pb-build
↘
/pb-improve → specs/ → /pb-build
Supporting skills activate automatically during the workflow:
pb-brainstorming— before/pb-planwhen requirements are unclearpb-test-driven-development— during/pb-buildfor every taskpb-systematic-debugging— when tasks fail repeatedlypb-verification-before-completion— before marking any task DONEpb-requesting-code-review/pb-receiving-code-review— review cyclespb-dispatching-parallel-agents— parallel audit in/pb-improvepb-finishing-a-development-branch— finalization after/pb-build
Audits your project and writes a pb-init snapshot into AGENTS.md using managed markers:
<!-- BEGIN PB-INIT MANAGED BLOCK --><!-- END PB-INIT MANAGED BLOCK -->
Merge behavior is non-destructive:
- If markers exist, only that managed block is replaced.
- If markers do not exist, the managed block is appended.
- All existing content outside the managed block is preserved verbatim.
The managed snapshot includes an Architecture Decision Snapshot so later agents inherit repo-level conventions instead of re-deciding them every run.
Takes source material in arbitrary format and produces a complete feature spec:
specs/<YYYY-MM-DD-NO-feature-name>/
├── design.md # Architecture, API contracts, data models, RFC 2119 constraints
├── tasks.md # Ordered implementation tasks with DAG metadata
└── features/ # Gherkin acceptance artifacts (Source of Truth)
Key capabilities:
- BDD-First: Feature scenarios are written FIRST; design and tasks derive FROM scenarios
- EARS Requirements: All acceptance criteria use 5 sentence patterns for machine-checkable verification
- RFC 2119 Constraints:
§Architectural Constraintssection with MUST/SHOULD/MAY keywords — binding for Builder - Behavior Traceability Matrix: Every design component maps to a Feature scenario (no scenario = remove from design)
- C4 + Mermaid Architecture: Architecture diagrams in parseable Mermaid syntax
- MADR Decisions: Architecture decisions with Context/Decision/Consequences
- DBML Data Models: Structured data models in DSL (natural language forbidden)
- API-First Contracts: Type signatures before implementation (OpenAPI, Protocol, Trait)
- DAG-Enabled Tasks:
TaskID,DependsOn,Complexity,Required Skills,EvalRulemetadata - Risk-based testing: Property tests by default for broad input domains; fuzzing and benchmarks conditional
- Template identity alignment: Renames generic scaffold names to project-matching identifiers
- Source requirement normalization: Converts arbitrary-format input into a structured requirement ledger
- Self-reconciliation: Verifies all requirements are covered across design, tasks, and scenarios before finalizing
Reads user feedback or Design Change Requests and updates design.md and tasks.md. Maintains a revision history and cascades changes without overwriting completed work.
Validates 🛑 Build Blocked and 🔄 Design Change Request packets for required sections before modifying spec artifacts.
Implements tasks sequentially using a Generator/Evaluator dual-persona workflow with Escalation protocol:
Generator (subagent) → READY_FOR_EVAL → Evaluator (independent context) → PASS / FAIL
├── Diff Audit (git diff + scope + architecture)
├── BDD Evidence Verification (independent re-run)
├── MCP Live Verification (Playwright / HTTP / CLI)
└── Edge Case Probing (boundaries, errors, security)
On PASS → Orchestrator marks task DONE in tasks.md
On FAIL → Evaluator feedback → fresh Generator subagent → retry loop
On 2nd FAIL → Escalation: auto-upgrade to stronger model for root-cause analysis
On 3rd FAIL → DCR packet to /pb-refine
Key principles:
- BDD-First: Feature scenarios are the source of truth; all business code must satisfy scenarios
- RFC 2119 Constraints: All constraints from
design.md§Architectural Constraints are BINDING - TDD is non-negotiable: Every task starts with a failing test (Red → Green → Refactor)
- Fresh context per subagent: No inherited assumptions; Evaluator never inherits Generator context
- Architecture decisions are binding: Executes the approved design; does not invent a different architecture
- Escalation over thrashing: 2nd failure auto-escalates to stronger model; 3rd failure → DCR packet
- Mode Behavior: Interactive mode (default) or Auto mode (
--autoflag)
Audits any codebase and writes pb-plan-compatible specs for other agents to execute. The skill never modifies source code — only produces specs under specs/.
/pb-improve full audit → prioritized findings → specs
/pb-improve quick cheap pass: hotspots, top findings only
/pb-improve deep exhaustive: every package, every category
/pb-improve security focused audit (also: perf, tests, bugs, ...)
/pb-improve branch audit only what the current branch changes
/pb-improve next feature suggestions — where to take the project
/pb-improve plan <description> skip the audit, spec one thing
/pb-improve review-spec <feature> critique and tighten an existing spec
/pb-improve reconcile refresh the backlog: verify, unblock, retire
/pb-improve ... --issues also publish specs as GitHub issues
How it works:
- Recon — Maps the repo: stack, conventions, build/test/lint commands (verification gates). Generates
specs/context.mdwith project context. - Audit — Fans out parallel subagents across 9 categories: correctness, security, performance, test coverage, tech debt, dependencies, DX, docs, direction.
- Vet — Re-reads every cited location to drop false positives and correct mis-attributions.
- Prioritize — Findings ordered by leverage (impact ÷ effort, weighted by confidence).
- Spec — One spec per selected finding in
specs/<spec-dir>/with design.md (RFC 2119 constraints), tasks.md (DAG metadata), and features/*.feature (Source of Truth).
The core innovation: .feature files are the absolute source of truth. All design and tasks derive FROM scenarios.
features/*.feature (Source of Truth)
↓
design.md (derives FROM features)
↓
tasks.md (driven BY scenarios)
↓
/pb-build (executes with RED→GREEN→REFACTOR)
Every design component MUST map to a Feature scenario:
| Domain Module | Core Component | Driven by Feature | BDD Tags |
|---|---|---|---|
| Auth | JwtAuthGuard |
features/auth/login.feature |
@auth, @security |
| Payment | StripeWebhookHandler |
features/billing/checkout.feature |
@billing, @webhook |
Rule: If a design component cannot be mapped to a scenario, remove it from the design.
Design constraints use RFC 2119 keywords — binding for the Builder:
## Architectural Constraints (RFC 2119)
- **[C-01]** Database interactions **MUST** use the existing ORM layer; raw SQL **MUST NOT** be introduced.
- **[C-02]** API responses **SHOULD** maintain <200ms p99 latency.
- **[C-03]** If an unhandled edge case is encountered, the Builder **MUST** halt and file a DCR.Tasks include metadata for parallel execution and adaptive model routing:
### Task 2.1: "Successful login" — User authenticates
- **TaskID:** `T1`
- **DependsOn:** `None`
- **Complexity:** `High`
- **Required Skills:** Python, JWT, SQLAlchemy
- **EvalRule:** `behave --tags=@login_success` must pass| Field | Purpose |
|---|---|
TaskID |
Unique identifier for DAG resolution |
DependsOn |
Lists prerequisite TaskIDs; None = can run in parallel |
Complexity |
Low → fast model, High → reasoning model |
Required Skills |
Skills the Builder Agent needs |
EvalRule |
Explicit pass/fail criteria |
| Failure Count | Action | Model Strategy |
|---|---|---|
| 1 | Retry with same model | Same model |
| 2 | Escalate — auto-upgrade to stronger model for root-cause analysis | +1 tier (Haiku→Sonnet, Sonnet→Opus) |
| 3 | File DCR, stop build | N/A |
pb-spec's design is inspired by Anthropic's research on Effective Harnesses for Long-Running Agents and the Superpowers methodology. Core idea: place AI agents inside a strict, observable, recoverable execution environment.
| Principle | How pb-spec Implements It |
|---|---|
| State Grounding | Subagents verify workspace state before writing code |
| Architecture Continuity | pb-init records Architecture Decisions; pb-build verifies conformance |
| Error Quoting | Subagents quote specific error messages before attempting fixes |
| Context Hygiene | Only minimal, relevant context passed to each subagent |
| Recovery Loop | Pre-task snapshots + file-scoped recovery |
| Verification Harness | Design docs define explicit verification commands |
| Observability as Context | Task verification includes runtime signals (logs/health) |
| Escalation Loop | 2nd failure auto-escalates; 3rd failure → DCR handoff to pb-refine |
| Generator/Evaluator Isolation | Generator builds; Evaluator judges with fresh context |
| Evidence Before Claims | pb-verification-before-completion: no success claims without fresh verification |
| RFC 2119 Constraints | Binding behavioral constraints prevent hallucination and scope creep |
| DAG Execution | Parallel task execution for independent tasks |
| Adaptive Steering | Complexity-based model routing for cost/speed optimization |
| Systematic Debugging | pb-systematic-debugging: root cause before fixes, scientific method |
| Skill Auto-Triggering | using-pb-spec bootstrap ensures skills activate at the right moments |