Merged
54 changes: 54 additions & 0 deletions .github/workflows/workflow-reliability.yml
@@ -0,0 +1,54 @@
name: Workflow Reliability

on:
  pull_request:
    branches: [main]
    paths:
      - '.github/workflows/workflow-reliability.yml'
      - 'packages/sdk/src/workflows/**'
      - 'packages/sdk/package.json'
      - 'packages/workflow-types/**'
      - 'package-lock.json'
      - 'package.json'
  push:
    branches: [main]
    paths:
      - '.github/workflows/workflow-reliability.yml'
      - 'packages/sdk/src/workflows/**'
      - 'packages/sdk/package.json'
      - 'packages/workflow-types/**'
      - 'package-lock.json'
      - 'package.json'

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  workflow-reliability:
    name: SDK Workflow Reliability
    runs-on: ubuntu-latest
    env:
      NPM_CONFIG_FUND: false

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Typecheck SDK workflows
        run: npm --prefix packages/sdk run check

      - name: Run workflow reliability contract matrix
        run: |
          npx vitest run --root packages/sdk --config vitest.config.ts \
            src/workflows/__tests__/workflow-reliability-contract.test.ts \
            src/workflows/__tests__/workflow-reliability-e2e.test.ts
25 changes: 25 additions & 0 deletions .trajectories/completed/2026-05/traj_34b1u84b19gz.json
@@ -0,0 +1,25 @@
{
"id": "traj_34b1u84b19gz",
"version": 1,
"task": {
"title": "Address PR 827 review feedback"
},
"status": "completed",
"startedAt": "2026-05-08T18:29:34.717Z",
"completedAt": "2026-05-08T18:33:55.607Z",
"agents": [],
"chapters": [],
"retrospective": {
"summary": "Addressed PR #827 review feedback: cleaned reliability options type, tightened worktree branch validation, fixed supervised API-owner execution without interactive spawn, removed overlapping CI path filter, fixed E2E helper return shape, and cleaned duplicate trajectory text. Added a targeted supervised API-owner regression test and re-ran SDK typecheck plus reliability suites.",
"approach": "Standard approach",
"confidence": 0.9
},
"commits": [],
"filesChanged": [],
"projectId": "/Users/khaliqgant/Projects/AgentWorkforce/relay-workflow-reliability-defaults",
"tags": [],
"_trace": {
"startRef": "6d4b6969cd96596fea43808e6cddbdd70c029b8d",
"endRef": "6d4b6969cd96596fea43808e6cddbdd70c029b8d"
}
}
14 changes: 14 additions & 0 deletions .trajectories/completed/2026-05/traj_34b1u84b19gz.md
@@ -0,0 +1,14 @@
# Trajectory: Address PR 827 review feedback

> **Status:** ✅ Completed
> **Confidence:** 90%
> **Started:** May 8, 2026 at 08:29 PM
> **Completed:** May 8, 2026 at 08:33 PM

---

## Summary

Addressed PR #827 review feedback: cleaned reliability options type, tightened worktree branch validation, fixed supervised API-owner execution without interactive spawn, removed overlapping CI path filter, fixed E2E helper return shape, and cleaned duplicate trajectory text. Added a targeted supervised API-owner regression test and re-ran SDK typecheck plus reliability suites.

**Approach:** Standard approach
53 changes: 53 additions & 0 deletions .trajectories/completed/2026-05/traj_bdrlknyl8twj.json
@@ -0,0 +1,53 @@
{
"id": "traj_bdrlknyl8twj",
"version": 1,
"task": {
"title": "Add workflow reliability defaults and E2E matrix"
},
"status": "completed",
"startedAt": "2026-05-08T17:54:45.069Z",
"completedAt": "2026-05-08T18:05:37.305Z",
"agents": [
{
"name": "default",
"role": "lead",
"joinedAt": "2026-05-08T18:02:02.075Z"
}
],
"chapters": [
{
"id": "chap_sqrkpwofov15",
"title": "Work",
"agentName": "default",
"startedAt": "2026-05-08T18:02:02.075Z",
"endedAt": "2026-05-08T18:05:37.305Z",
"events": [
{
"ts": 1778263322077,
"type": "decision",
"content": "Made retry-mode workflows repair-aware by default",
"raw": {
"question": "Made retry-mode workflows repair-aware by default",
"chosen": "Made retry-mode workflows repair-aware by default",
"alternatives": [],
"reasoning": "Workflow reliability is now a product contract: SDK builder workflows and raw runner configs with agents get bounded repair retries unless callers explicitly choose fail-fast, continue, or repairRetries: 0. Agent/artifact failures now invoke repair before retrying, not only deterministic gates."
},
"significance": "high"
}
]
}
],
"retrospective": {
"summary": "Added Relay workflow reliability defaults, repairable builder presets, agent-step repair before retry, API-agent verification through the normal agent loop, worktree-step validation, a dedicated reliability CI job, and contract/E2E coverage for malformed artifacts, child INVALID_ARTIFACT recovery, deterministic gate repair, fan-out isolation, master-child, worktree-backed, deterministic-only, and agent-plus-gate workflow shapes.",
"approach": "Standard approach",
"confidence": 0.9
},
"commits": [],
"filesChanged": [],
"projectId": "/Users/khaliqgant/Projects/AgentWorkforce/relay-workflow-reliability-defaults",
"tags": [],
"_trace": {
"startRef": "0e536f46028fb008342efc0908342408984b37d0",
"endRef": "0e536f46028fb008342efc0908342408984b37d0"
}
}
31 changes: 31 additions & 0 deletions .trajectories/completed/2026-05/traj_bdrlknyl8twj.md
@@ -0,0 +1,31 @@
# Trajectory: Add workflow reliability defaults and E2E matrix

> **Status:** ✅ Completed
> **Confidence:** 90%
> **Started:** May 8, 2026 at 07:54 PM
> **Completed:** May 8, 2026 at 08:05 PM

---

## Summary

Added Relay workflow reliability defaults, repairable builder presets, agent-step repair before retry, API-agent verification through the normal agent loop, worktree-step validation, a dedicated reliability CI job, and contract/E2E coverage for malformed artifacts, child INVALID_ARTIFACT recovery, deterministic gate repair, fan-out isolation, master-child, worktree-backed, deterministic-only, and agent-plus-gate workflow shapes.

**Approach:** Standard approach

---

## Key Decisions

### Made retry-mode workflows repair-aware by default
- **Chose:** Made retry-mode workflows repair-aware by default
- **Reasoning:** Workflow reliability is now a product contract: SDK builder workflows and raw runner configs with agents get bounded repair retries unless callers explicitly choose fail-fast, continue, or repairRetries: 0. Agent/artifact failures now invoke repair before retrying, not only deterministic gates.
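The default described in this decision can be sketched as a small resolver. This is only an illustration under stated assumptions, not the SDK's actual code: the type names, `resolveRepairRetries`, and the `DEFAULT_REPAIR_RETRIES` value are hypothetical (the trajectory only says the budget is "bounded"), and treating an unset strategy as `retry` and an agent-less workflow as having no repair budget are guesses.

```typescript
// Hypothetical config shapes for illustration; the real SDK types may differ.
type Strategy = 'retry' | 'fail-fast' | 'continue';

interface ErrorHandling {
  strategy?: Strategy;
  repairRetries?: number;
}

interface WorkflowConfig {
  agents: string[];
  errorHandling?: ErrorHandling;
}

// Assumed value: the source says "bounded" but does not state the number.
const DEFAULT_REPAIR_RETRIES = 2;

// Returns how many repair attempts a workflow gets before terminal failure.
function resolveRepairRetries(config: WorkflowConfig): number {
  const errorHandling = config.errorHandling ?? {};
  // Assumption: an unset strategy behaves like 'retry'.
  const strategy = errorHandling.strategy ?? 'retry';

  // Explicit fail-fast or continue opts out of repair entirely.
  if (strategy !== 'retry') return 0;

  // An explicit budget wins, including repairRetries: 0.
  if (errorHandling.repairRetries !== undefined) {
    return Math.max(0, Math.floor(errorHandling.repairRetries));
  }

  // Assumption: with no agents there is nobody to run a repair.
  return config.agents.length > 0 ? DEFAULT_REPAIR_RETRIES : 0;
}
```

Under these assumptions, the opt-outs named in the reasoning (`fail-fast`, `continue`, `repairRetries: 0`) all resolve to a budget of zero, while a workflow with agents and no explicit setting gets the default budget.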

---

## Chapters

### 1. Work
*Agent: default*

- Made retry-mode workflows repair-aware by default
16 changes: 15 additions & 1 deletion .trajectories/index.json
@@ -1,6 +1,6 @@
{
"version": 1,
"lastUpdated": "2026-05-08T15:51:38.996Z",
"lastUpdated": "2026-05-08T18:33:55.701Z",
"trajectories": {
"traj_1775914133873_35667beb": {
"title": "fix-sdk-build-resolution-workflow",
@@ -282,6 +282,20 @@
"startedAt": "2026-05-08T15:50:35.978Z",
"completedAt": "2026-05-08T15:51:38.854Z",
"path": "/Users/khaliqgant/Projects/AgentWorkforce/relay-repairable-workflows/.trajectories/completed/2026-05/traj_vkozdglobkyg.json"
},
"traj_bdrlknyl8twj": {
"title": "Add workflow reliability defaults and E2E matrix",
"status": "completed",
"startedAt": "2026-05-08T17:54:45.069Z",
"completedAt": "2026-05-08T18:05:37.305Z",
"path": "/Users/khaliqgant/Projects/AgentWorkforce/relay-workflow-reliability-defaults/.trajectories/completed/2026-05/traj_bdrlknyl8twj.json"
},
"traj_34b1u84b19gz": {
"title": "Address PR 827 review feedback",
"status": "completed",
"startedAt": "2026-05-08T18:29:34.717Z",
"completedAt": "2026-05-08T18:33:55.607Z",
"path": "/Users/khaliqgant/Projects/AgentWorkforce/relay-workflow-reliability-defaults/.trajectories/completed/2026-05/traj_34b1u84b19gz.json"
}
}
}
2 changes: 1 addition & 1 deletion packages/sdk/src/workflows/README.md
@@ -371,7 +371,7 @@ errorHandling:
notifyChannel: alerts
```

- When `errorHandling.strategy: retry` includes an explicit `repairRetries` budget, deterministic step or verification gate failures are treated as repairable work before terminal failure. The runner chooses `errorHandling.repairAgent` when set, otherwise it uses the step's owning/upstream agent when possible, then falls back to the best available workflow agent. The selected agent gets the failed command, working directory, exit information, and captured output, then the deterministic gate is retried.
+ Retry-mode workflows are repair-aware by default. Deterministic step failures, verification gate failures, and malformed agent artifacts are treated as repairable work before terminal failure. The runner chooses `errorHandling.repairAgent` when set, otherwise it uses the step's owning/upstream agent when possible, then falls back to the best available workflow agent. The selected agent gets the failed command or agent output, working directory, exit information, and captured evidence, then the failed gate or step is retried. Use `repairRetries: 0`, `strategy: fail-fast`, or `strategy: continue` when a workflow intentionally should not invoke repair agents.
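The repair flow in the updated README paragraph (pick a repair agent, hand it the failure evidence, then retry the failed step) might look roughly like the sketch below. It is not the runner's implementation: `Step`, `StepResult`, `RepairContext`, `selectRepairAgent`, and `runWithRepair` are hypothetical names, and "best available workflow agent" is approximated by taking the first configured agent, which is an assumption.

```typescript
// Hypothetical shapes for illustration; the real runner types may differ.
interface StepResult {
  ok: boolean;
  exitCode: number | null;
  output: string;
}

interface RepairContext {
  command: string;
  cwd: string;
  exitCode: number | null;
  output: string;
}

interface Step {
  name: string;
  command: string;
  cwd: string;
  ownerAgent?: string;
  run: () => Promise<StepResult>;
}

interface RepairOptions {
  repairAgent?: string;
  workflowAgents: string[];
  repairRetries: number;
  repair: (agent: string, context: RepairContext) => Promise<void>;
}

// Fallback order from the README: explicit repairAgent, then the step's
// owning agent, then any available workflow agent.
function selectRepairAgent(
  step: Step,
  repairAgent: string | undefined,
  workflowAgents: string[],
): string | undefined {
  if (repairAgent) return repairAgent;
  if (step.ownerAgent && workflowAgents.indexOf(step.ownerAgent) !== -1) {
    return step.ownerAgent;
  }
  // Assumption: "best available" is simplified to the first agent.
  return workflowAgents[0];
}

// Runs the step; on failure, invokes a repair agent and retries, up to
// repairRetries times, before returning the terminal result.
async function runWithRepair(step: Step, options: RepairOptions): Promise<StepResult> {
  let result = await step.run();
  for (let attempt = 0; !result.ok && attempt < options.repairRetries; attempt++) {
    const agent = selectRepairAgent(step, options.repairAgent, options.workflowAgents);
    if (agent === undefined) break; // nobody available to repair
    await options.repair(agent, {
      command: step.command,
      cwd: step.cwd,
      exitCode: result.exitCode,
      output: result.output,
    });
    result = await step.run();
  }
  return result;
}
```

With `repairRetries: 0` the loop body never runs, so a failing step returns its first result unchanged, which matches the opt-out the README describes.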

## Built-in Templates
