feat(benchmarks): Support local Claude UI benchmark suites #165

Triggered via pull request May 26, 2026 01:28
Status Success
Total duration 28m 51s

warden.yml

on: pull_request

Annotations

4 warnings
activateSkill silently proceeds without skillDirs, failing after expensive simulator/preflight setup: src/benchmarks/claude-ui/harness.ts#L528
There's a guard for `skillDirs` requiring `isolatedWorkingDirectory` (line 528), but no equivalent early check that `activateSkill` requires `skillDirs`. When `activateSkill` is set without `skillDirs`, `installProjectSkills` returns `[]`, and `readActivatedSkillPrompt` throws only after simulator creation and preflight commands have already run, wasting significant time.
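One way to address this is to validate the option combination before any simulator or preflight work begins. The following is a minimal sketch, not the harness's actual code: the `SuiteOptions` shape and the `validateSuiteOptions` helper are hypothetical, and only the field names `activateSkill`, `skillDirs`, and `isolatedWorkingDirectory` come from the annotation. The type of `activateSkill` is assumed to be a string.

```typescript
// Hypothetical option shape. Only the three field names appear in the
// annotation; their types here are assumptions.
interface SuiteOptions {
  activateSkill?: string;
  skillDirs?: string[];
  isolatedWorkingDirectory?: boolean;
}

// Returns configuration errors; an empty array means the options are consistent.
// Intended to run before simulator creation and preflight commands.
function validateSuiteOptions(options: SuiteOptions): string[] {
  const errors: string[] = [];
  const hasSkillDirs = (options.skillDirs?.length ?? 0) > 0;

  // Mirrors the guard the annotation says already exists.
  if (hasSkillDirs && !options.isolatedWorkingDirectory) {
    errors.push("skillDirs requires isolatedWorkingDirectory");
  }
  // The missing early check: activateSkill is meaningless without skillDirs.
  if (options.activateSkill !== undefined && !hasSkillDirs) {
    errors.push("activateSkill requires skillDirs");
  }
  return errors;
}
```

Calling this at the top of the suite runner and aborting on a non-empty result would surface the misconfiguration in milliseconds instead of after setup.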
Exit code 0 when benchmark metrics regress or sequences mismatch: src/benchmarks/claude-ui/harness.ts#L751
The process now exits 0 whenever all suites are `completed` (Claude exited cleanly and the parser succeeded), regardless of whether metrics exceeded baselines or tool sequences diverged. Sequence mismatches (`sequence.matched = false`) and metric deltas have no effect on the exit code, so CI pipelines cannot use it to detect regressions.
User-configured `failurePatterns` matches don't affect process exit code: src/benchmarks/claude-ui/harness.ts#L751
When benchmark suites match configured `failurePatterns` (e.g. `failurePatterns: ["BUILD FAILED"]`), those pattern failures are counted in `completion.issueCount` and displayed, but `completed` is still `true` and the harness exits 0. CI gates based on exit code will therefore never catch these explicitly declared failure conditions.
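The two exit-code warnings above could be handled by deriving the exit code from the suite results rather than from `completed` alone. This is a sketch under stated assumptions: the `SuiteResult` shape, the `metricRegressed` flag, and `computeExitCode` are hypothetical. Only `completed`, `completion.issueCount`, and `sequence.matched` are named in the annotations, and whether `issueCount` counts anything beyond `failurePatterns` matches is not stated.

```typescript
// Hypothetical result shape built from the field names in the annotations.
interface SuiteResult {
  completed: boolean;
  completion: { issueCount: number };
  sequence?: { matched: boolean };
  // Assumed flag: true when any metric exceeded its baseline.
  metricRegressed?: boolean;
}

// Returns 0 only when every suite completed with no issues, no sequence
// mismatch, and no metric regression; otherwise returns 1.
function computeExitCode(results: SuiteResult[]): number {
  const failed = results.some(
    (r) =>
      !r.completed ||
      r.completion.issueCount > 0 ||
      r.sequence?.matched === false ||
      r.metricRegressed === true,
  );
  return failed ? 1 : 0;
}
```

A harness could assign the result to `process.exitCode` at the end of the run. If some teams want regressions to stay informational, the stricter checks could sit behind an opt-in flag.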
review
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v4. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/