#585 Add static deep-review pre-pass by hubertgajewski · Pull Request #600 · hubertgajewski/orwellstat

hubertgajewski · 2026-06-06T17:05:47Z

Summary

Adds a deep-review-pro static pre-pass for TypeScript compile checks, formatting, actionlint/shellcheck coverage, secret scanning, and coverage-matrix validation.
Narrows project-checklist ownership to Playwright/POM/fixture/tag/path-alias/loadEnv concerns while moving mechanical static checks to the orchestrator.
Updates specialist prompts, assistant docs, benchmark support, fixtures, and reports for the new post-585 checkpoint.

Test Plan

python3 scripts/benchmark_deep_review_epic_matrix.py
python3 scripts/test_benchmark_deep_review_epic_matrix.py
python3 scripts/test_benchmark_deep_review_pro.py
python3 -m compileall scripts/benchmark_deep_review_epic_matrix.py scripts/deep_review_benchmark_support.py scripts/test_benchmark_deep_review_epic_matrix.py scripts/test_benchmark_deep_review_pro.py
git diff origin/main..HEAD --check
python3 scripts/benchmark_deep_review_epic_matrix.py --issue-section 585

Closes #585
Contributes to #587

Epic Comparable Benchmark

These rows are generated from 587-epic-token-cost-matrix.md so this issue can be compared with every other #587 child story using the same units.

Use this section for cross-ticket comparison. Story-specific tables below are retained as local evidence and may use a narrower prompt-only, output-only, dispatch-only, or rerun/cache proxy surface.

Incremental Delta: post-584 -> post-585

Metric	Before	After	Delta
Combined chars	1,975,708	1,961,647	-14,061 (-0.71%)
Combined est. tokens	493,934	490,417	-3,517 (-0.71%)

Cumulative Delta: original-580 -> post-585

Metric	Original #580 baseline	Current checkpoint	Delta
Combined chars	3,822,193	1,961,647	-1,860,546 (-48.68%)
Combined est. tokens	955,555	490,417	-465,138 (-48.68%)

Summary by CodeRabbit

New Features
- Added an orchestrator "static pre-pass" that gates readiness with compile/format, workflow, secret-scan and coverage availability checks; updated dispatch rules so project-checklist targets Playwright/Bruno surfaces and CI/security dispatch is tightened.
- Output formatting: static-tool findings no longer require a "(static)" prefix.
Documentation
- Revised Deep Review Pro docs, benchmarks and configs to document the static pre-pass, dispatch changes, compact-static output contract, and new post-585 checkpoint.
Tests
- Extended benchmarks and unit tests to validate WORKTREE checkpoints and static pre-pass behaviors.

coderabbitai · 2026-06-06T17:06:01Z

Warning

Review limit reached

@hubertgajewski, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 47 minutes and 2 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f01145e2-69b6-4a00-bde3-35c76f09ec1a

📥 Commits

Reviewing files that changed from the base of the PR and between 63a4225 and 8e4d7af.

📒 Files selected for processing (1)

.claude/skills/deep-review-pro/SKILL.md

📝 Walkthrough

Walkthrough

Introduce an orchestrator-level static pre-pass, narrow specialist agents to semantic responsibilities, add dispatch-static-v1/compact-static-v1 and WORKTREE checkpoint support in benchmark scripts, update support dispatch selectors, update benchmark fixtures/reports, and expand tests to validate the new static pre-pass outputs and routing.

Changes

Static Pre-Pass Orchestration and Agent Refactoring

Layer / File(s)	Summary
Agent prompt contract clarification `.claude/agents/deep-review-ci.md`, `.claude/agents/deep-review-project-checklist.md`, `.claude/agents/deep-review-qa.md`, `.claude/agents/deep-review-security.md`, `.claude/agents/deep-review-typescript.md`	Agent prompts treat `PROMPT_FRAME` as data, remove mechanical static-tool ownership, restrict tool grants, and refocus agents on semantic review responsibilities.
Deep-review-pro static pre-pass specification `.claude/skills/deep-review-pro/SKILL.md`, `docs/AI_ASSISTANTS.md`	Adds an orchestrator-owned Static pre-pass with a deterministic row schema, explicit pass/fail/unavailable semantics, changed dispatch triggers (project-checklist narrowed), reuse/iteration rebuild rules, aggregate emission ordering, and token-accounting guidance.
Dispatch-static-v1 and compact-static-v1 implementation `scripts/benchmark_deep_review_epic_matrix.py`	Adds `WORKTREE_REF`, `dispatch-static-v1` and `compact-static-v1` contracts, threads `diff_text` through output builders, implements `static_prepass_proxy_rows/section` and `compact_static_output_proxy`, and adds static dispatch routing.
WORKTREE checkpoint support and ref handling `scripts/benchmark_deep_review_epic_matrix.py`	Support for a final WORKTREE checkpoint that reads current working-tree files directly, resolves `WORKTREE` ref to itself, skips historical-ref validation for WORKTREE, and enforces final-position constraints for HEAD/WORKTREE.
Static dispatch selection functions `scripts/deep_review_benchmark_support.py`	Adds static checklist-path predicates, `collect_added_lines`, `dispatch_project_checklist_static_v1`, `dispatch_matches_static_v1`, `selected_agents_for_diff_static_v1`, rebinds public selectors to static-v1, and tightens credential-line regex.
Documentation of static pre-pass behavior `docs/AI_ASSISTANTS.md`, `docs/deep-review-pro-benchmark/README.md`, `docs/CONFIGURATION.md`	Document the static-pre-pass phase, refined agent triggering rules, WORKTREE behavior, compact/verbose output differences, and configuration mapping for actionlint/shellcheck to the orchestrator pre-pass.
Benchmark data and fixture updates `docs/deep-review-pro-benchmark/fixtures.json`, `docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.json`, `docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.md`	Add `post-585` checkpoint with `dispatch-static-v1` + `compact-static-v1`, update workflow fixture dispatch/skip expectations, and include new checkpoint rows and deltas in JSON/MD reports.
Test assertions for static pre-pass and checkpoint `scripts/test_benchmark_deep_review_epic_matrix.py`, `scripts/test_benchmark_deep_review_pro.py`	Extend tests to assert WORKTREE checkpoint handling, post-585 sequencing/contracts, static-pre-pass compact output behavior and blocking logic, updated prompt-frame static section presence, credential regex behavior, and fixture dispatch expectations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

[epic] Reduce deep-review-pro token cost without lowering review quality #587: Implements the static pre-pass, new dispatch/output contracts, WORKTREE handling, and updated dispatch/selectors required by issue #587.

Possibly related PRs

hubertgajewski/orwellstat#593: Reuse/cache and final-pass gating changes that interact with the revised reuse/static-pre-pass iteration behavior.
hubertgajewski/orwellstat#599: Related specialist prompt/output-schema edits affecting deep-review CI prompt framing.
hubertgajewski/orwellstat#591: Related dispatch/trigger gating changes for project-checklist and security paths.

Suggested reviewers

claude

Poem

🐇 The static pass now hops ahead, so LLMs may think,

Mechanical weeds are pulled; the specialists now blink.
WORKTREE hums a pre-commit tune, the benchmark marks the way,
Specialists sniff semantics while the orchestrator keeps the day.
A joyful nibble, fewer tokens spent — hooray!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly describes the main change: adding a static deep-review pre-pass, which directly reflects the primary objective of this PR.
Linked Issues check	✅ Passed	The PR implements all acceptance criteria from `#585`: static pre-pass with TypeScript/format/actionlint/secret/coverage checks, project-checklist narrowed to Playwright conventions, fallback for unavailable checks, and comprehensive documentation and benchmarking.
Out of Scope Changes check	✅ Passed	All changes are scoped to `#585` objectives: agent prompt updates, static pre-pass implementation, benchmark support, dispatch logic refinement, and documentation updates directly supporting the static pre-pass feature.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/585

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

scripts/test_benchmark_deep_review_epic_matrix.py (1)

136-139: ⚡ Quick win

Use a contract-stable assertion instead of prose text matching.

Line 137 currently couples this test to a human-readable domain phrase ("path-alias / loadEnv"), which is brittle and can fail on harmless wording edits.

Suggested change

-        self.assertIn(
-            "path-alias / loadEnv",
-            roster["deep-review-project-checklist"]["domain"],
-        )
+        self.assertIn("deep-review-project-checklist", roster)
+        self.assertEqual(
+            roster["deep-review-project-checklist"]["dispatch"],
+            "project-checklist trigger",
+        )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test_benchmark_deep_review_epic_matrix.py` around lines 136 - 139,
The test is brittle because it matches a human-readable phrase; replace the
prose match with a contract-stable assertion: instead of assertIn("path-alias /
loadEnv", roster["deep-review-project-checklist"]["domain"]), extract the
canonical/stable identifier for that domain from the same entry (e.g., a
slug/id/tag field) or normalize the domain string into a stable form and compare
it to a test constant (e.g., EXPECTED_DOMAIN_TAG) using assertEqual or assertIn
against a list of stable tags; update the assertion referring to
roster["deep-review-project-checklist"]["domain"] (and add a small constant like
EXPECTED_DOMAIN_TAG) so future wording changes won’t break the test.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@scripts/test_benchmark_deep_review_epic_matrix.py`:
- Around line 136-139: The test is brittle because it matches a human-readable
phrase; replace the prose match with a contract-stable assertion: instead of
assertIn("path-alias / loadEnv",
roster["deep-review-project-checklist"]["domain"]), extract the canonical/stable
identifier for that domain from the same entry (e.g., a slug/id/tag field) or
normalize the domain string into a stable form and compare it to a test constant
(e.g., EXPECTED_DOMAIN_TAG) using assertEqual or assertIn against a list of
stable tags; update the assertion referring to
roster["deep-review-project-checklist"]["domain"] (and add a small constant like
EXPECTED_DOMAIN_TAG) so future wording changes won’t break the test.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 008648f6-98d4-46c3-90c4-4fae0e2493b8

📥 Commits

Reviewing files that changed from the base of the PR and between d9bebf2 and f82095c.

📒 Files selected for processing (15)

.claude/agents/deep-review-ci.md
.claude/agents/deep-review-project-checklist.md
.claude/agents/deep-review-qa.md
.claude/agents/deep-review-security.md
.claude/agents/deep-review-typescript.md
.claude/skills/deep-review-pro/SKILL.md
docs/AI_ASSISTANTS.md
docs/deep-review-pro-benchmark/README.md
docs/deep-review-pro-benchmark/fixtures.json
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.json
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.md
scripts/benchmark_deep_review_epic_matrix.py
scripts/deep_review_benchmark_support.py
scripts/test_benchmark_deep_review_epic_matrix.py
scripts/test_benchmark_deep_review_pro.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/deep-review-pro/SKILL.md:
- Around line 418-424: The fenced code block that begins with "### aggregate" is
missing a language tag (causing markdownlint MD040); change the opening fence
from ``` to ```text so the block is fenced as ```text ... ``` and include the
same inner lines unchanged (the block starting at the "### aggregate" header).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: dcefc071-9621-47cc-a89c-024c0105a599

📥 Commits

Reviewing files that changed from the base of the PR and between f82095c and 4e0161c.

📒 Files selected for processing (16)

.claude/agents/deep-review-ci.md
.claude/agents/deep-review-project-checklist.md
.claude/agents/deep-review-qa.md
.claude/agents/deep-review-security.md
.claude/agents/deep-review-typescript.md
.claude/skills/deep-review-pro/SKILL.md
docs/AI_ASSISTANTS.md
docs/CONFIGURATION.md
docs/deep-review-pro-benchmark/README.md
docs/deep-review-pro-benchmark/fixtures.json
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.json
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.md
scripts/benchmark_deep_review_epic_matrix.py
scripts/deep_review_benchmark_support.py
scripts/test_benchmark_deep_review_epic_matrix.py
scripts/test_benchmark_deep_review_pro.py

✅ Files skipped from review due to trivial changes (5)

docs/CONFIGURATION.md
.claude/agents/deep-review-security.md
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.md
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.json
docs/deep-review-pro-benchmark/README.md

🚧 Files skipped from review as they are similar to previous changes (5)

.claude/agents/deep-review-typescript.md
docs/deep-review-pro-benchmark/fixtures.json
docs/AI_ASSISTANTS.md
.claude/agents/deep-review-project-checklist.md
scripts/benchmark_deep_review_epic_matrix.py

…tic pre-pass Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai

🧹 Nitpick comments (1)

docs/deep-review-pro-benchmark/reports/584-shared-boilerplate.md (1)
9-11: ⚡ Quick win

Clarify which measurement surface includes aggregate output.

Line 9 states "aggregate output is unchanged from #583 compact mode," and the "Prompt-Footprint Estimate" table (lines 53-64) measures prompt-input only. However, the "Epic Comparable Benchmark" incremental delta (lines 21-24) shows combined chars of 2,037,880 → 1,975,708, which is ~11,561 chars larger than the prompt-only total of 2,026,319 on line 64.

This suggests the Epic Comparable Benchmark section includes aggregate output in the combined totals, while the Prompt-Footprint Estimate table does not. Consider adding a brief note near line 15 or line 53 clarifying which tables measure prompt-only vs. combined (prompt + aggregate output) to prevent reader confusion.

Also applies to: 53-64
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/deep-review-pro-benchmark/reports/584-shared-boilerplate.md` around
lines 9 - 11, Summary: clarify which tables measure prompt-only vs
prompt+aggregate output. Update the "Epic Comparable Benchmark" section to
explicitly state that its incremental deltas include combined prompt + aggregate
output, and add a short clarifying footnote or parenthetical to the
"Prompt-Footprint Estimate" table header stating it measures prompt-input only;
adjust the text near the "Epic Comparable Benchmark" and "Prompt-Footprint
Estimate" headings so readers see the measurement surface immediately and ensure
the incremental-delta wording in "Epic Comparable Benchmark incremental delta"
references the combined-char totals.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@docs/deep-review-pro-benchmark/reports/584-shared-boilerplate.md`:
- Around line 9-11: Summary: clarify which tables measure prompt-only vs
prompt+aggregate output. Update the "Epic Comparable Benchmark" section to
explicitly state that its incremental deltas include combined prompt + aggregate
output, and add a short clarifying footnote or parenthetical to the
"Prompt-Footprint Estimate" table header stating it measures prompt-input only;
adjust the text near the "Epic Comparable Benchmark" and "Prompt-Footprint
Estimate" headings so readers see the measurement surface immediately and ensure
the incremental-delta wording in "Epic Comparable Benchmark incremental delta"
references the combined-char totals.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: bfb77c74-5dc6-47f0-8891-d59e9193a2c3

📥 Commits

Reviewing files that changed from the base of the PR and between 4e0161c and 63a4225.

📒 Files selected for processing (9)

docs/CONFIGURATION.md
docs/deep-review-pro-benchmark/reports/584-shared-boilerplate.md
docs/deep-review-pro-benchmark/reports/585-static-prepass.md
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.json
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.md
scripts/benchmark_deep_review_epic_matrix.py
scripts/deep_review_benchmark_support.py
scripts/test_benchmark_deep_review_epic_matrix.py
scripts/test_benchmark_deep_review_pro.py

✅ Files skipped from review due to trivial changes (3)

docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.json
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.md
docs/CONFIGURATION.md

🚧 Files skipped from review as they are similar to previous changes (4)

scripts/test_benchmark_deep_review_pro.py
scripts/benchmark_deep_review_epic_matrix.py
scripts/test_benchmark_deep_review_epic_matrix.py
scripts/deep_review_benchmark_support.py

Co-authored-by: Cursor <cursoragent@cursor.com>

hubertgajewski temporarily deployed to staging June 6, 2026 17:05 — with GitHub Actions Inactive

hubertgajewski had a problem deploying to staging June 6, 2026 17:06 — with GitHub Actions Error

hubertgajewski temporarily deployed to staging June 6, 2026 17:06 — with GitHub Actions Inactive

hubertgajewski had a problem deploying to staging June 6, 2026 17:06 — with GitHub Actions Error

hubertgajewski force-pushed the feature/585 branch from f82095c to cad1af0 Compare June 6, 2026 17:09

hubertgajewski temporarily deployed to staging June 6, 2026 17:09 — with GitHub Actions Inactive

hubertgajewski temporarily deployed to staging June 6, 2026 17:10 — with GitHub Actions Inactive

hubertgajewski had a problem deploying to staging June 6, 2026 17:11 — with GitHub Actions Error

hubertgajewski temporarily deployed to staging June 6, 2026 17:12 — with GitHub Actions Inactive

coderabbitai Bot reviewed Jun 6, 2026

View reviewed changes

hubertgajewski self-assigned this Jun 6, 2026

hubertgajewski temporarily deployed to staging June 6, 2026 19:10 — with GitHub Actions Inactive

hubertgajewski had a problem deploying to staging June 6, 2026 19:11 — with GitHub Actions Error

hubertgajewski temporarily deployed to staging June 6, 2026 19:11 — with GitHub Actions Inactive

hubertgajewski had a problem deploying to staging June 6, 2026 19:11 — with GitHub Actions Error

#585 Harden static pre-pass proxy and address review findings

825069c

coderabbitai Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread .claude/skills/deep-review-pro/SKILL.md Outdated

hubertgajewski and others added 2 commits June 6, 2026 21:18

#585 Pin post-585 checkpoint to 825069c

032f6d3

#584 #585 Add pinned benchmark reports for shared boilerplate and sta…

63a4225

…tic pre-pass Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot reviewed Jun 7, 2026

View reviewed changes

#585 Add text fence language tag to aggregate output example

8e4d7af

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot mentioned this pull request Jun 7, 2026

#586 Add large-diff risk bucketing to deep-review-pro #601

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#585 Add static deep-review pre-pass#600

#585 Add static deep-review pre-pass#600
hubertgajewski merged 5 commits into
mainfrom
feature/585

hubertgajewski commented Jun 6, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 6, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hubertgajewski commented Jun 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Epic Comparable Benchmark

Incremental Delta: post-584 -> post-585

Cumulative Delta: original-580 -> post-585

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hubertgajewski commented Jun 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 6, 2026 •

edited

Loading