#586 Add large-diff risk bucketing to deep-review-pro by hubertgajewski · Pull Request #601 · hubertgajewski/orwellstat

hubertgajewski · 2026-06-07T08:30:32Z

Summary

- Adds orchestrator-level large-diff risk bucketing when changed lines exceed 3000, classifying paths into high-risk, normal, low-risk, and generated buckets.
- Emits metadata-only placeholder hunks for low-risk and generated paths and full hunks for high-risk and normal paths, ordered high-risk first.
- Emits `### large-diff-bucketing` in the aggregate and blocks `status: ready` while `partial-review: yes`, unless the caller documents a full-review override.
- Adds the post-586 benchmark checkpoint (`scoped-bucketed-v1`, `compact-static-bucketed-v1`) and the `586-large-diff-bucketing.md` report.
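
As a rough illustration of the classification step described above, the following minimal sketch assigns paths to the four buckets once the total changed-line count exceeds 3000. The glob patterns and the function names `classify_path` and `bucket_paths` are assumptions made for this example; the PR's actual rules live in `scripts/deep_review_benchmark_support.py` and may differ.

```python
from fnmatch import fnmatch
from typing import Optional

# Threshold from the PR summary: bucketing applies above 3000 changed lines.
LARGE_DIFF_THRESHOLD = 3000

# Hypothetical glob patterns for illustration only; not taken from the PR.
GENERATED_PATTERNS = ("*.lock", "*.min.js", "*/generated/*")
HIGH_RISK_PATTERNS = ("*/auth/*", "*.sql", ".github/workflows/*")
LOW_RISK_PATTERNS = ("docs/*", "*.md")


def classify_path(path: str) -> str:
    """Return the bucket name for one path; the first matching rule wins."""
    if any(fnmatch(path, pattern) for pattern in GENERATED_PATTERNS):
        return "generated"
    if any(fnmatch(path, pattern) for pattern in HIGH_RISK_PATTERNS):
        return "high-risk"
    if any(fnmatch(path, pattern) for pattern in LOW_RISK_PATTERNS):
        return "low-risk"
    return "normal"


def bucket_paths(changed_lines_by_path: dict[str, int]) -> Optional[dict[str, list[str]]]:
    """Bucket paths only when the diff exceeds the threshold, else return None."""
    total = sum(changed_lines_by_path.values())
    if total <= LARGE_DIFF_THRESHOLD:
        return None  # small diff: no bucketing, review everything in full
    buckets: dict[str, list[str]] = {
        "high-risk": [], "normal": [], "low-risk": [], "generated": []
    }
    # Sorting keeps the result deterministic for a given set of paths.
    for path in sorted(changed_lines_by_path):
        buckets[classify_path(path)].append(path)
    return buckets
```

Checking generated patterns first is a design choice of this sketch: it keeps a generated file under a sensitive directory from being promoted to high-risk. Whether the PR makes the same choice is not stated in the summary.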

Test plan

- `python3 scripts/benchmark_deep_review_epic_matrix.py`
- `python3 -m unittest scripts.test_benchmark_deep_review_epic_matrix`
- `python3 -m unittest scripts.test_benchmark_deep_review_pro`
- `python3 -m compileall scripts/benchmark_deep_review_epic_matrix.py scripts/deep_review_benchmark_support.py scripts/test_benchmark_deep_review_epic_matrix.py scripts/test_benchmark_deep_review_pro.py`
- `python3 scripts/benchmark_deep_review_epic_matrix.py --issue-section 586`
- CI green on PR

Closes #586

Contributes to #587

Summary by CodeRabbit

New Features
- Large diffs exceeding 3,000 changed lines now use intelligent bucketing to classify files by risk level
- Low-risk and generated files are condensed to metadata-only placeholders for streamlined review
- Reviews are automatically marked as partial when large-diff bucketing is triggered
Documentation
- Updated guidance on large-diff handling behavior and partial-review semantics
Tests
- Added test coverage for large-diff bucketing and partial-review scenarios

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-06-07T08:30:47Z

Warning

Review limit reached

@hubertgajewski, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 24 minutes and 29 seconds. Learn how PR review limits work.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2fa46d0c-64e2-450c-90be-d79de6ebda98

📥 Commits

Reviewing files that changed from the base of the PR and between 75e2622 and f4135bd.

📒 Files selected for processing (10)

.claude/skills/deep-review-pro/SKILL.md
docs/AI_ASSISTANTS.md
docs/deep-review-pro-benchmark/README.md
docs/deep-review-pro-benchmark/reports/586-large-diff-bucketing.md
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.json
docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.md
scripts/benchmark_deep_review_epic_matrix.py
scripts/deep_review_benchmark_support.py
scripts/test_benchmark_deep_review_epic_matrix.py
scripts/test_benchmark_deep_review_pro.py

📝 Walkthrough

This PR implements large-diff risk bucketing for /deep-review-pro. When a diff exceeds 3000 changed lines, files are classified into risk-based buckets, low-risk and generated content is replaced with metadata-only placeholders, and readiness is blocked until all required buckets are confirmed reviewed. The change covers the skill logic, the bucketing algorithm, prompt-frame integration, benchmark metrics, and test validation.
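
The placeholder and ordering behavior in the walkthrough can be sketched roughly as follows. The `DiffBlock` container, the `render_blocks` helper, and the placeholder text format are hypothetical names chosen for this example. Only the bucket names and the high-risk-first ordering come from the PR description.

```python
from dataclasses import dataclass

# Lower number means earlier in the prompt; the PR orders high-risk first.
BUCKET_PRIORITY = {"high-risk": 0, "normal": 1, "low-risk": 2, "generated": 3}
# Buckets whose hunks are replaced by metadata-only placeholders.
PLACEHOLDER_BUCKETS = frozenset({"low-risk", "generated"})


@dataclass(frozen=True)
class DiffBlock:
    """Hypothetical container for one file's diff; not the PR's data model."""
    path: str
    bucket: str
    hunk: str


def count_changed_lines(hunk: str) -> int:
    """Count added and removed lines, skipping the ---/+++ file headers.

    Simplification: a removed line whose content starts with "--" would be
    mistaken for a header and skipped.
    """
    return sum(
        1
        for line in hunk.splitlines()
        if line[:1] in ("+", "-") and not line.startswith(("+++", "---"))
    )


def render_blocks(blocks: list[DiffBlock]) -> list[str]:
    """Order blocks by bucket priority and swap deferred hunks for placeholders."""
    # sorted() is stable, so blocks within one bucket keep their input order.
    ordered = sorted(blocks, key=lambda block: BUCKET_PRIORITY[block.bucket])
    rendered = []
    for block in ordered:
        if block.bucket in PLACEHOLDER_BUCKETS:
            changed = count_changed_lines(block.hunk)
            # Assumed placeholder format; the real wording is not shown in the PR page.
            rendered.append(
                f"[metadata-only] {block.path} bucket={block.bucket} changed-lines={changed}"
            )
        else:
            rendered.append(block.hunk)
    return rendered
```

In this sketch a reviewer agent still sees that a deferred file changed and by how much, without paying the token cost of its full hunk.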

Changes

Large-Diff Risk Bucketing

| Layer / File(s) | Summary |
| --- | --- |
| Feature specification and documentation `.claude/skills/deep-review-pro/SKILL.md`, `.claude/agents/deep-review-security.md`, `docs/AI_ASSISTANTS.md` | Skill YAML adds `CHANGED_LINE_COUNT` metric, large-diff bucketing phase that classifies files into high-risk/normal/low-risk/generated buckets when exceeding 3000 changed lines, and partial-review readiness blocking. Security agent guidance and user docs describe placeholder metadata behavior and review flow. |
| Large-diff bucketing implementation `scripts/deep_review_benchmark_support.py` | Threshold constants, `LargeDiffBucketingPlan` dataclass, and functions to count changed lines, classify paths by risk, bucket aggregation, convert low-risk/generated hunks to metadata-only placeholders, and reorder blocks by bucket priority. |
| Prompt-diff selection and frame generation `scripts/deep_review_benchmark_support.py` | `select_prompt_diff_v1` applies bucketing logic to filter diff blocks by prompt scope; `build_scoped_prompt_frames_bucketed_v1` constructs per-agent v1 prompt frames with bucketed/scoped diff content and generated changed-file lists. |
| Benchmark harness contracts and checkpoint `scripts/benchmark_deep_review_epic_matrix.py` | New `scoped-bucketed-v1` and `compact-static-bucketed-v1` contract types; post-586 checkpoint configured with bucketed prompt frames and compact-static bucketing output mode; contract-to-mode mapping. |
| Large-diff bucketing metrics and output rendering `scripts/benchmark_deep_review_epic_matrix.py` | `prompt_frame_lengths_scoped_bucketed_v1` computes per-agent bucketed frame lengths; `large_diff_bucketing_proxy_section` summarizes bucket counts and partial-review flag; `compact_static_bucketed_output_proxy` integrates bucketing into compact-static output and sets status to blocked when partial review is active; new output contract builder registration. |
| Benchmark reports and token metrics `docs/deep-review-pro-benchmark/README.md`, `docs/deep-review-pro-benchmark/reports/586-large-diff-bucketing.md`, `docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.{json,md}` | New issue-586 benchmark report documents threshold behavior, prompt/aggregate proxy metrics, and bucketing effects on frame sizes. JSON and markdown matrices add post-586 checkpoint metrics and incremental/cumulative token/character deltas vs prior checkpoints. |
| Test coverage `scripts/test_benchmark_deep_review_epic_matrix.py`, `scripts/test_benchmark_deep_review_pro.py` | Epic matrix tests update post-586 checkpoint expectations; new tests validate large-diff bucketing marks high-lines reviews as partial and confirm bucketed frames are smaller than scoped frames. Benchmark-pro tests assert skill documents bucketing and verify bucketing behavior reduces frame sizes and sets readiness-blocking flags. |
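
The readiness-blocking rule summarized in the table can be sketched as a small pure function. This is a minimal illustration under assumptions: the function names `render_bucketing_section` and `resolve_status`, the `full_review_override` parameter, and the per-bucket field layout are hypothetical. The `### large-diff-bucketing` heading, the `partial-review` flag, and the `ready`/`blocked` statuses come from the PR text.

```python
from typing import Optional

BUCKET_ORDER = ("high-risk", "normal", "low-risk", "generated")


def render_bucketing_section(bucket_counts: dict[str, int], partial_review: bool) -> str:
    """Render an aggregate section with per-bucket counts and the partial-review flag."""
    lines = ["### large-diff-bucketing"]
    for bucket in BUCKET_ORDER:
        # Assumed "name: count" layout; the real section format is not shown here.
        lines.append(f"{bucket}: {bucket_counts.get(bucket, 0)}")
    lines.append(f"partial-review: {'yes' if partial_review else 'no'}")
    return "\n".join(lines)


def resolve_status(
    requested: str,
    partial_review: bool,
    full_review_override: Optional[str] = None,
) -> str:
    """Downgrade 'ready' to 'blocked' while a partial review has no documented override."""
    if requested == "ready" and partial_review and not full_review_override:
        return "blocked"
    return requested
```

For example, `resolve_status("ready", partial_review=True)` yields `"blocked"`, while passing a non-empty `full_review_override` note leaves the status at `"ready"`.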

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

hubertgajewski/orwellstat#587: This PR implements the child story #586 of the epic in #587, fulfilling the large-diff bucketing acceptance criteria and benchmark requirements.

Possibly related PRs

hubertgajewski/orwellstat#592: Both modify per-agent prompt-frame/scoped subdiff construction (hunk selection and <changed-files> context), with this PR extending that flow to add large-diff risk bucketing and metadata-only placeholders.
hubertgajewski/orwellstat#600: Both extend compact aggregate-output contracts and readiness/blocking behavior in the benchmark harness (static pre-pass in #600 vs large-diff bucketing in this PR).
hubertgajewski/orwellstat#594: Both modify the /deep-review-pro aggregate output contract in the skill and user docs. #594 (#583 Reduce deep-review-pro output verbosity) changes compact-mode emission rules, while this PR adds large-diff bucketing, placeholders, and readiness blocking to the same compact output flow.

Poem

🐰 Large diffs once sprawled without a care,
But bucketing makes the review fair—
High-risk content shines up front today,
While generated noise stays tucked away! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '`#586` Add large-diff risk bucketing to deep-review-pro' directly and clearly summarizes the main change: implementing large-diff risk bucketing feature for the deep-review-pro orchestrator as specified in issue `#586`. |
| Linked Issues check | ✅ Passed | The PR implementation fully meets all primary acceptance criteria from `#586`: deterministic path classification into risk buckets, high-risk prioritization, partial-review blocking when non-generated code is deferred, metadata-only placeholders for generated/low-risk files, and comprehensive benchmark checkpoint/reporting [`#586`, `#587`]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly in-scope: skill documentation, benchmark infrastructure, prompt updates, validation tests, and supporting helper functions all serve the large-diff bucketing feature requirements without unrelated modifications. |

Co-authored-by: Cursor <cursoragent@cursor.com>

hubertgajewski and others added 2 commits June 7, 2026 10:29

#586 Add large-diff risk bucketing to deep-review-pro

21373dc

Co-authored-by: Cursor <cursoragent@cursor.com>

#586 Pin post-586 checkpoint to 21373dc

75e2622

hubertgajewski temporarily deployed to staging June 7, 2026 08:30 — with GitHub Actions Inactive

hubertgajewski had a problem deploying to staging June 7, 2026 08:31 — with GitHub Actions Error

#586 Address deep-review-pro findings on large-diff bucketing

be80794

Co-authored-by: Cursor <cursoragent@cursor.com>

hubertgajewski temporarily deployed to staging June 7, 2026 08:34 — with GitHub Actions Inactive

hubertgajewski temporarily deployed to staging June 7, 2026 08:35 — with GitHub Actions Inactive

hubertgajewski temporarily deployed to staging June 7, 2026 08:43 — with GitHub Actions Inactive

hubertgajewski and others added 3 commits June 7, 2026 10:52

#586 Close deep-review-pro iteration 3 findings

773d889

Co-authored-by: Cursor <cursoragent@cursor.com>

#586 Address CodeRabbit review and pin benchmark evidence

64e0f12

Co-authored-by: Cursor <cursoragent@cursor.com>

#586 Pin post-586 checkpoint to 64e0f12

d9ce7c7

Co-authored-by: Cursor <cursoragent@cursor.com>

hubertgajewski temporarily deployed to staging June 7, 2026 08:54 — with GitHub Actions Inactive

hubertgajewski temporarily deployed to staging June 7, 2026 08:55 — with GitHub Actions Inactive

hubertgajewski had a problem deploying to staging June 7, 2026 08:55 — with GitHub Actions Error

hubertgajewski temporarily deployed to staging June 7, 2026 08:55 — with GitHub Actions Inactive

#586 Close deep-review-pro iteration 4 findings

f4135bd

Co-authored-by: Cursor <cursoragent@cursor.com>

hubertgajewski merged 8 commits into main from feature/586
