#586 Add large-diff risk bucketing to deep-review-pro #601

Merged
hubertgajewski merged 8 commits into main from feature/586 on Jun 7, 2026

hubertgajewski (Owner) commented Jun 7, 2026

Summary

  • Adds orchestrator-level large-diff risk bucketing when changed lines exceed 3000, classifying paths into high-risk, normal, low-risk, and generated buckets.
  • Emits metadata-only placeholder hunks for low-risk and generated paths, and full hunks for high-risk and normal paths, ordered high-risk first.
  • Emits ### large-diff-bucketing in the aggregate and blocks status: ready while partial-review: yes unless the caller documents a full-review override.
  • Adds post-586 benchmark checkpoint (scoped-bucketed-v1, compact-static-bucketed-v1) and 586-large-diff-bucketing.md report.
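
To make the bucketing rule above concrete, here is a minimal Python sketch. Only the 3000-line threshold and the four bucket names come from this PR; `classify_path`, `bucket_paths`, and every glob pattern are hypothetical stand-ins, not the rules implemented in scripts/deep_review_benchmark_support.py.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Threshold and bucket names are taken from the PR summary.
LARGE_DIFF_THRESHOLD = 3000
BUCKET_ORDER = ("high-risk", "normal", "low-risk", "generated")

# Hypothetical patterns: the PR text does not list the real classification rules.
GENERATED_PATTERNS = ("*.lock", "*.min.js", "*/generated/*")
HIGH_RISK_PATTERNS = ("*auth*", "*security*", "*.sql", ".github/workflows/*")
LOW_RISK_PATTERNS = ("docs/*", "*.md", "*.txt")


def classify_path(path: str) -> str:
    """Map one changed path to a bucket; the first matching rule wins."""
    if any(fnmatchcase(path, p) for p in GENERATED_PATTERNS):
        return "generated"
    if any(fnmatchcase(path, p) for p in HIGH_RISK_PATTERNS):
        return "high-risk"
    if any(fnmatchcase(path, p) for p in LOW_RISK_PATTERNS):
        return "low-risk"
    return "normal"


def bucket_paths(changed_lines_by_path: dict[str, int]) -> Optional[dict[str, list[str]]]:
    """Return bucket -> sorted paths, or None when the diff does not exceed the threshold."""
    if sum(changed_lines_by_path.values()) <= LARGE_DIFF_THRESHOLD:
        return None
    buckets: dict[str, list[str]] = {name: [] for name in BUCKET_ORDER}
    for path in sorted(changed_lines_by_path):
        buckets[classify_path(path)].append(path)
    return buckets
```

Sorting the paths and using case-sensitive matching keeps the result independent of input order and platform, which is one way to get the deterministic classification the linked issue asks for.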

Test plan

  • python3 scripts/benchmark_deep_review_epic_matrix.py
  • python3 -m unittest scripts.test_benchmark_deep_review_epic_matrix
  • python3 -m unittest scripts.test_benchmark_deep_review_pro
  • python3 -m compileall scripts/benchmark_deep_review_epic_matrix.py scripts/deep_review_benchmark_support.py scripts/test_benchmark_deep_review_epic_matrix.py scripts/test_benchmark_deep_review_pro.py
  • python3 scripts/benchmark_deep_review_epic_matrix.py --issue-section 586
  • CI green on PR

Closes #586

Contributes to #587

Summary by CodeRabbit

  • New Features

    • Large diffs exceeding 3,000 changed lines are now bucketed by risk level (high-risk, normal, low-risk, generated)
    • Low-risk and generated files are condensed to metadata-only placeholders for streamlined review
    • Reviews are automatically marked as partial when large-diff bucketing is triggered
  • Documentation

    • Updated guidance on large-diff handling behavior and partial-review semantics
  • Tests

    • Added test coverage for large-diff bucketing and partial-review scenarios

coderabbitai Bot commented Jun 7, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2fa46d0c-64e2-450c-90be-d79de6ebda98

📥 Commits

Reviewing files that changed from the base of the PR and between 75e2622 and f4135bd.

📒 Files selected for processing (10)
  • .claude/skills/deep-review-pro/SKILL.md
  • docs/AI_ASSISTANTS.md
  • docs/deep-review-pro-benchmark/README.md
  • docs/deep-review-pro-benchmark/reports/586-large-diff-bucketing.md
  • docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.json
  • docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.md
  • scripts/benchmark_deep_review_epic_matrix.py
  • scripts/deep_review_benchmark_support.py
  • scripts/test_benchmark_deep_review_epic_matrix.py
  • scripts/test_benchmark_deep_review_pro.py
📝 Walkthrough

This PR implements large-diff risk bucketing for /deep-review-pro to classify files into risk-based buckets when diffs exceed 3000 changed lines, replace low-risk and generated content with metadata-only placeholders, and block readiness until all required buckets are confirmed reviewed. The feature includes skill logic, bucketing algorithm, prompt-frame integration, benchmark metrics, and comprehensive test validation.
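
The placeholder and ordering steps described above could look roughly like the sketch below. It is illustrative only: `DiffBlock`, `to_placeholder`, `bucket_blocks`, and the placeholder text format are hypothetical names and shapes chosen for this example, and the PR's real `LargeDiffBucketingPlan` dataclass may be structured differently.

```python
from dataclasses import dataclass, replace

# Bucket priority from the PR description: high-risk content goes first.
BUCKET_PRIORITY = {"high-risk": 0, "normal": 1, "low-risk": 2, "generated": 3}
# Buckets whose hunks are condensed to metadata-only placeholders.
PLACEHOLDER_BUCKETS = frozenset({"low-risk", "generated"})


@dataclass(frozen=True)
class DiffBlock:
    """Hypothetical per-file diff block; not the PR's actual data model."""
    path: str
    bucket: str
    changed_lines: int
    body: str  # full unified-diff text for this file


def to_placeholder(block: DiffBlock) -> DiffBlock:
    """Swap the hunk text for a one-line metadata note (assumed format)."""
    note = (
        f"[metadata-only placeholder] path={block.path} "
        f"bucket={block.bucket} changed-lines={block.changed_lines}"
    )
    return replace(block, body=note)


def bucket_blocks(blocks: list[DiffBlock]) -> list[DiffBlock]:
    """Condense low-risk/generated blocks, then order by bucket priority."""
    condensed = [
        to_placeholder(b) if b.bucket in PLACEHOLDER_BUCKETS else b
        for b in blocks
    ]
    # sorted() is stable, so blocks keep their input order within a bucket.
    return sorted(condensed, key=lambda b: BUCKET_PRIORITY[b.bucket])
```

Keeping a placeholder for every condensed file, rather than dropping the file, means a reviewer still sees that the path changed and by how much.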

Changes

Large-Diff Risk Bucketing

| Layer | File(s) | Summary |
| --- | --- | --- |
| Feature specification and documentation | .claude/skills/deep-review-pro/SKILL.md, .claude/agents/deep-review-security.md, docs/AI_ASSISTANTS.md | Skill YAML adds a CHANGED_LINE_COUNT metric, a large-diff bucketing phase that classifies files into high-risk/normal/low-risk/generated buckets when the diff exceeds 3000 changed lines, and partial-review readiness blocking. Security agent guidance and user docs describe placeholder metadata behavior and review flow. |
| Large-diff bucketing implementation | scripts/deep_review_benchmark_support.py | Threshold constants, the LargeDiffBucketingPlan dataclass, and functions to count changed lines, classify paths by risk, aggregate buckets, convert low-risk/generated hunks to metadata-only placeholders, and reorder blocks by bucket priority. |
| Prompt-diff selection and frame generation | scripts/deep_review_benchmark_support.py | select_prompt_diff_v1 applies bucketing logic to filter diff blocks by prompt scope; build_scoped_prompt_frames_bucketed_v1 constructs per-agent v1 prompt frames with bucketed/scoped diff content and generated changed-file lists. |
| Benchmark harness contracts and checkpoint | scripts/benchmark_deep_review_epic_matrix.py | New scoped-bucketed-v1 and compact-static-bucketed-v1 contract types; post-586 checkpoint configured with bucketed prompt frames and compact-static bucketing output mode; contract-to-mode mapping. |
| Large-diff bucketing metrics and output rendering | scripts/benchmark_deep_review_epic_matrix.py | prompt_frame_lengths_scoped_bucketed_v1 computes per-agent bucketed frame lengths; large_diff_bucketing_proxy_section summarizes bucket counts and the partial-review flag; compact_static_bucketed_output_proxy integrates bucketing into compact-static output and sets status to blocked when partial review is active; new output contract builder registration. |
| Benchmark reports and token metrics | docs/deep-review-pro-benchmark/README.md, docs/deep-review-pro-benchmark/reports/586-large-diff-bucketing.md, docs/deep-review-pro-benchmark/reports/587-epic-token-cost-matrix.{json,md} | New issue-586 benchmark report documents threshold behavior, prompt/aggregate proxy metrics, and bucketing effects on frame sizes. JSON and markdown matrices add post-586 checkpoint metrics and incremental/cumulative token/character deltas vs prior checkpoints. |
| Test coverage | scripts/test_benchmark_deep_review_epic_matrix.py, scripts/test_benchmark_deep_review_pro.py | Epic matrix tests update post-586 checkpoint expectations; new tests validate that large-diff bucketing marks reviews with high changed-line counts as partial and confirm bucketed frames are smaller than scoped frames. Benchmark-pro tests assert that the skill documents bucketing and verify that bucketing reduces frame sizes and sets readiness-blocking flags. |
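
As a rough illustration of the changed-line counting and readiness blocking summarized in the table, the sketch below counts added and removed lines in a unified diff and downgrades a proposed `ready` status while the review is partial. `count_changed_lines`, `gate_status`, and the `full_review_override` parameter are hypothetical names, and the reading that an override lifts the block while `partial-review` stays `yes` is an assumption drawn from the PR summary, not from the code.

```python
# Threshold from the PR summary; bucketing applies when it is exceeded.
LARGE_DIFF_THRESHOLD = 3000


def count_changed_lines(unified_diff: str) -> int:
    """Count added and removed lines, skipping '--- '/'+++ ' file headers.

    Simplification: a removed line whose own text starts with '-- ' looks
    like a file header and is skipped as well.
    """
    count = 0
    for line in unified_diff.splitlines():
        if line.startswith(("+++ ", "--- ")):
            continue
        if line.startswith(("+", "-")):
            count += 1
    return count


def gate_status(
    proposed_status: str,
    changed_lines: int,
    full_review_override: bool = False,
) -> dict[str, str]:
    """Block 'ready' while the review is partial, unless an override is documented."""
    partial = changed_lines > LARGE_DIFF_THRESHOLD
    status = proposed_status
    if partial and proposed_status == "ready" and not full_review_override:
        status = "blocked"
    return {"partial-review": "yes" if partial else "no", "status": status}
```

For example, `gate_status("ready", 3001)` reports a partial review with status `blocked`, while the same call at exactly 3000 changed lines leaves the status at `ready`.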

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • hubertgajewski/orwellstat#587: This PR implements the child story #586 of the epic in #587, fulfilling the large-diff bucketing acceptance criteria and benchmark requirements.

Possibly related PRs

  • hubertgajewski/orwellstat#592: Both modify per-agent prompt-frame/scoped subdiff construction (hunk selection and <changed-files> context), with this PR extending that flow to add large-diff risk bucketing and metadata-only placeholders.
  • hubertgajewski/orwellstat#600: Both extend compact aggregate-output contracts and readiness/blocking behavior in the benchmark harness (static pre-pass in #600 vs large-diff bucketing in this PR).
  • hubertgajewski/orwellstat#594: Both modify the /deep-review-pro aggregate output contract in the skill and user docs: #594 (#583 Reduce deep-review-pro output verbosity) changes compact-mode emission rules, while this PR adds large-diff bucketing/placeholders and readiness blocking to the same compact output flow.

Poem

🐰 Large diffs once sprawled without a care,
But bucketing makes the review fair—
High-risk content shines up front today,
While generated noise stays tucked away! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '#586 Add large-diff risk bucketing to deep-review-pro' directly and clearly summarizes the main change: implementing large-diff risk bucketing feature for the deep-review-pro orchestrator as specified in issue #586. |
| Linked Issues check | ✅ Passed | The PR implementation fully meets all primary acceptance criteria from #586: deterministic path classification into risk buckets, high-risk prioritization, partial-review blocking when non-generated code is deferred, metadata-only placeholders for generated/low-risk files, and comprehensive benchmark checkpoint/reporting [#586, #587]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly in-scope: skill documentation, benchmark infrastructure, prompt updates, validation tests, and supporting helper functions all serve the large-diff bucketing feature requirements without unrelated modifications. |

Co-authored-by: Cursor <cursoragent@cursor.com>
hubertgajewski and others added 3 commits June 7, 2026 10:52