Skip to content

ci(workflow): add rate-limit-aware retry for PR review comment posting#183

Open
stay-foolish-forever wants to merge 3 commits into
alibaba:mainfrom
stay-foolish-forever:ci/add-rate-limit
Open

ci(workflow): add rate-limit-aware retry for PR review comment posting#183
stay-foolish-forever wants to merge 3 commits into
alibaba:mainfrom
stay-foolish-forever:ci/add-rate-limit

Conversation

@stay-foolish-forever

@stay-foolish-forever stay-foolish-forever commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Description

Add GitHub REST API rate-limit handling to the ocr-review workflow:

  • Implement computeRetryDelayMs() following GitHub's documented strategy:
    • Honor retry-after header (seconds or HTTP-date)
    • Wait until x-ratelimit-reset when primary limit is exhausted (remaining=0)
    • Exponential backoff (>=60s base) for secondary limits without a header
    • Backoff for transient 5xx/408 errors
  • Add per-comment retry loop with configurable attempts and delays
  • Add logRateLimitQuota() to log and proactively throttle on low remaining quota
  • Honor batch createReview rate-limit headers before per-comment retry
  • Cap all waits (including header-derived) to avoid stalling the CI job
  • Expose tuning via OCR_* env vars (retries, delays, thresholds)
  • Update example workflow docs with retry/delay configuration reference

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring (no functional changes)
  • Documentation update
  • CI / Build / Tooling

How Has This Been Tested?

  • make test passes locally
  • Manual testing (describe below)
    Manually constructing cases that will generate tons of comments (some are invalid) to trigger rate limit and testing what will happen under these cases, the script handles rate limit and other api error (caused by invalid comments) correctly.

Checklist

  • My code follows the project's coding style (go fmt, go vet)
  • I have performed a self-review of my code
  • I have added tests that prove my fix is effective or my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the documentation accordingly (if applicable)
  • I have signed the CLA

Related Issues

This PR is to resolve issue #158 in this repo.

Add GitHub REST API rate-limit handling to the ocr-review workflow:

- Implement computeRetryDelayMs() following GitHub's documented strategy:
  * Honor retry-after header (seconds or HTTP-date)
  * Wait until x-ratelimit-reset when primary limit is exhausted (remaining=0)
  * Exponential backoff (>=60s base) for secondary limits without a header
  * Backoff for transient 5xx/408 errors
- Add per-comment retry loop with configurable attempts and delays
- Add logRateLimitQuota() to log and proactively throttle on low remaining quota
- Honor batch createReview rate-limit headers before per-comment retry
- Cap all waits (including header-derived) to avoid stalling the CI job
- Expose tuning via OCR_* env vars (retries, delays, thresholds)
- Update example workflow docs with retry/delay configuration reference

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 OpenCodeReview found 2 issue(s) in this PR.

  • ✅ 2 posted as inline comment(s)
  • 📝 0 posted as summary

Comment thread .github/workflows/ocr-review.yml Outdated
Comment on lines +370 to +372
} else {
// Transient server error (5xx / 408): back off without the 60s floor.
const backoff = Math.min(base * Math.pow(2, attempt), cap);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The comment says "back off without the 60s floor" but the code uses the same base variable (default 60000ms = 60s) as the rate-limit exponential backoff path above. This means transient server errors (5xx/408) will also wait at least 60 seconds on the first retry, which contradicts the stated intent.

Either use a shorter base delay for transient errors (e.g., 1-5 seconds is typical for server hiccups), or correct the comment to reflect that the same base delay is intentionally applied to both cases.

Suggestion:

Suggested change
} else {
// Transient server error (5xx / 408): back off without the 60s floor.
const backoff = Math.min(base * Math.pow(2, attempt), cap);
} else {
// Transient server error (5xx / 408): use a shorter base delay since
// these are typically brief hiccups, not rate-limit enforcement.
const transientBase = 2000; // 2s base for transient errors
const backoff = Math.min(transientBase * Math.pow(2, attempt), cap);

Comment thread .github/workflows/ocr-review.yml Outdated
Comment on lines +390 to +392
const remaining = h['x-ratelimit-remaining'];
const limit = h['x-ratelimit-limit'];
const reset = h['x-ratelimit-reset'];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency: Unlike computeRetryDelayMs which uses a defensive header() helper that checks both original-case and lowercase header names, this function only accesses headers via lowercase keys (e.g., h['x-ratelimit-remaining']). While Octokit typically normalizes headers to lowercase, this inconsistency means that if header casing ever differs, quota logging silently fails and proactive throttling (remaining returned as null) won't trigger, potentially causing unnecessary rate-limit hits.

Consider reusing the same case-insensitive lookup pattern for consistency and robustness.

…ookup

- Transient server errors (5xx/408) now use a 2s base delay instead of
  the 60s rate-limit base, matching the comment's stated intent and
  avoiding unnecessary CI stalls on short-lived server hiccups.
- Extract a shared getHeader() helper for case-insensitive header
  lookup, reused by both computeRetryDelayMs and logRateLimitQuota to
  eliminate duplicated logic and ensure consistent robustness.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant