ci(workflow): add rate-limit-aware retry for PR review comment posting#183
ci(workflow): add rate-limit-aware retry for PR review comment posting#183stay-foolish-forever wants to merge 3 commits into
Conversation
Add GitHub REST API rate-limit handling to the ocr-review workflow: - Implement computeRetryDelayMs() following GitHub's documented strategy: * Honor retry-after header (seconds or HTTP-date) * Wait until x-ratelimit-reset when primary limit is exhausted (remaining=0) * Exponential backoff (>=60s base) for secondary limits without a header * Backoff for transient 5xx/408 errors - Add per-comment retry loop with configurable attempts and delays - Add logRateLimitQuota() to log and proactively throttle on low remaining quota - Honor batch createReview rate-limit headers before per-comment retry - Cap all waits (including header-derived) to avoid stalling the CI job - Expose tuning via OCR_* env vars (retries, delays, thresholds) - Update example workflow docs with retry/delay configuration reference
| } else { | ||
| // Transient server error (5xx / 408): back off without the 60s floor. | ||
| const backoff = Math.min(base * Math.pow(2, attempt), cap); |
There was a problem hiding this comment.
Bug: The comment says "back off without the 60s floor" but the code uses the same base variable (default 60000ms = 60s) as the rate-limit exponential backoff path above. This means transient server errors (5xx/408) will also wait at least 60 seconds on the first retry, which contradicts the stated intent.
Either use a shorter base delay for transient errors (e.g., 1-5 seconds is typical for server hiccups), or correct the comment to reflect that the same base delay is intentionally applied to both cases.
Suggestion:
| } else { | |
| // Transient server error (5xx / 408): back off without the 60s floor. | |
| const backoff = Math.min(base * Math.pow(2, attempt), cap); | |
| } else { | |
| // Transient server error (5xx / 408): use a shorter base delay since | |
| // these are typically brief hiccups, not rate-limit enforcement. | |
| const transientBase = 2000; // 2s base for transient errors | |
| const backoff = Math.min(transientBase * Math.pow(2, attempt), cap); |
| const remaining = h['x-ratelimit-remaining']; | ||
| const limit = h['x-ratelimit-limit']; | ||
| const reset = h['x-ratelimit-reset']; |
There was a problem hiding this comment.
Inconsistency: Unlike computeRetryDelayMs which uses a defensive header() helper that checks both original-case and lowercase header names, this function only accesses headers via lowercase keys (e.g., h['x-ratelimit-remaining']). While Octokit typically normalizes headers to lowercase, this inconsistency means that if header casing ever differs, quota logging silently fails and proactive throttling (remaining returned as null) won't trigger, potentially causing unnecessary rate-limit hits.
Consider reusing the same case-insensitive lookup pattern for consistency and robustness.
…ookup - Transient server errors (5xx/408) now use a 2s base delay instead of the 60s rate-limit base, matching the comment's stated intent and avoiding unnecessary CI stalls on short-lived server hiccups. - Extract a shared getHeader() helper for case-insensitive header lookup, reused by both computeRetryDelayMs and logRateLimitQuota to eliminate duplicated logic and ensure consistent robustness.
Description
Add GitHub REST API rate-limit handling to the ocr-review workflow:
Type of Change
How Has This Been Tested?
make testpasses locallyManually constructing cases that will generate tons of comments (some are invalid) to trigger rate limit and testing what will happen under these cases, the script handles rate limit and other api error (caused by invalid comments) correctly.
Checklist
go fmt,go vet)Related Issues
This PR is to resolve issue #158 in this repo.