errors: improve rate limit error messages for AI agents#2386
errors: improve rate limit error messages for AI agents#2386danmoseley wants to merge 5 commits into
Conversation
When the GitHub API returns a rate limit error, replace the raw Go HTTP error string with a clean, actionable message so agents know exactly how long to wait before retrying. Before: search code: GET https://api.github.com/search/code: 403 API rate limit exceeded for user ID 12345. [rate reset in 2m59s] After: search code: GitHub API rate limit exceeded. Retry after 2m59s. create issue: GitHub secondary rate limit exceeded. Retry after 47s. create issue: GitHub secondary rate limit exceeded. Wait before retrying. Edge cases: expired/zero reset time, nil RetryAfter, and errors wrapped with errors.As all produce "Wait before retrying." rather than a negative or confusing duration. The original error is stored in context via addGitHubAPIErrorToContext before the rate-limit check, so middleware is unaffected. Fixes github#2385. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR improves agent-facing tool error messages when GitHub API calls hit primary (RateLimitError) or secondary (AbuseRateLimitError) rate limits, replacing noisy raw HTTP error strings with concise “Retry after …” / “Wait before retrying” guidance.
Changes:
- Add rate-limit specific formatting in
NewGitHubAPIErrorResponseusingerrors.Asfor wrapped errors. - Add unit tests covering primary/secondary rate limit messages, wrapped errors, and edge cases (reset time expired/zero, missing retry-after).
Show a summary per file
| File | Description |
|---|---|
| pkg/errors/error.go | Adds rate-limit error detection and returns clean, actionable messages while still recording the original error in context. |
| pkg/errors/error_test.go | Adds coverage for the new rate-limit formatting behavior and edge cases. |
Copilot's findings
Comments suppressed due to low confidence (1)
pkg/errors/error_test.go:614
- This wrapped-rate-limit test has the same flakiness risk as the earlier one:
expectedRetryInis calculated after the call, so the rounded value may not match what the function formatted. Consider computing the expected value before invokingNewGitHubAPIErrorResponseor relaxing the assertion to tolerate small timing deltas.
result := NewGitHubAPIErrorResponse(ctx, "search code", resp, wrappedErr)
expectedRetryIn := time.Until(resetTime).Round(time.Second)
- Files reviewed: 2/2 changed files
- Comments generated: 2
Compute expectedRetryIn before calling the function under test, and use larger reset time offsets (20-30 min), so a 1s boundary during time.Duration.Round cannot cause spurious mismatches. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reduces repetition in TestNewGitHubAPIErrorResponse_RateLimits subtests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ee67105 to
52d8383
Compare
- Primary rate limit: compute time.Until(resetTime) once and check the rounded result is >0 before showing 'Retry after X'. This avoids a TOCTOU race between the After(time.Now()) guard and the subsequent time.Until call, and prevents showing 'Retry after 0s.' when the reset time is imminent. - Secondary rate limit: round RetryAfter first, then check >0. Previously, a RetryAfter of e.g. 200ms would pass the >0 guard but format as 'Retry after 0s.' after rounding. - Add tests for both sub-second edge cases. - Remove UTF-8 BOM accidentally introduced in error_test.go by .NET WriteAllText with the default UTF8 encoding. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@SamMorrowDrums heya, this is a very small change, any chance we could get it signed off and merged? |
jluocsa
left a comment
There was a problem hiding this comment.
Really like this — the test names are clear, the edge-case coverage (sub-second, zero time, past time, wrapped errors via errors.As, missing RetryAfter) is the kind of thing that usually only gets backfilled after a bug report. Three small things worth raising:
1. Scope question: GraphQL rate limits. NewGitHubAPIErrorResponse is the REST path. GraphQL has its own rate-limit shape (errors with type: "RATE_LIMITED" in the response body, not a 403 / RateLimitError). If a GraphQL call gets rate-limited, it currently flows through a separate handler and won't get the same clean message. Is that intentional for this PR (REST-only scope, GraphQL follow-up), or worth a parallel branch in the GraphQL error helper? Either way is fine — just want to make sure it's not an oversight.
2. Long retry windows. For the primary rate limit (5000/hr for authenticated requests), a 403 right after exhausting your quota can produce Retry after 59m59s. That's accurate but agents often retry on monotonic clocks; if they pause-and-resume across a sleep/wake, the human-readable duration is now stale. Consider including the absolute reset timestamp:
GitHub API rate limit exceeded. Retry after 59m59s (resets at 2025-01-15T14:30:00Z).
Optional / polish — but useful for long-running agent sessions.
3. Round(time.Second) on a value that's already in seconds. rateLimitErr.Rate.Reset comes from the X-RateLimit-Reset header, which is an integer epoch second. time.Until(resetTime) will have at most sub-second drift from the clock skew between the call and Until. The .Round(time.Second) is fine and matches the formatting intent, just noting that it's defensive rather than truly needed.
The AbuseRateLimitError.RetryAfter is genuinely sub-second-capable (some clients set it from Retry-After: 0.5), so the rounding there does real work — and the "rounds-to-zero falls back to wait message" test case is a really nice catch.
The fact that the original error is still placed on context via addGitHubAPIErrorToContext before the rate-limit branch returns is the right call — middleware/observability still sees the typed error.
|
@jluocsa thanks for the feedback. I propose to merge this as-is as it's a strict improvement standing alone, I think, and it addresses what popped in the corpus of real world agent errors. I could potentially do a follow up, but don't plan it (I moved teams since) |
When the GitHub API returns a rate limit error, agents currently receive raw Go HTTP error strings:
These are hard for agents to reliably parse. Without a clear signal to stop and wait, agents may retry immediately — burning through quota faster — or misread the retry delay and either wait too short (triggering another error) or too long (wasting time).
This PR makes
NewGitHubAPIErrorResponseintercept*github.RateLimitErrorand*github.AbuseRateLimitErrorand return clean, actionable messages instead:Edge cases handled:
AbuseRateLimitError.RetryAfteris nil or zero → "Wait before retrying."errors.As)Backward compatibility: The original error is stored in context via
addGitHubAPIErrorToContextbefore the rate-limit check, so any middleware reading from context is unaffected — only the text returned to the MCP client changes.Fixes #2385.
Note
This PR description was drafted with GitHub Copilot.