-
Notifications
You must be signed in to change notification settings - Fork 62
Faster upload token refresh on 5xx errors during uploads #548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
When a server returns a 5xx error (e.g., 503 Service Unavailable) during file upload, the SDK now properly refreshes upload tokens instead of retrying with stale cached upload URLs a bunch of times first. Previously, the HTTP layer would catch ServiceError and retry using the same pre-fetched upload URLs from the token pool. This caused repeated failures because the upload URLs may no longer be valid after a 5xx error and it delayed the recovery. The fix introduces ServiceErrorDuringUpload exception that disables HTTP-level retries (similar to B2RequestTimeoutDuringUpload), forcing the error to propagate to the upload manager. The upload manager then clears cached tokens via clear_bucket_upload_data() and requests fresh upload URLs for retry attempts. Changes: - Added ServiceErrorDuringUpload exception class - Convert ServiceError to ServiceErrorDuringUpload in b2http.py - Fixed RawSimulator to generate unique upload tokens per URL - Added comprehensive test for token refresh on 503 errors Fixes Backblaze/B2_Command_Line_Tool#1118
| # Convert ServiceError to ServiceErrorDuringUpload for upload operations. | ||
| # This disables HTTP-level retries and forces the upload manager to clear | ||
| # cached upload tokens and request fresh ones. | ||
| raise ServiceErrorDuringUpload(str(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method post_content_return_json is being called by _post_json_return_json, and the latter is being used by almost all post requests.
I am afraid this will break retries for all POST requests, not just ones related to uploads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that would be terrible. Please make sure we have tests for this (I assumed that we do and that the fact that the tests pass after the fix means it's ok, but maybe not?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've redesigned b2http tests in another PR and checked the logic here once again.
The code doesn't actually break the retrying part as per my concern above, but, unfortunately, it doesn't quite work either.
The problem is that the retrying logic happens in the _translate_and_retry block, and the exception is being re-raised later in the post_content_return_json block, once all the retries are finished. That also explains why the retrying part didn't break per se.
So we need a different approach, perhaps with introduction of a new keyword argument.
The test here probably also needs to be rewritten (or modified), cause it didn't quite reveal the problem - the refresh happened after losing time retrying with the old token.
Will see what I can do here.
When a server returns a 5xx error (e.g., 503 Service Unavailable) during file upload, the SDK now properly refreshes upload tokens instead of retrying with stale cached upload URLs a bunch of times first.
Previously, the HTTP layer would catch ServiceError and retry using the same pre-fetched upload URLs from the token pool. This caused repeated failures because the upload URLs may no longer be valid after a 5xx error and it delayed the recovery.
The fix introduces ServiceErrorDuringUpload exception that disables HTTP-level retries (similar to B2RequestTimeoutDuringUpload), forcing the error to propagate to the upload manager. The upload manager then clears cached tokens via clear_bucket_upload_data() and requests fresh upload URLs for retry attempts.
Changes:
Fixes Backblaze/B2_Command_Line_Tool#1118