fix(nonce-init): retry on RPC errors, not just on tokio timeouts#47
Merged
Conversation
3 tasks
The previous logic was
let res = tokio::time::timeout(..., client.get_pending_txn_count(addr)).await;
if res.is_ok() { init_nonce = res.ok(); break; }
`res` is `Result<Result<u64, _>, Elapsed>`, so `res.is_ok()` is true
whenever the timeout did not expire — including when the inner RPC call
returned `Err` (connection reset, request error, etc.). In that case
the loop broke after a single attempt and `init_nonce` was set to
`Some(Err(_))`, which then panicked the task with "Failed to get nonce
for address". The five-retry intent never took effect.
Match both layers so we actually retry on inner RPC errors and on
outer timeouts, and add a small backoff between attempts. Behavior on
success is unchanged; behavior on persistent failure is the same panic
after exhausting all 5 attempts.
This was reproducible against a slow / overloaded RPC endpoint with
~100k accounts in --recover mode: a single dropped connection during
nonce initialization aborted the whole run.
dc8a4c1 to
dd4308f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
init_nonceonly checked the tokio timeout outerResult, not the inner RPCResult. Inner RPC errors (connection drops etc.) still satisfiedres.is_ok(), breaking the loop on the first attempt and panicking the task.Why this matters
Reproducible against a slow / overloaded RPC endpoint with ~100k accounts in
--recovermode: a single dropped connection during nonce initialization aborted the whole run. After this fix, the same scenario completes nonce init in ~4-5 minutes with one or two warning logs as transient errors get retried instead of being fatal.Behavior changes
Test plan
cargo checkpasses.🤖 Generated with Claude Code