fix(bench): connection leak + faucet U256 underflow#50
Merged
Conversation
Two production bugs uncovered during a long-running 1500-sender bench against a single reth RPC endpoint. eth_cli.rs — connection leak under high load - pool_idle_timeout was 10s, well below reqwest's 90s default. Short idle eviction churned sockets faster than they were reused, opening a fresh TCP connection for nearly every RPC call. With 1500 senders this drove ~20K ESTABLISHED sockets to a single RPC port and tripped the target reth's rpc.max-connections=20000 cap, wedging the server. Restored to 90s and added a connection-aware reqwest .timeout(10s) on the ClientBuilder. - The three hot RPCs (get_pending_txn_count, send_raw_tx, get_latest_txn_count) were wrapped in tokio::time::timeout(10s, ...). When that fires it drops the future mid-response; reqwest sees the abort and discards the socket rather than returning it to the pool, which leaks connections under any tail latency. Replaced with reqwest's built-in per-request timeout (set on the ClientBuilder above), which is connection-aware and recycles sockets correctly. - Measured impact: 20K -> 2K ESTABLISHED connections in long-run bench. main.rs — clamp faucet ETH to on-chain reality - start_bench now queries eth_getBalance(faucet) and computes effective_faucet_eth = on_chain - 1% headroom when the configured fauce_eth_balance is smaller than the on-chain amount, and clamps down when the config exceeds on-chain. This prevents the cascade builder from receiving a value that can't cover total_cost, which would silently U256-underflow downstream. - init_nonce: per-RPC timeout 10s -> 60s and buffer_unordered 1024 -> 16 to stop hammering the RPC endpoint during the nonce-init phase. faucet.rs — assert before U256 subtraction - Added assert!(faucet_balance >= total_cost, ...) before the faucet_balance - total_cost subtraction inside FaucetTreePlanBuilder::new. U256 wraps silently on underflow, producing per-child funding ~2^256/N that permanently strips ENOUGH_BALANCE in the txpool and deadlocks the cascade. Fail fast with a diagnostic instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- main.rs: rewrite self-contradicting comment block on effective_faucet_eth selection (clamp DOWN vs scale UP semantics) in plain English - faucet.rs: add symmetric faucet_balance >= total_gas_cost assert to the single-level (total_levels == 1) branch so it surfaces the same diagnostic as the multi-level path before U256::sub silently underflows - main.rs: document the buffer_unordered(16) concurrency choice and the 60s per-fetch timeout in init_nonce - main.rs: note that the faucet-balance .expect() is wrapped in retry_with_backoff (3x with 5s/10s/10s, ~25s wall before panic) - main.rs: drop the two redundant warn! lines on the cascade-sizing branches; the single info! "Faucet ETH plan: configured=.. on_chain=.. effective=.." already carries every field, and the warn! for the "configured exceeds on-chain" anomaly is retained as it signals operator misconfig - main.rs: add inline note that on_chain == 0 falls through to the cascade builder's assert (rather than U256-underflowing inside this fn) No behavior change. cargo check clean, no new warnings.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two production bugs surfaced during a 1500-sender long-run bench against a single reth RPC.
Bug 1 — 20K-connection leak (
src/eth/eth_cli.rs)Symptom: Target reth's
rpc.max-connections=20000cap is exhausted within minutes, wedging the RPC server.ss -t state establishedon the bench host shows ~20K sockets to one peer.Root causes:
pool_idle_timeout(Duration::from_secs(10))— too aggressive. reqwest's default is 90s. With 10s, sockets get evicted from the pool faster than 1500 concurrent senders can reuse them, so each request opens a new TCP connection.get_pending_txn_count,send_raw_tx,get_latest_txn_count) was wrapped intokio::time::timeout(Duration::from_secs(10), ...). When that fires, the future is dropped mid-response; reqwest treats that as an aborted request and discards the socket instead of returning it to the pool. Under any tail latency this leaks connections fast.Fix:
pool_idle_timeout→ 90s (reqwest default).timeout(Duration::from_secs(10))to thereqwest::ClientBuilderchain. reqwest's built-in timeout is connection-aware and recycles the socket back to the pool when it fires.tokio::time::timeout(...)wraps at the call sites.Measured impact: 20K → 2K ESTABLISHED connections in the long-run bench.
Bug 2 — U256 underflow in faucet cascade (
src/main.rs,src/txn_plan/constructor/faucet.rs)Scenario: If
fauce_eth_balance(config) is larger than the actual on-chain faucet balance, orgas_budget × total_txnsexceedsfauce_eth_balance, then insideFaucetTreePlanBuilder::new:The U256 subtraction wraps to ~2^256, producing per-child funding values that permanently strip
ENOUGH_BALANCEchecks in the reth txpool — the cascade silently deadlocks with no progress and no error.Fixes:
src/main.rs: queryeth_getBalance(faucet)at startup and computeeffective_faucet_eth = min(configured, on_chain) − 1% headroom(or clamp down when config exceeds on-chain). Pass this tocreate_faucet_tree_plan_builderinstead of the raw config value.src/txn_plan/constructor/faucet.rs:assert!(faucet_balance >= total_cost, ...)before the subtraction, with a diagnostic that names every input.Drive-by — nonce-init tuning (
src/main.rs)buffer_unordered: 1024 → 16 (don't hammer the RPC during init)Test plan
cargo checkclean (no warnings)fauce_eth_balanceis misconfigured — assertion now fires with a readable diagnostic