Skip to content

test(consensus): reproduce fast_forward_sync num_blocks u64 underflow (audit #705)#742

Open
keanji-x wants to merge 1 commit into
mainfrom
test/unit-ffsync-numblocks-underflow-705
Open

test(consensus): reproduce fast_forward_sync num_blocks u64 underflow (audit #705)#742
keanji-x wants to merge 1 commit into
mainfrom
test/unit-ffsync-numblocks-underflow-705

Conversation

@keanji-x

Copy link
Copy Markdown
Contributor

Summary

Reproduction (test-only PR) for gravity-audit#705 — F3 in galxe/RESTART_RECOVERY_FINDINGS.md: a u64 underflow of num_blocks in BlockStore::fast_forward_sync.

The call site at aptos-core/consensus/src/block_storage/sync_manager.rs:509-514:

let num_blocks = highest_quorum_cert.certified_block().round()
    - highest_commit_cert.ledger_info().ledger_info().round()
    + 1;
// although unlikely, we might wrap num_blocks around on a 32-bit machine
assert!(num_blocks < std::usize::MAX as u64);

highest_commit_cert is effective_commit_cert = max(local_hcc, remote_hcc) by round (the gravity-specific change at :358). When the local commit root is ahead of the incoming SyncInfo's hqc certified round (a stale / forked SyncInfo whose hqc block the node does not have), hqc_round - commit_round + 1 underflows to ~u64::MAX. The guard only considered 32-bit overflow; on 64-bit usize::MAX == u64::MAX, so the wrapped value still satisfies < usize::MAX and the assert passes, then the absurd count is handed to retrieve_blocks_in_range(.., num_blocks, ..):

  • debug builds: panic with attempt to subtract with overflow
  • release builds: wraps and attempts a giant fetch → sync stall.

What this PR adds

Test code only — no production-code edits. A #[cfg(test)] mod ffsync_numblocks_underflow_705_tests appended to sync_manager.rs that builds the exact same QuorumCert / WrappedLedgerInfo objects the production path operates on (using gen_test_certificate + into_wrapped_ledger_info), with commit_round = 100 > hqc_round = 90, and evaluates the identical num_blocks expression.

  • ffsync_num_blocks_underflows_when_commit_ahead_of_hqcFAILS on current code (overflow panic at the production expression). Would PASS once a checked_sub + early-return guard (commit_round >= hqc_round) is added.
  • ffsync_num_blocks_ok_when_hqc_ahead_of_commit — passes; guards against an over-broad fix breaking the happy path.

Evidence

$ RUSTFLAGS="--cfg tokio_unstable" cargo test -p aptos-consensus ffsync_numblocks_underflow_705 -- --nocapture --test-threads=1

running 2 tests
test ...::ffsync_num_blocks_ok_when_hqc_ahead_of_commit ... ok
test ...::ffsync_num_blocks_underflows_when_commit_ahead_of_hqc ...
thread '...::ffsync_num_blocks_underflows_when_commit_ahead_of_hqc' panicked at
  aptos-core/consensus/src/block_storage/sync_manager.rs:1446:26:
attempt to subtract with overflow
FAILED

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 272 filtered out

sync_manager.rs:1446 is the test's verbatim copy of the production num_blocks expression (line 509). The crate's test profile has overflow checks enabled, so the underflow surfaces as a panic.

Suggested fix (not in this PR)

Use checked_sub with an early return when the node is already ≥ the target (commit round ≥ hqc round).

Refs Galxe/gravity-audit#705

🤖 Generated with Claude Code

…#705)

F3 of galxe/RESTART_RECOVERY_FINDINGS.md. `BlockStore::fast_forward_sync`
(aptos-core/consensus/src/block_storage/sync_manager.rs:509-514) computes:

    let num_blocks = highest_quorum_cert.certified_block().round()
        - highest_commit_cert.ledger_info().ledger_info().round() + 1;
    assert!(num_blocks < std::usize::MAX as u64);  // does NOT catch underflow on 64-bit

`highest_commit_cert` is `effective_commit_cert = max(local_hcc, remote_hcc)`
by round (sync_to_highest_quorum_cert, :358). When the local commit root is
AHEAD of the incoming SyncInfo's hqc certified round, `hqc_round -
commit_round + 1` underflows to ~u64::MAX. The guard only handled 32-bit
overflow; on 64-bit usize::MAX == u64::MAX so the wrapped value passes the
assert and an absurd count is handed to retrieve_blocks_in_range (debug:
panic; release: wrap + giant fetch -> sync stall).

Adds a #[cfg(test)] module (test code only, no production-code edits) that
builds the exact same QuorumCert / WrappedLedgerInfo objects with
commit_round (100) > hqc_round (90) and evaluates the identical num_blocks
expression.

Evidence (cargo test -p aptos-consensus, profile has overflow checks on):

  test ...::ffsync_num_blocks_underflows_when_commit_ahead_of_hqc ...
  thread '...' panicked at sync_manager.rs:1446:26:
  attempt to subtract with overflow
  FAILED
  test ...::ffsync_num_blocks_ok_when_hqc_ahead_of_commit ... ok

The underflow test FAILS on current code (overflow panic at the production
expression); it would PASS once a checked_sub + early-return guard
(commit_round >= hqc_round) is added. The happy-path test guards against an
over-broad fix.

Refs Galxe/gravity-audit#705

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant