chore: avoid hammering store at ntx builder startup #2064

Open

SantiagoPittella wants to merge 5 commits into main from santiagopittella-avoid-ntx-builder-spam-main

Conversation


@SantiagoPittella (Collaborator) commented May 8, 2026

closes #1952

This PR:

  • Trusts local committed account state on restart and only fetches the unconsumed_notes delta per known account, dropping the per-account GetNetworkAccountDetailsById call.
  • Adds a configurable hydration semaphore in StoreClient (defaulting to 4) that bounds the number of concurrent in-flight startup RPCs (see the sketch after this list).
  • Adds a separate store_sync_checkpoint column on chain_state (this required a new SQL migration), distinct from the mempool-driven block_num.
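
For reference, a minimal sketch of the semaphore idea, assuming tokio; `fetch_unconsumed_notes` and the other names here are illustrative stand-ins, not the PR's actual API:

  use std::num::NonZeroUsize;
  use std::sync::Arc;
  use tokio::sync::Semaphore;

  async fn hydrate_accounts(account_ids: Vec<u64>, concurrency: NonZeroUsize) {
      let semaphore = Arc::new(Semaphore::new(concurrency.get()));
      let mut tasks = Vec::new();
      for id in account_ids {
          let semaphore = Arc::clone(&semaphore);
          tasks.push(tokio::spawn(async move {
              // At most `concurrency` hydration RPCs are in flight at once.
              let _permit = semaphore.acquire_owned().await.expect("semaphore never closed");
              fetch_unconsumed_notes(id).await;
          }));
      }
      for task in tasks {
          task.await.expect("hydration task panicked");
      }
  }

  async fn fetch_unconsumed_notes(_account_id: u64) {
      // Placeholder for the store RPC.
  }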

I tried it by:

  1. Initializing the main version of the node.
  2. Starting multiple network monitors to create network accounts.
  3. Stopping everything.
  4. Installing and initializing the new version with the existing data (i.e. I didn't re-run bootstrap), which ran the migration.
  5. Verifying that everything continued working as expected.

@SantiagoPittella SantiagoPittella changed the title chore: avoid hamming store at ntx builder startup chore: avoid hammering store at ntx builder startup May 8, 2026
Comment thread crates/ntx-builder/src/clients/store.rs Outdated
Comment thread crates/ntx-builder/src/db/models/queries/mod.rs Outdated
Comment thread crates/ntx-builder/src/db/mod.rs Outdated
Comment thread crates/ntx-builder/src/builder.rs Outdated
/// Bounds the burst of `GetNetworkAccountDetailsById` and `GetUnconsumedNetworkNotes` calls the
/// ntx-builder fires when catching up after a restart.
const DEFAULT_STORE_HYDRATION_CONCURRENCY: NonZeroUsize =
    NonZeroUsize::new(4).expect("literal is non-zero");
Collaborator:

Do we know how slow things could be because of this?

Collaborator Author:

Not really, because my local DB isn't big enough to actually measure this. It would be really nice to have a copy of the DB on a server with specs similar to the devnet/testnet ones to compare against. Or at least to run it locally with the testnet database.

Do you know if I can get access to it? Maybe @Mirko-von-Leipzig?

Collaborator:

@kkovaacs got access recently IIRC

Contributor:

I do have a 0.13 testnet database snapshot. I can upload it somewhere for you if that helps (but since it's for 0.13 I'm not sure it does...)

Collaborator Author:

Yeah, it probably won't be of much use.

@sergerad (Collaborator) left a comment:

LGTM, but I think we need to make DEFAULT_STORE_HYDRATION_CONCURRENCY configurable even if we do test performance locally. It feels like the type of thing that might need changing in the live environment.
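
For illustration, one way to surface it, assuming the builder already has a serde-deserialized config struct; the type, field, and helper names here are hypothetical:

  use std::num::NonZeroUsize;

  fn default_store_hydration_concurrency() -> NonZeroUsize {
      NonZeroUsize::new(4).expect("literal is non-zero")
  }

  #[derive(serde::Deserialize)]
  struct NtxBuilderConfig {
      /// Upper bound on concurrent store hydration RPCs at startup.
      #[serde(default = "default_store_hydration_concurrency")]
      store_hydration_concurrency: NonZeroUsize,
  }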

Comment on lines +1 to +4
-- Store-sync checkpoint: the highest block whose state has been fully ingested into the local
-- DB by the startup catch-up sync.
--
-- NULL means "never sync'd": treated as a full GENESIS-onward sync on first boot after upgrade.
Collaborator:

Could we avoid this NULL by reframing this to the next block to sync, rather than the latest sync'd block?

Collaborator Author:

hmm yes, that should work
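
For concreteness, a sketch of what the reframing buys on the read side (assuming u32 block numbers; the function names are hypothetical):

  // Before: a NULL-able "latest synced" checkpoint forces an Option and a
  // genesis special case on every caller.
  fn first_block_to_sync(latest_synced: Option<u32>) -> u32 {
      latest_synced.map_or(0, |block| block + 1)
  }

  // After: a NOT NULL "next block to sync" column, defaulting to 0 (genesis),
  // is already the value callers want.
  fn first_block_to_sync_reframed(next_to_sync: u32) -> u32 {
      next_to_sync
  }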

Comment on lines +110 to +112
/// ```sql
/// SELECT account_id FROM accounts WHERE transaction_id IS NULL
/// ```
Collaborator:

This is going to become a very large list one day.

Collaborator Author:

Should we paginate it?
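
Something like keyset pagination would keep each page cheap. A sketch, assuming rusqlite, non-negative integer account ids, and a caller-chosen page size (none of this matches the crate's actual types):

  use rusqlite::{params, Connection};

  /// Fetch one page of network account ids with no inflight transaction.
  /// `last_seen` is the highest id returned by the previous page
  /// (None for the first page).
  fn network_account_ids_page(
      conn: &Connection,
      last_seen: Option<i64>,
      page_size: i64,
  ) -> rusqlite::Result<Vec<i64>> {
      let mut stmt = conn.prepare(
          "SELECT account_id FROM accounts \
           WHERE transaction_id IS NULL AND account_id > ?1 \
           ORDER BY account_id LIMIT ?2",
      )?;
      let rows = stmt.query_map(params![last_seen.unwrap_or(-1), page_size], |row| {
          row.get(0)
      })?;
      rows.collect()
  }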

Comment on lines +41 to +42
block_num = excluded.block_num, \
block_header = excluded.block_header",
Collaborator:

Where does excluded come from?

Collaborator Author:

It's the upsert alias that SQLite (like Postgres) provides inside `ON CONFLICT ... DO UPDATE`: `excluded` refers to the row the INSERT attempted to insert. We could probably simplify this by re-binding the parameters instead:

  INSERT INTO chain_state (id, block_num, block_header)
  VALUES (0, ?1, ?2)
  ON CONFLICT(id) DO UPDATE SET
      block_num = ?1,
      block_header = ?2
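
(For a single-row upsert the two forms are equivalent; `excluded` mainly pays off for multi-row inserts, where each conflicting row sees its own attempted values.)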

Comment on lines +166 to +170
// For each account that had an inflight row at startup, do a full hydration
// (account-details + unconsumed-notes) from the store. The inflight tx may have
// landed in a block we didn't witness, so locally-committed state cannot be
// trusted for these specific accounts. The set is bounded by `max_concurrent_txs`
// so this is small and we run it sequentially before opening the main loop.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is neat; I hadn't actually considered this. Nice!

Comment on lines +215 to +224
let kickoff_block = if catch_up_started {
    None
} else {
    match &event {
        MempoolEvent::BlockCommitted { header, .. } => Some(header.block_num()),
        _ => None,
    }
};
Collaborator:

We can simplify this by first waiting for the first block-committed event outside of this main loop, buffering the processed items into a Vec until then.

Once it arrives, kick off the catch-up work, and then enter this main loop, but chain the Vec with the mempool stream:

let mut events = futures::stream::iter(buffered_events).chain(mempool_events);
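
A fuller sketch of that pattern (the event type here is a simplified stand-in; the real `BlockCommitted` carries a header, not a bare block number):

  use futures::{stream, Stream, StreamExt};

  enum MempoolEvent {
      BlockCommitted { block_num: u32 },
      Other,
  }

  async fn run(mut mempool_events: impl Stream<Item = MempoolEvent> + Unpin) {
      let mut buffered = Vec::new();

      // Phase 1: buffer events until the first committed block arrives.
      let kickoff_block = loop {
          match mempool_events.next().await {
              Some(MempoolEvent::BlockCommitted { block_num }) => {
                  buffered.push(MempoolEvent::BlockCommitted { block_num });
                  break block_num;
              }
              Some(event) => buffered.push(event),
              None => return, // stream ended before any block was committed
          }
      };

      // ... kick off the catch-up sync from `kickoff_block` here ...
      let _ = kickoff_block;

      // Phase 2: replay the buffered events, then continue with the live
      // stream; the main loop no longer needs a `catch_up_started` flag.
      let mut events = stream::iter(buffered).chain(mempool_events);
      while let Some(_event) = events.next().await {
          // Process `_event` as before.
      }
  }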

Collaborator:

I think this works. But I must admit this is more complex than I anticipated (not a fault of the implementation; overlaps + concurrency are a pain).

I'm wondering if this is worth it. What are some alternate options?

We could revert to always requiring a full sync before continuing on. This is a bit wasteful and means we cannot immediately begin producing network txs, but it reduces the problem to: resync from last_sync..=tip, and only then begin producing txs. This also simplifies the code quite a bit.

I'm a bit on the fence.

wdyt?

Collaborator Author:

I mostly agree. If we can get a testnet snapshot we can start checking (though that's probably not enough) whether this complexity is worth having or not.
