chore: avoid hammering store at ntx builder startup #2064

Open

SantiagoPittella wants to merge 5 commits into main from santiagopittella-avoid-ntx-builder-spam-main

Conversation


@SantiagoPittella (Collaborator) commented May 8, 2026

closes #1952

This PR:

  • Trusts local committed account state on restart and only fetches the unconsumed_notes delta per known account, dropping the per-account GetNetworkAccountDetailsById call.
  • Adds a configurable hydration semaphore in StoreClient (defaulting to 4) that bounds the number of concurrent in-flight startup RPCs (see the sketch after this list).
  • Adds a separate store_sync_checkpoint column on chain_state (this required a new SQL migration), distinct from the mempool-driven block_num.
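
For reference, a minimal sketch of the semaphore idea, assuming tokio; `fetch_unconsumed_notes` and the other names here are illustrative stand-ins, not the PR's actual API:

  use std::num::NonZeroUsize;
  use std::sync::Arc;
  use tokio::sync::Semaphore;

  async fn hydrate_accounts(account_ids: Vec<u64>, concurrency: NonZeroUsize) {
      let semaphore = Arc::new(Semaphore::new(concurrency.get()));
      let mut tasks = Vec::new();
      for id in account_ids {
          let semaphore = Arc::clone(&semaphore);
          tasks.push(tokio::spawn(async move {
              // At most `concurrency` hydration RPCs are in flight at once.
              let _permit = semaphore.acquire_owned().await.expect("semaphore never closed");
              fetch_unconsumed_notes(id).await;
          }));
      }
      for task in tasks {
          task.await.expect("hydration task panicked");
      }
  }

  async fn fetch_unconsumed_notes(_account_id: u64) {
      // Placeholder for the store RPC.
  }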

I tried it by:

  1. Initializing the main version of the node.
  2. Starting multiple network monitors to create network accounts.
  3. Stopping everything.
  4. Installing and initializing the new version with the existing data (i.e. I didn't re-run bootstrap), which ran the migration.
  5. Verifying that everything continued working as expected.

@SantiagoPittella SantiagoPittella changed the title chore: avoid hamming store at ntx builder startup chore: avoid hammering store at ntx builder startup May 8, 2026
Comment thread crates/ntx-builder/src/clients/store.rs Outdated
Comment thread crates/ntx-builder/src/db/models/queries/mod.rs Outdated
Comment thread crates/ntx-builder/src/db/mod.rs Outdated
Comment thread crates/ntx-builder/src/builder.rs Outdated
/// Bounds the burst of `GetNetworkAccountDetailsById` and `GetUnconsumedNetworkNotes` calls the
/// ntx-builder fires when catching up after a restart.
const DEFAULT_STORE_HYDRATION_CONCURRENCY: NonZeroUsize =
    NonZeroUsize::new(4).expect("literal is non-zero");
Collaborator:

Do we know how slow things could be because of this?

Collaborator Author:

Not really, because my local DB isn't big enough to actually measure this. It would be really nice to have a copy of the DB on a server with specs similar to the devnet/testnet ones to compare against. Or at least to run it locally with the testnet database.

Do you know if I can get access to it? Maybe @Mirko-von-Leipzig?

Collaborator:

@kkovaacs got access recently IIRC

Contributor:

I do have a 0.13 testnet database snapshot. I can upload it somewhere for you if that helps (but since it's for 0.13 I'm not sure it does...)

Collaborator Author:

Yeah, it probably won't be of much use.

@sergerad (Collaborator) left a comment:

LGTM, but I think we need to make DEFAULT_STORE_HYDRATION_CONCURRENCY configurable even if we do test performance locally. It feels like the type of thing that might need changing in the live environment.
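
For illustration, one way to surface it, assuming the builder already has a serde-deserialized config struct; the type, field, and helper names here are hypothetical:

  use std::num::NonZeroUsize;

  fn default_store_hydration_concurrency() -> NonZeroUsize {
      NonZeroUsize::new(4).expect("literal is non-zero")
  }

  #[derive(serde::Deserialize)]
  struct NtxBuilderConfig {
      /// Upper bound on concurrent store hydration RPCs at startup.
      #[serde(default = "default_store_hydration_concurrency")]
      store_hydration_concurrency: NonZeroUsize,
  }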

Comment on lines +1 to +4
-- Store-sync checkpoint: the highest block whose state has been fully ingested into the local
-- DB by the startup catch-up sync.
--
-- NULL means "never sync'd": treated as a full GENESIS-onward sync on first boot after upgrade.
Collaborator:

Could we avoid this NULL by reframing this to the next block to sync, rather than the latest sync'd block?

Collaborator Author:

hmm yes, that should work
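
For concreteness, a sketch of what the reframing buys on the read side (assuming u32 block numbers; the function names are hypothetical):

  // Before: a NULL-able "latest synced" checkpoint forces an Option and a
  // genesis special case on every caller.
  fn first_block_to_sync(latest_synced: Option<u32>) -> u32 {
      latest_synced.map_or(0, |block| block + 1)
  }

  // After: a NOT NULL "next block to sync" column, defaulting to 0 (genesis),
  // is already the value callers want.
  fn first_block_to_sync_reframed(next_to_sync: u32) -> u32 {
      next_to_sync
  }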

Comment on lines +110 to +112
/// ```sql
/// SELECT account_id FROM accounts WHERE transaction_id IS NULL
/// ```
Collaborator:

This is going to become a very large list one day.

Collaborator Author:

Should we paginate it?
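
Something like keyset pagination would keep each page cheap. A sketch, assuming rusqlite, non-negative integer account ids, and a caller-chosen page size (none of this matches the crate's actual types):

  use rusqlite::{params, Connection};

  /// Fetch one page of network account ids with no inflight transaction.
  /// `last_seen` is the highest id returned by the previous page
  /// (None for the first page).
  fn network_account_ids_page(
      conn: &Connection,
      last_seen: Option<i64>,
      page_size: i64,
  ) -> rusqlite::Result<Vec<i64>> {
      let mut stmt = conn.prepare(
          "SELECT account_id FROM accounts \
           WHERE transaction_id IS NULL AND account_id > ?1 \
           ORDER BY account_id LIMIT ?2",
      )?;
      let rows = stmt.query_map(params![last_seen.unwrap_or(-1), page_size], |row| {
          row.get(0)
      })?;
      rows.collect()
  }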

Comment on lines +41 to +42
block_num = excluded.block_num, \
block_header = excluded.block_header",
Collaborator:

Where does excluded come from?

Collaborator Author:

It's the upsert alias that SQLite (like Postgres) provides inside `ON CONFLICT ... DO UPDATE`: `excluded` refers to the row the INSERT attempted to insert. We could probably simplify this by re-binding the parameters instead:

  INSERT INTO chain_state (id, block_num, block_header)
  VALUES (0, ?1, ?2)
  ON CONFLICT(id) DO UPDATE SET
      block_num = ?1,
      block_header = ?2
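
(For a single-row upsert the two forms are equivalent; `excluded` mainly pays off for multi-row inserts, where each conflicting row sees its own attempted values.)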

Comment on lines +166 to +170
// For each account that had an inflight row at startup, do a full hydration
// (account-details + unconsumed-notes) from the store. The inflight tx may have
// landed in a block we didn't witness, so locally-committed state cannot be
// trusted for these specific accounts. The set is bounded by `max_concurrent_txs`
// so this is small and we run it sequentially before opening the main loop.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is neat; I hadn't actually considered this. Nice!

Comment on lines +215 to +224
let kickoff_block = if catch_up_started {
    None
} else {
    match &event {
        MempoolEvent::BlockCommitted { header, .. } => Some(header.block_num()),
        _ => None,
    }
};
Collaborator:

We can simplify this by first waiting for the first block-committed event outside of this main loop, buffering the processed items into a Vec until then.

Once it arrives, kick off the catch-up work, and then enter this main loop, but chain the Vec with the mempool stream:

let mut events = futures::stream::iter(buffered_events).chain(mempool_events);
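
A fuller sketch of that pattern (the event type here is a simplified stand-in; the real `BlockCommitted` carries a header, not a bare block number):

  use futures::{stream, Stream, StreamExt};

  enum MempoolEvent {
      BlockCommitted { block_num: u32 },
      Other,
  }

  async fn run(mut mempool_events: impl Stream<Item = MempoolEvent> + Unpin) {
      let mut buffered = Vec::new();

      // Phase 1: buffer events until the first committed block arrives.
      let kickoff_block = loop {
          match mempool_events.next().await {
              Some(MempoolEvent::BlockCommitted { block_num }) => {
                  buffered.push(MempoolEvent::BlockCommitted { block_num });
                  break block_num;
              }
              Some(event) => buffered.push(event),
              None => return, // stream ended before any block was committed
          }
      };

      // ... kick off the catch-up sync from `kickoff_block` here ...
      let _ = kickoff_block;

      // Phase 2: replay the buffered events, then continue with the live
      // stream; the main loop no longer needs a `catch_up_started` flag.
      let mut events = stream::iter(buffered).chain(mempool_events);
      while let Some(_event) = events.next().await {
          // Process `_event` as before.
      }
  }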

Collaborator:

I think this works. But I must admit this is more complex than I anticipated (not a fault of the implementation; overlaps + concurrency are a pain).

I'm wondering if this is worth it. What are some alternate options?

We could revert to always requiring a full sync before continuing on. This is a bit wasteful and means we cannot immediately begin producing network txs, but it reduces the problem to: resync from last_sync..=tip, and only then begin producing txs. This also simplifies the code quite a bit.

I'm a bit on the fence.

wdyt?

Collaborator Author:

I mostly agree. If we can get a testnet snapshot we can start checking (though that's probably not enough) whether this complexity is worth having or not.
