perf(vector/hnsw): fold the .hnsw checksum into the Eager load pass (#789) #811
Merged
…789) The #786 footer verification ran as a separate full read before the structural parse, ~doubling read I/O on every Eager reader creation. Fold the CRC into the single Eager structural pass via a new ChecksumTrackingInput wrapper and verify the footer after parse; the Lazy/OnDemand path keeps a dedicated up-front pass and legacy footer-less segments skip verification. A read-counting test asserts the Eager load now reads the segment exactly once (file_size bytes) instead of ~2x content_len. Closes #789
Summary
Removes the redundant full read of `.hnsw` segments on the Eager load path while keeping the #786 corruption guarantee.

#786 verified the CRC-32 footer with `verify_checksum_footer`, an independent full pass over the content before the structural parse. In Eager mode the parse then reads the same content again, so a footer-carrying segment was read ~twice on every reader creation (every searcher-cache miss, i.e. after each `commit()`).

This PR folds the CRC into the single Eager structural pass and verifies the footer after parse, with no extra read. The Lazy/OnDemand path (which seeks over the vector payload) keeps a dedicated up-front verification pass; legacy footer-less segments skip verification.
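The folding idea can be sketched in a few lines of standalone Rust. This is an illustration only, not the PR's code: `TrackingReader`, `crc32_update`, and `parse_and_verify` are hypothetical names, the wrapper is built on `std::io::Read`/`Seek` rather than the project's `StorageInput`, the "structural parse" is reduced to reading an 8-byte header, and the stored CRC is passed in directly instead of being probed from a footer.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320) update step.
/// `state` is the running register, which starts at 0xFFFF_FFFF.
fn crc32_update(mut state: u32, bytes: &[u8]) -> u32 {
    for &b in bytes {
        state ^= u32::from(b);
        for _ in 0..8 {
            // All ones if the low bit is set, zero otherwise.
            let mask = (state & 1).wrapping_neg();
            state = (state >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    state
}

/// Hypothetical stand-in for a checksum-tracking input wrapper.
/// Assumes `inner` starts at position 0.
struct TrackingReader<R> {
    inner: R,
    state: u32,
    pos: u64,
    sequential: bool,
}

impl<R: Read + Seek> TrackingReader<R> {
    fn new(inner: R) -> Self {
        Self { inner, state: 0xFFFF_FFFF, pos: 0, sequential: true }
    }

    fn is_sequential(&self) -> bool {
        self.sequential
    }

    /// Read (and hash) whatever the parser left unread, up to `len` bytes.
    fn absorb_to(&mut self, len: u64) -> io::Result<()> {
        let mut buf = [0u8; 4096];
        while self.pos < len {
            let want = (len - self.pos).min(buf.len() as u64) as usize;
            if self.read(&mut buf[..want])? == 0 {
                return Err(io::ErrorKind::UnexpectedEof.into());
            }
        }
        Ok(())
    }

    /// Finalized CRC of every byte read so far.
    fn crc(&self) -> u32 {
        !self.state
    }
}

impl<R: Read> Read for TrackingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.state = crc32_update(self.state, &buf[..n]);
        self.pos += n as u64;
        Ok(n)
    }
}

impl<R: Seek> Seek for TrackingReader<R> {
    fn seek(&mut self, from: SeekFrom) -> io::Result<u64> {
        if let SeekFrom::Current(0) = from {
            // Position query (what `stream_position()` issues): answer from the counter.
            return Ok(self.pos);
        }
        // Any real seek means the CRC no longer covers a contiguous prefix.
        self.sequential = false;
        self.pos = self.inner.seek(from)?;
        Ok(self.pos)
    }
}

/// Parse and verify in a single pass over `content`.
fn parse_and_verify(content: &[u8], stored_crc: u32) -> io::Result<bool> {
    let mut input = TrackingReader::new(Cursor::new(content));
    let mut header = [0u8; 8]; // stand-in for the structural parse
    input.read_exact(&mut header)?;
    let _ = input.stream_position()?; // does not clear `sequential`
    input.absorb_to(content.len() as u64)?;
    Ok(input.is_sequential() && input.crc() == stored_crc)
}
```

In this sketch a non-sequential read simply fails verification; the PR instead falls back to a dedicated verification pass in that case.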
Changes
- `storage/checksum.rs`: new `ChecksumTrackingInput`, a `StorageInput` wrapper that accumulates a CRC over sequential reads. A real seek clears `is_sequential`; the no-op `Current(0)` seek used by `stream_position()` is served from a byte counter so it does not break tracking; `track=false` degrades it to a thin position-tracking pass-through; `absorb_to(len)` covers residual bytes; `clone_input()` returns the unwrapped inner (OnDemand clones inherit no running-CRC state).
- `vector/index/hnsw/reader.rs`: `verify_checksum_footer` split into `read_footer_crc` (8-byte footer probe only) and `verify_footer_content` (Lazy dedicated pass). `load()` sets `fold = Eager && footer_present`, runs the structural parse through `ChecksumTrackingInput`, then calls `absorb_to(content_len)` and compares against the stored CRC, falling back to `verify_footer_content` if `is_sequential()` is unexpectedly false.

Tests
- `ChecksumTrackingInput` unit tests (sequential CRC, `stream_position` keeps sequential, real seek clears it, `absorb_to`, `track=false`, `clone_input` unwraps inner).
- Read-counting test (`eager_load_reads_hnsw_segment_exactly_once`): a `CountingStorage` asserts the Eager load reads exactly `file_size` bytes of the segment (footer probe + one folded pass). The pre-#789 double-read was ~`2 * content_len + 8`, so this is a deterministic regression guard for the latency-restored criterion.

Verification
- `cargo test -p laurus --test vector_hnsw_checksum_test` → 5 passed
- `cargo test -p laurus --lib checksum` → 9 passed
- `cargo test -p laurus --lib vector::index::hnsw` → 37 passed (default) / 38 passed (`--features pq-fastscan`)
- `cargo fmt --check` clean; `cargo clippy -p laurus --all-targets -- -D warnings` clean (default and `--features pq-fastscan`)
- `cargo build -p laurus-wasm --target wasm32-unknown-unknown` → success

Notes
- `HnswIndexReader::load` signature is unchanged → no language-binding impact.

Closes #789