Cache-Aware Block-Transposed Chamfer/MaxSim Distance for f32#863

Draft
suri-kumkaran wants to merge 1 commit into main from users/suryangupta/multi-vector-distance-impl

@suri-kumkaran

What

A SIMD-accelerated MaxSim and Chamfer distance implementation for f32 multi-vector queries, using a block-transposed memory layout with L2/L1 cache-aware tiling.
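
For reference, one common formulation of the two quantities, with query vectors $q_1, \dots, q_m$ and document vectors $d_1, \dots, d_n$. This is an assumption for illustration: the description does not state the sign or normalization convention the crate uses (some implementations negate the inner product so that smaller means closer).

$$
\mathrm{MaxSim}(Q, D)_i = \max_{1 \le j \le n} \langle q_i, d_j \rangle,
\qquad
\mathrm{Chamfer}(Q, D) = \sum_{i=1}^{m} \mathrm{MaxSim}(Q, D)_i
$$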

New module: diskann-quantization/src/multi_vector/distance/cache_aware/

  • mod.rs — CacheAwareKernel unsafe trait + cache budget constants
  • kernel.rs — generic 5-level tiling loop (tiled_reduce<K>)
  • f32_kernel.rs — f32 SIMD micro-kernel (16×4 FMA), QueryBlockTransposedRef wrapper, MaxSim/Chamfer trait impls, tests

Supporting changes:

  • block_transposed.rs — added available_rows() and From<MatRef<Standard<T>>>
  • distance/mod.rs, multi_vector/mod.rs — module wiring and re-exports
  • distance/simple.rs — disambiguated a test call

Why

The existing simple kernel iterates query×doc in a flat nested loop, causing repeated cache evictions on large multi-vector workloads. By block-transposing the query and tiling both sides to fit in L2/L1, the new kernel keeps hot data resident and feeds the FMA pipeline more efficiently.
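
To make the access pattern concrete, here is a minimal sketch of a flat query×doc loop of the kind described above. It is not the crate's actual simple kernel: the function names, the row-major `&[f32]` layout, and the "larger inner product is better" convention are all assumptions for illustration.

```rust
/// Hypothetical flat kernel: for each query vector, the maximum inner
/// product over all document vectors. Both inputs are row-major, with
/// `dim` floats per vector (an assumed layout, not the crate's types).
fn max_sim_flat(query: &[f32], docs: &[f32], dim: usize) -> Vec<f32> {
    assert!(dim > 0 && query.len() % dim == 0 && docs.len() % dim == 0);
    query
        .chunks_exact(dim)
        .map(|q| {
            // Every query row streams through the whole document matrix,
            // so a large document set is re-read from memory once per row.
            docs.chunks_exact(dim)
                .map(|d| q.iter().zip(d).map(|(a, b)| a * b).sum::<f32>())
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .collect()
}

/// Hypothetical Chamfer-style score: the sum of the per-query maxima.
fn chamfer_flat(query: &[f32], docs: &[f32], dim: usize) -> f32 {
    max_sim_flat(query, docs, dim).iter().sum()
}
```

With two query vectors `[1, 0]` and `[0, 1]` and three document vectors `[2, 0]`, `[0, 3]`, `[1, 1]`, the per-query maxima are 2 and 3, and their sum is 5.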

Design Decisions

  • Reducing-GEMM pattern: L2 tiles for query panels (~625 KB), L1 tiles for document panels (~36 KB). A 16×4 micro-kernel (2×f32x8 × 4 broadcast unrolls) processes each tile intersection.
  • Generic kernel trait: CacheAwareKernel abstracts the micro-kernel so future element types (f16, i8/u8) reuse the tiling loop without duplication.
  • QueryBlockTransposedRef newtype: Prevents query/document argument swapping at compile time, analogous to the existing QueryMatRef.
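
The decisions above can be sketched in scalar Rust. This is a deliberately simplified illustration, not the code in this PR: it uses two tiling levels rather than five, a 16×1 scalar micro-kernel rather than the 16×4 FMA one, and every name here (`BlockTransposed`, `MicroKernel`, `ScalarF32`, `tiled_max_sim`) is hypothetical. Tile sizes are passed in as parameters instead of being derived from cache budgets.

```rust
/// Query rows per block; chosen to match the 16-row micro-kernel height.
const BLOCK: usize = 16;

/// Hypothetical block-transposed query: rows are grouped into blocks of
/// BLOCK, and within a block the BLOCK values of each dimension are
/// contiguous. Missing rows in the last block are zero-padded.
struct BlockTransposed {
    data: Vec<f32>,
    rows: usize,
    dim: usize,
}

impl BlockTransposed {
    fn from_row_major(src: &[f32], dim: usize) -> Self {
        assert!(dim > 0 && src.len() % dim == 0);
        let rows = src.len() / dim;
        let blocks = (rows + BLOCK - 1) / BLOCK;
        let mut data = vec![0.0; blocks * BLOCK * dim];
        for r in 0..rows {
            let (b, lane) = (r / BLOCK, r % BLOCK);
            for k in 0..dim {
                data[b * BLOCK * dim + k * BLOCK + lane] = src[r * dim + k];
            }
        }
        Self { data, rows, dim }
    }

    fn block(&self, b: usize) -> &[f32] {
        &self.data[b * BLOCK * self.dim..(b + 1) * BLOCK * self.dim]
    }
}

/// Hypothetical micro-kernel abstraction: the tiling loop below is generic
/// over it, so another element type could plug in its own implementation.
trait MicroKernel {
    /// Fold one document vector into the running maxima of one query block.
    fn update(q_block: &[f32], doc: &[f32], dim: usize, best: &mut [f32; BLOCK]);
}

struct ScalarF32;

impl MicroKernel for ScalarF32 {
    fn update(q_block: &[f32], doc: &[f32], dim: usize, best: &mut [f32; BLOCK]) {
        let mut acc = [0.0f32; BLOCK];
        for k in 0..dim {
            // Broadcast one document element across BLOCK query lanes.
            let d = doc[k];
            let col = &q_block[k * BLOCK..(k + 1) * BLOCK];
            for (a, &q) in acc.iter_mut().zip(col) {
                *a += q * d;
            }
        }
        for (b, a) in best.iter_mut().zip(acc) {
            *b = (*b).max(a);
        }
    }
}

/// Two-level tiling: an outer query panel (standing in for the L2 tile)
/// and an inner document panel (standing in for the L1 tile).
fn tiled_max_sim<K: MicroKernel>(
    q: &BlockTransposed,
    docs: &[f32],
    q_tile_blocks: usize,
    d_tile_rows: usize,
) -> Vec<f32> {
    let dim = q.dim;
    assert!(docs.len() % dim == 0 && q_tile_blocks > 0 && d_tile_rows > 0);
    let n_blocks = q.data.len() / (BLOCK * dim);
    let mut best = vec![[f32::NEG_INFINITY; BLOCK]; n_blocks];
    for qb0 in (0..n_blocks).step_by(q_tile_blocks) {
        let qb1 = (qb0 + q_tile_blocks).min(n_blocks);
        for d_tile in docs.chunks(d_tile_rows * dim) {
            // The document panel is reused by every query block in the panel.
            for b in qb0..qb1 {
                for doc in d_tile.chunks_exact(dim) {
                    K::update(q.block(b), doc, dim, &mut best[b]);
                }
            }
        }
    }
    // Drop the zero-padded lanes of the last block.
    best.iter().flatten().copied().take(q.rows).collect()
}
```

Because the reduction is a running maximum, the order in which tiles are visited does not change the result, which is what lets the loop nest be rearranged around cache sizes.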

@codecov-commenter

Codecov Report

❌ Patch coverage is 91.02990% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.30%. Comparing base (7e3750c) to head (0a70420).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...on/src/multi_vector/distance/cache_aware/kernel.rs | 85.85% | 14 Missing ⚠️ |
| ...rc/multi_vector/distance/cache_aware/f32_kernel.rs | 93.01% | 13 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #863      +/-   ##
==========================================
- Coverage   89.33%   89.30%   -0.04%     
==========================================
  Files         443      444       +1     
  Lines       83488    83548      +60     
==========================================
+ Hits        74587    74614      +27     
- Misses       8901     8934      +33     

| Flag | Coverage Δ | |
|---|---|---|
| miri | 89.30% <91.02%> (-0.04%) | ⬇️ |
| unittests | 89.14% <91.02%> (-0.04%) | ⬇️ |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...-quantization/src/multi_vector/block_transposed.rs | 96.99% <100.00%> (+0.04%) | ⬆️ |
| ...n-quantization/src/multi_vector/distance/simple.rs | 98.43% <100.00%> (ø) | |
| ...rc/multi_vector/distance/cache_aware/f32_kernel.rs | 93.01% <93.01%> (ø) | |
| ...on/src/multi_vector/distance/cache_aware/kernel.rs | 85.85% <85.85%> (ø) | |

... and 9 files with indirect coverage changes
