Cache-Aware Block-Transposed Chamfer/MaxSim Distance for f32 by suri-kumkaran · Pull Request #863 · microsoft/DiskANN

suri-kumkaran · 2026-03-25T18:50:12Z

What

A SIMD-accelerated MaxSim and Chamfer distance implementation for f32 multi-vector queries, using a block-transposed memory layout with L2/L1 cache-aware tiling.
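
To picture the block-transposed layout, here is a minimal sketch, not the crate's actual type. It assumes blocks of 16 rows (chosen to match the 16×4 micro-kernel listed under Design Decisions), stored dimension-major within each block and zero-padded in the last block. `BlockTransposedSketch`, `BLOCK`, `lanes`, and `padded_rows` are hypothetical names used only for illustration.

```rust
/// Rows per block. 16 is an assumption taken from the micro-kernel height.
pub const BLOCK: usize = 16;

/// Hypothetical owned block-transposed buffer (not the crate's type).
pub struct BlockTransposedSketch {
    data: Vec<f32>,
    rows: usize,
    dim: usize,
}

impl BlockTransposedSketch {
    /// Build from a row-major slice of `rows * dim` values.
    /// Rows past `rows` in the final block are left as zeros.
    pub fn from_row_major(src: &[f32], rows: usize, dim: usize) -> Self {
        assert_eq!(src.len(), rows * dim);
        let blocks = (rows + BLOCK - 1) / BLOCK;
        let mut data = vec![0.0f32; blocks * BLOCK * dim];
        for r in 0..rows {
            let (b, lane) = (r / BLOCK, r % BLOCK);
            for k in 0..dim {
                // Within a block, dimension k of all BLOCK rows is contiguous.
                data[b * BLOCK * dim + k * BLOCK + lane] = src[r * dim + k];
            }
        }
        Self { data, rows, dim }
    }

    /// Number of real (unpadded) rows.
    pub fn rows(&self) -> usize {
        self.rows
    }

    /// Number of rows including zero padding.
    pub fn padded_rows(&self) -> usize {
        self.data.len() / self.dim.max(1)
    }

    /// The BLOCK contiguous values holding dimension `k` of every row in block `b`.
    pub fn lanes(&self, b: usize, k: usize) -> &[f32] {
        let start = b * BLOCK * self.dim + k * BLOCK;
        &self.data[start..start + BLOCK]
    }

    /// Element `k` of logical row `r`.
    pub fn get(&self, r: usize, k: usize) -> f32 {
        self.lanes(r / BLOCK, k)[r % BLOCK]
    }
}
```

With this arrangement, one contiguous load of `lanes(b, k)` yields the same dimension for 16 different rows, which is the access pattern a 16-row-tall SIMD micro-kernel would want.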

New module: diskann-quantization/src/multi_vector/distance/cache_aware/

mod.rs — CacheAwareKernel unsafe trait + cache budget constants
kernel.rs — generic 5-level tiling loop (tiled_reduce<K>)
f32_kernel.rs — f32 SIMD micro-kernel (16×4 FMA), QueryBlockTransposedRef wrapper, MaxSim/Chamfer trait impls, tests

Supporting changes:

block_transposed.rs — added available_rows() and From<MatRef<Standard<T>>>
distance/mod.rs, multi_vector/mod.rs — module wiring and re-exports
distance/simple.rs — disambiguated a test call

Why

The existing simple kernel iterates query×doc in a flat nested loop, causing repeated cache evictions on large multi-vector workloads. By block-transposing the query and tiling both sides to fit in L2/L1, the new kernel keeps hot data resident and feeds the FMA pipeline more efficiently.
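
For reference, the flat nested loop being replaced can be sketched roughly as follows. This is not the code in distance/simple.rs; it assumes inner-product similarity, a per-query-vector maximum for MaxSim, and a negated sum of those maxima for Chamfer. The sign convention and the `Mat`, `max_sim_naive`, and `chamfer_naive` names are assumptions made for illustration.

```rust
/// Hypothetical row-major matrix view: `rows` vectors of length `dim`.
pub struct Mat<'a> {
    data: &'a [f32],
    rows: usize,
    dim: usize,
}

impl<'a> Mat<'a> {
    pub fn new(data: &'a [f32], rows: usize, dim: usize) -> Self {
        assert_eq!(data.len(), rows * dim);
        Self { data, rows, dim }
    }

    pub fn row(&self, i: usize) -> &[f32] {
        &self.data[i * self.dim..(i + 1) * self.dim]
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// For each query vector, the largest inner product against any document vector.
/// Every query row streams over the whole document, so a large document is
/// re-read from memory once per query row.
pub fn max_sim_naive(query: &Mat<'_>, doc: &Mat<'_>) -> Vec<f32> {
    assert_eq!(query.dim, doc.dim);
    (0..query.rows)
        .map(|q| {
            (0..doc.rows)
                .map(|d| dot(query.row(q), doc.row(d)))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .collect()
}

/// Assumed Chamfer convention: negated sum of the per-query maxima,
/// so that a smaller value means a closer match.
pub fn chamfer_naive(query: &Mat<'_>, doc: &Mat<'_>) -> f32 {
    -max_sim_naive(query, doc).iter().sum::<f32>()
}
```

A baseline like this is also convenient as a test oracle for a tiled kernel, since both should agree up to floating-point rounding.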

Design Decisions

Reducing-GEMM pattern: L2 tiles for query panels (~625 KB), L1 tiles for document panels (~36 KB). A 16×4 micro-kernel (2×f32x8 × 4 broadcast unrolls) processes each tile intersection.
Generic kernel trait: CacheAwareKernel abstracts the micro-kernel so future element types (f16, i8/u8) reuse the tiling loop without duplication.
QueryBlockTransposedRef newtype: Prevents query/document argument swapping at compile time, analogous to the existing QueryMatRef.
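
The three decisions above can be sketched together in scalar Rust. This is a simplified illustration under stated assumptions, not the PR's implementation: it uses fewer loop levels than the five mentioned for kernel.rs, a safe trait in place of the unsafe `CacheAwareKernel`, plain loops in place of SIMD FMA, and a row-major document. `Kernel`, `ScalarF32`, `QueryBlocks`, `rows_in_budget`, and `tiled_max_sim` are hypothetical names.

```rust
use std::mem::size_of;

const H: usize = 16; // query rows per micro-tile
const W: usize = 4; // document vectors per micro-tile

/// Newtype marking a block-transposed buffer as the *query* side, so it
/// cannot be passed where the document slice is expected.
pub struct QueryBlocks<'a> {
    data: &'a [f32],
    dim: usize,
}

impl<'a> QueryBlocks<'a> {
    /// `data` holds whole blocks of H rows, dimension-major within each block.
    pub fn new(data: &'a [f32], dim: usize) -> Self {
        assert!(dim > 0 && data.len() % (H * dim) == 0);
        Self { data, dim }
    }
}

/// Micro-kernel abstraction: the tiling loop below is written once per trait,
/// and each element type would supply its own `micro`.
pub trait Kernel {
    /// Fold max inner products of one H-row query block against `n` <= W docs.
    fn micro(q_block: &[f32], docs: &[f32], n: usize, dim: usize, best: &mut [f32; H]);
}

pub struct ScalarF32;

impl Kernel for ScalarF32 {
    fn micro(q_block: &[f32], docs: &[f32], n: usize, dim: usize, best: &mut [f32; H]) {
        let mut acc = [[0.0f32; H]; W];
        for k in 0..dim {
            let lanes = &q_block[k * H..(k + 1) * H];
            for j in 0..n {
                let d = docs[j * dim + k]; // one document value, reused for all H rows
                for i in 0..H {
                    acc[j][i] += lanes[i] * d;
                }
            }
        }
        for j in 0..n {
            for i in 0..H {
                best[i] = best[i].max(acc[j][i]);
            }
        }
    }
}

/// Rows of `dim` f32 values that fit in `budget_bytes`, rounded down to a
/// multiple of `multiple` (never less than one multiple).
pub fn rows_in_budget(budget_bytes: usize, dim: usize, multiple: usize) -> usize {
    let rows = budget_bytes / (dim * size_of::<f32>());
    (rows / multiple).max(1) * multiple
}

/// Per-query-row max inner product (padding rows included in the output).
pub fn tiled_max_sim<K: Kernel>(
    query: &QueryBlocks<'_>,
    doc: &[f32],
    n_docs: usize,
    l2_bytes: usize,
    l1_bytes: usize,
) -> Vec<f32> {
    let dim = query.dim;
    assert_eq!(doc.len(), n_docs * dim);
    let n_blocks = query.data.len() / (H * dim);
    let q_tile = rows_in_budget(l2_bytes, dim, H) / H; // query blocks per L2 tile
    let d_tile = rows_in_budget(l1_bytes, dim, W); // document rows per L1 tile
    let mut best = vec![[f32::NEG_INFINITY; H]; n_blocks];
    for qb0 in (0..n_blocks).step_by(q_tile) {
        let qb1 = (qb0 + q_tile).min(n_blocks);
        for d0 in (0..n_docs).step_by(d_tile) {
            let d1 = (d0 + d_tile).min(n_docs);
            for qb in qb0..qb1 {
                let q_block = &query.data[qb * H * dim..(qb + 1) * H * dim];
                for d in (d0..d1).step_by(W) {
                    let n = W.min(d1 - d);
                    K::micro(q_block, &doc[d * dim..(d + n) * dim], n, dim, &mut best[qb]);
                }
            }
        }
    }
    best.into_iter().flatten().collect()
}
```

As a rough sense of scale under this rounding rule, and treating KB as 1024 bytes, a 625 KB budget at dim = 128 holds 1248 query rows and a 36 KB budget holds 72 document rows. How the PR actually derives tile sizes from its budget constants is not shown in the description.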

codecov-commenter · 2026-03-25T19:16:46Z

Codecov Report

❌ Patch coverage is 91.02990% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.30%. Comparing base (7e3750c) to head (0a70420).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
...on/src/multi_vector/distance/cache_aware/kernel.rs	85.85%	14 Missing ⚠️
...rc/multi_vector/distance/cache_aware/f32_kernel.rs	93.01%	13 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #863      +/-   ##
==========================================
- Coverage   89.33%   89.30%   -0.04%     
==========================================
  Files         443      444       +1     
  Lines       83488    83548      +60     
==========================================
+ Hits        74587    74614      +27     
- Misses       8901     8934      +33

Flag	Coverage Δ
miri	`89.30% <91.02%> (-0.04%)`	⬇️
unittests	`89.14% <91.02%> (-0.04%)`	⬇️


Files with missing lines	Coverage Δ
...-quantization/src/multi_vector/block_transposed.rs	`96.99% <100.00%> (+0.04%)`	⬆️
...n-quantization/src/multi_vector/distance/simple.rs	`98.43% <100.00%> (ø)`
...rc/multi_vector/distance/cache_aware/f32_kernel.rs	`93.01% <93.01%> (ø)`
...on/src/multi_vector/distance/cache_aware/kernel.rs	`85.85% <85.85%> (ø)`

... and 9 files with indirect coverage changes


Add Cache aware multi-vector distance functions

0a70420

suri-kumkaran wants to merge 1 commit into main from users/suryangupta/multi-vector-distance-impl
