There is a TODO in scalar_multiplication.cpp questioning whether the single-threaded bucket accumulation loop is a performance concern for large MSMs with many threads:
https://github.com/AztecProtocol/aztec-packages/blob/next/barretenberg/cpp/src/barretenberg/ecc/scalar_multiplication/scalar_multiplication.cpp#L478-L486
// Accumulate results. This part needs to be single threaded, but amount of work done here should be small
// TODO(@zac-williamson) check this? E.g. if we are doing a 2^16 MSM with 256 threads this single-threaded part
// will be painful.
Action items:
- Benchmark the bucket accumulation phase relative to the overall MSM for representative sizes (2^16, 2^20) to determine if this is actually a bottleneck.
- If it is, explore parallelization options. Note that
BitVector::set() in bitvector.hpp performs a non-atomic read-modify-write, so any parallelization of the bucket loop must keep BucketAccumulators per-thread or use atomic operations.
There is a TODO in
scalar_multiplication.cppquestioning whether the single-threaded bucket accumulation loop is a performance concern for large MSMs with many threads:https://github.com/AztecProtocol/aztec-packages/blob/next/barretenberg/cpp/src/barretenberg/ecc/scalar_multiplication/scalar_multiplication.cpp#L478-L486
Action items:
BitVector::set()inbitvector.hppperforms a non-atomic read-modify-write, so any parallelization of the bucket loop must keepBucketAccumulatorsper-thread or use atomic operations.