feat: Karatsuba Montgomery multiplication for WASM (7-10% speedup) #22481

Open
ElusAegis wants to merge 1 commit into AztecProtocol:next from ElusAegis:feat/karatsuba-mont-mul

@ElusAegis

Summary

Replace the 9×9 schoolbook Montgomery multiplication in the WASM code path (montgomery_mul in field_impl_generic.hpp) with a Karatsuba decomposition. The 9 limbs (29-bit) are split into a 5+4 structure:

  • P_lo = left[0..4] × right[0..4] — 5×5 schoolbook (25 muls)
  • P_hi = left[5..8] × right[5..8] — 4×4 schoolbook (16 muls)
  • P_cross = (left_lo + left_hi) × (right_lo + right_hi) — 5×5 schoolbook (25 muls)
  • P_mid = P_cross − P_lo − P_hi

This reduces the total scalar multiplications from 81 to 66 (−18.5%). The reduction chain (wasm_reduce_yuval ×8 + wasm_reduce ×1), carry propagation, and limb conversion are all unchanged.

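The unsigned subtractions in the combine step cannot underflow. Expanding (left_lo + left_hi) × (right_lo + right_hi) column by column gives P_cross[k] = P_lo[k] + P_hi[k] + cross_terms[k], where cross_terms = left_lo × right_hi + left_hi × right_lo is a sum of products of unsigned limbs and is therefore non-negative. Hence P_cross[k] ≥ P_lo[k] + P_hi[k] holds for every column k.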
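The decomposition above can be sketched as a standalone limb product, checked against the 9×9 schoolbook product. This is an illustrative sketch, not the code in field_impl_generic.hpp: the names `Limbs`, `Product`, `schoolbook_9x9`, and `karatsuba_5_4` are hypothetical, the products are left as 17 unreduced columns, and it assumes every input limb is below 2^29 so that each column sum fits in a 64-bit word. The Montgomery reduction itself is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; not taken from the PR's source.
constexpr std::size_t NUM_LIMBS = 9;                 // 9 limbs of 29 bits
constexpr std::size_t LO = 5;                        // low half: limbs 0..4
constexpr std::size_t HI = 4;                        // high half: limbs 5..8
constexpr std::uint64_t LIMB_MASK = (1ULL << 29) - 1;

using Limbs = std::array<std::uint64_t, NUM_LIMBS>;
using Product = std::array<std::uint64_t, 2 * NUM_LIMBS - 1>; // 17 unreduced columns

// Reference 9x9 schoolbook product; increments `muls` once per multiplication.
inline Product schoolbook_9x9(const Limbs& a, const Limbs& b, int& muls)
{
    Product r{};
    for (std::size_t i = 0; i < NUM_LIMBS; ++i) {
        for (std::size_t j = 0; j < NUM_LIMBS; ++j) {
            r[i + j] += a[i] * b[j];
            ++muls;
        }
    }
    return r;
}

// 5+4 Karatsuba split: P_lo (5x5), P_hi (4x4), P_cross (5x5).
inline Product karatsuba_5_4(const Limbs& a, const Limbs& b, int& muls)
{
    std::array<std::uint64_t, 2 * LO - 1> p_lo{};
    std::array<std::uint64_t, 2 * LO - 1> p_cross{};
    std::array<std::uint64_t, 2 * HI - 1> p_hi{};

    // Limb-wise sums of the halves; the 4-limb high half is treated as
    // having a zero fifth limb.
    std::array<std::uint64_t, LO> sa{};
    std::array<std::uint64_t, LO> sb{};
    for (std::size_t i = 0; i < LO; ++i) {
        sa[i] = a[i] + (i < HI ? a[LO + i] : 0);
        sb[i] = b[i] + (i < HI ? b[LO + i] : 0);
    }

    for (std::size_t i = 0; i < LO; ++i) {
        for (std::size_t j = 0; j < LO; ++j) {
            p_lo[i + j] += a[i] * b[j];
            p_cross[i + j] += sa[i] * sb[j];
            muls += 2;
        }
    }
    for (std::size_t i = 0; i < HI; ++i) {
        for (std::size_t j = 0; j < HI; ++j) {
            p_hi[i + j] += a[LO + i] * b[LO + j];
            ++muls;
        }
    }

    // Combine: result = P_lo + x^5 * P_mid + x^10 * P_hi.
    Product r{};
    for (std::size_t k = 0; k < p_lo.size(); ++k) {
        const std::uint64_t hi_k = (k < p_hi.size()) ? p_hi[k] : 0;
        assert(p_cross[k] >= p_lo[k] + hi_k); // unsigned subtraction is safe
        r[k] += p_lo[k];
        r[LO + k] += p_cross[k] - p_lo[k] - hi_k; // P_mid[k]
    }
    for (std::size_t k = 0; k < p_hi.size(); ++k) {
        r[2 * LO + k] += p_hi[k];
    }
    return r;
}
```

Under these assumptions both functions produce the same 17 columns, with the counters ending at 81 and 66 multiplications respectively.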

What this PR changes

One file: barretenberg/cpp/src/barretenberg/ecc/fields/field_impl_generic.hpp

Only the #else (WASM) branch of montgomery_mul is modified. The multiplication section is replaced with the Karatsuba decomposition; everything else (reduction, carry propagation, limb conversion, all comments) is preserved exactly.

Benchmark results

Tested on Apple M1 Pro. Both variants compiled to WASM using the wasm-threads preset (Release, -O3) and executed via wasmtime with --wasm relaxed-simd -W threads=y -S threads=y. Results are averaged over 10 runs.

Benchmark 1 — Montgomery Multiplication

Microbenchmark of Fq and Fr field multiplication. Latency measures sequential dependent multiplications (a *= b in a chain). Throughput measures 64 independent multiplications per iteration.
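The difference between the two measurements comes down to loop structure, which a minimal sketch can show. Everything here is an assumption for illustration: `toy_mul` is a stand-in modular product (not Montgomery form and not the barretenberg field type), and `latency_chain` and `throughput_batch` are hypothetical names rather than the benchmark's actual functions. Timing code is omitted.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy stand-in for a field multiplication (assumption, not the real Fq/Fr).
constexpr std::uint64_t TOY_MODULUS = 2147483647ULL; // 2^31 - 1

inline std::uint64_t toy_mul(std::uint64_t x, std::uint64_t y)
{
    // Inputs are kept below 2^31, so the product fits in 64 bits.
    return (x * y) % TOY_MODULUS;
}

// Latency: every multiplication depends on the previous result (a *= b),
// so the processor cannot overlap consecutive multiplications.
inline std::uint64_t latency_chain(std::uint64_t a, std::uint64_t b, std::size_t iterations)
{
    for (std::size_t i = 0; i < iterations; ++i) {
        a = toy_mul(a, b);
    }
    return a;
}

// Throughput: 64 independent lanes per iteration; multiplications within
// one iteration have no data dependency on each other.
constexpr std::size_t BATCH = 64;

inline std::array<std::uint64_t, BATCH> throughput_batch(std::array<std::uint64_t, BATCH> xs,
                                                         std::uint64_t b,
                                                         std::size_t iterations)
{
    for (std::size_t i = 0; i < iterations; ++i) {
        for (auto& x : xs) {
            x = toy_mul(x, b);
        }
    }
    return xs;
}
```

Dividing the throughput figures in the table by 64 gives a per-multiplication cost that can be compared against the latency figures.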

| Benchmark | Schoolbook avg ± σ | Karatsuba avg ± σ | Delta |
|---|---|---|---|
| Fq Latency | 48.35 ± 0.57 ns | 44.91 ± 0.36 ns | −7.1% |
| Fq Throughput (×64) | 3057.43 ± 16.90 ns | 2793.36 ± 16.80 ns | −8.6% |
| Fr Latency | 48.14 ± 0.37 ns | 44.62 ± 0.46 ns | −7.3% |
| Fr Throughput (×64) | 3081.52 ± 16.76 ns | 2778.81 ± 19.00 ns | −9.8% |

Benchmark 2 — Multi-Scalar Multiplication (Pippenger)

MSM using PippengerUnsafe at various point counts (5 repetitions each).

| Points | Baseline (mean) | Karatsuba (mean) | Delta |
|---|---|---|---|
| 16,384 | 62.08 ms | 57.93 ms | −6.7% |
| 65,536 | 213.14 ms | 195.98 ms | −8.1% |
| 262,144 | 698.05 ms | 652.51 ms | −6.5% |
| 1,048,576 | 2380.82 ms | 2225.27 ms | −6.5% |

Benchmark 3 — Full Proving (Chonk, 5 repetitions)

| Benchmark | Baseline (mean) | Karatsuba (mean) | Delta |
|---|---|---|---|
| ChonkBench/Full/2 | 26,912 ms | 25,180 ms | −6.4% |

Test plan

  • Montmul microbenchmark shows 7-10% improvement (10-run average)
  • MSM benchmark shows 6-8% improvement across all point counts
  • Full proving benchmark shows 6.4% improvement
  • Existing field arithmetic tests (prime_field_tests, ecc_tests) pass on CI

@ElusAegis
Author

Note on montgomery_square: The WASM squaring path already exploits symmetry (upper triangle + diagonal), using 45 muls vs 81 for schoolbook multiplication. Applying the same Karatsuba split would bring it to ~40 muls, but saving 5 muls at the cost of ~20 extra add/sub operations is unlikely to yield a measurable improvement.
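The multiplication counts quoted here can be reproduced with a few lines of arithmetic. The helper names below are hypothetical and only count limb multiplications; they assume the same 5+4 split with a 5×5 cross term and say nothing about the extra additions and subtractions.

```cpp
#include <cstddef>

// Hypothetical counting helpers; they model operation counts only.

// Full n x n schoolbook product.
constexpr std::size_t schoolbook_muls(std::size_t n)
{
    return n * n;
}

// Squaring with symmetry: upper triangle plus diagonal.
constexpr std::size_t square_muls(std::size_t n)
{
    return n * (n + 1) / 2;
}

// Karatsuba multiplication with a lo+hi split (lo >= hi):
// P_lo (lo x lo) + P_hi (hi x hi) + P_cross (lo x lo).
constexpr std::size_t karatsuba_mul_muls(std::size_t lo, std::size_t hi)
{
    return schoolbook_muls(lo) + schoolbook_muls(hi) + schoolbook_muls(lo);
}

// The same split applied to squaring: three symmetric squarings.
constexpr std::size_t karatsuba_square_muls(std::size_t lo, std::size_t hi)
{
    return square_muls(lo) + square_muls(hi) + square_muls(lo);
}

static_assert(schoolbook_muls(9) == 81, "9x9 schoolbook");
static_assert(karatsuba_mul_muls(5, 4) == 66, "25 + 16 + 25");
static_assert(square_muls(9) == 45, "upper triangle + diagonal");
static_assert(karatsuba_square_muls(5, 4) == 40, "15 + 10 + 15");
```

So the split saves 15 multiplications for a general product but only 5 for a squaring.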

@ludamad added and removed the ci-external-once label (Run CI on an external PR, but only once.) on Apr 10, 2026
@github-actions bot removed the ci-external-once label on Apr 10, 2026
@ludamad
Collaborator

ludamad commented Apr 10, 2026

Thanks, tentatively exciting. Will get a review here once CI passes!

…ation

Splits the 9-limb (29-bit) multiply into a 5+4 Karatsuba decomposition,
reducing scalar multiplications from 81 to 66 (~18%).

Key changes:
- Replace schoolbook wasm_madd loop with Karatsuba (P_lo + P_hi + P_cross)
- Reduction, carry propagation, and conversion are unchanged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ElusAegis ElusAegis force-pushed the feat/karatsuba-mont-mul branch from 6b9575f to 89ee83c Compare April 11, 2026 09:36
@ElusAegis
Author

@ludamad should be ready for the CI
