Add AArch64 ML-DSA Forward NTT HOL Light proof#993

Open
dkostic wants to merge 3 commits into main from aarch64-ntt-hol

dkostic commented Mar 10, 2026

Port the ML-DSA Forward NTT implementation and its HOL Light proof of correctness from s2n-bignum to mldsa-native. The proof verifies the AArch64 NEON NTT implementation at the object-code level, showing that the output coefficients are congruent to the forward NTT of the input modulo 8380417, with bounded output coefficients. A constant-time and memory safety proof is also included.
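
To make the stated postcondition concrete, here is a minimal Python sketch of what "congruent to the forward NTT of the input modulo 8380417, with bounded output coefficients" can mean. It is not the HOL Light specification from this PR: the function names are hypothetical, the root of unity 1753 and the bit-reversed twiddle ordering are taken from FIPS 204, and the bound 75423752 is the output bound quoted in the CBMC contract commit of this PR.

```python
Q = 8380417   # ML-DSA modulus
ZETA = 1753   # primitive 512th root of unity mod Q (FIPS 204)
N = 256       # number of coefficients

def bitrev8(x: int) -> int:
    """Reverse the 8-bit binary representation of x."""
    return int(f"{x:08b}"[::-1], 2)

# Twiddle factors in bit-reversed order.
ZETAS = [pow(ZETA, bitrev8(k), Q) for k in range(N)]

def ntt_reference(w):
    """Textbook in-order forward NTT with canonical outputs in [0, Q)."""
    w = [c % Q for c in w]
    m, length = 0, 128
    while length >= 1:
        for start in range(0, N, 2 * length):
            m += 1
            z = ZETAS[m]
            for j in range(start, start + length):
                t = z * w[j + length] % Q
                w[j + length] = (w[j] - t) % Q
                w[j] = (w[j] + t) % Q
        length //= 2
    return w

def ntt_by_evaluation(w):
    """Same transform, computed as w(ZETA^(2*bitrev8(i)+1)) via Horner's rule."""
    out = []
    for i in range(N):
        x = pow(ZETA, 2 * bitrev8(i) + 1, Q)
        acc = 0
        for c in reversed(w):
            acc = (acc * x + c) % Q
        out.append(acc)
    return out

def satisfies_postcondition(out, w, bound=75423752):
    """Each output is congruent to the reference mod Q and bounded in absolute value."""
    ref = ntt_reference(w)
    return all((o - r) % Q == 0 and abs(o) <= bound for o, r in zip(out, ref))
```

The point of the congruence formulation is that an optimized implementation may skip final reductions: any output that differs from the canonical value by a multiple of Q is acceptable, as long as it stays within the stated bound.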

New files:

  • aarch64/mldsa/mldsa_ntt.S: Assembly derived from dev/aarch64_opt/src/ntt.S
  • aarch64/proofs/mldsa_ntt.ml: HOL Light proof (MLDSA_NTT_CORRECT, MLDSA_NTT_SUBROUTINE_CORRECT, and MLDSA_NTT_SUBROUTINE_SAFE theorems)
  • aarch64/proofs/mldsa_specs.ml: Self-contained ML-DSA specifications and congruence/bounds propagation infrastructure, ported from s2n-bignum's common/mlkem_mldsa.ml with only ARM ML-DSA relevant definitions
  • aarch64/proofs/subroutine_signatures.ml: ML-DSA subroutine signatures for the safety proof infrastructure

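Modified files:

  • aarch64/proofs/aarch64_utils.ml: Add MEMORY_128_FROM_32_TAC
  • aarch64/Makefile: Add mldsa_ntt to build targets
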
dkostic requested a review from a team as a code owner on March 10, 2026 04:13

dkostic commented Mar 10, 2026

addresses #919

oqs-bot commented Mar 10, 2026

CBMC Results (ML-DSA-44)

Full Results (177 proofs)
Proof Current Previous Change
**TOTAL** 1932s 1890s +2.2%
mld_attempt_signature_generation 223s 221s +1%
polyvecl_pointwise_acc_montgomery_c 192s 186s +3%
poly_pointwise_montgomery_c 145s 145s +0%
rej_uniform_native 141s 131s +8%
sign_verify_internal 125s 120s +4%
mld_invntt_layer 85s 84s +1%
mld_ct_memcmp 70s 71s -1%
mld_ntt_layer 53s 51s +4%
keccak_squeezeblocks_x4 41s 40s +2%
sign_signature_internal 34s 30s +13%
polyvec_matrix_expand 30s 26s +15%
fqmul 20s 16s +25%
poly_chknorm_c 19s 20s -5%
rej_uniform 19s 19s +0%
mld_compute_t0_t1_tr_from_sk_components 18s 15s +20%
poly_uniform_eta_4x 17s 18s -6%
polyeta_unpack 17s 16s +6%
polyt0_unpack 16s 13s +23%
rej_uniform_c 16s 12s +33%
keccakf1600x4_permute_native 14s 14s +0%
poly_uniform_4x 14s 15s -7%
poly_add 13s 10s +30%
polymat_permute_bitrev_to_custom 13s 15s -13%
polyvec_matrix_expand_serial 13s 13s +0%
polyvec_matrix_pointwise_montgomery 13s 12s +8%
polyveck_power2round 12s 12s +0%
polyz_unpack_c 12s 10s +20%
mld_ntt_butterfly_block 11s 13s -15%
keccakf1600_permute_native 9s 9s +0%
sign_pk_from_sk 9s 5s +80%
keccak_absorb 8s 7s +14%
keccak_absorb_once_x4 8s 9s -11%
sign 8s 6s +33%
keccakf1600_permute 7s 7s +0%
mld_compute_pack_z 7s 7s +0%
mld_polyvecl_permute_bitrev_to_custom_native 7s 9s -22%
poly_invntt_tomont_c 7s 7s +0%
sign_keypair_internal 7s 5s +40%
mld_h 6s 5s +20%
mld_sample_s1_s2 6s 6s +0%
ntt_native_aarch64 6s - new
polyveck_use_hint 6s 6s +0%
polyvecl_ntt 6s 4s +50%
polyvecl_permute_bitrev_to_custom 6s 2s +200%
unpack_hints 6s 5s +20%
intt_native_x86_64 5s 3s +67%
keccakf1600x4_xor_bytes 5s 3s +67%
mld_check_pct 5s 6s -17%
pack_sig_c_h 5s 3s +67%
poly_caddq_c 5s 7s -29%
poly_challenge 5s 8s -38%
poly_ntt_native 5s 5s +0%
poly_uniform_gamma1_4x 5s 4s +25%
polyveck_add 5s 6s -17%
polyveck_chknorm 5s 9s -44%
polyveck_decompose 5s 6s -17%
polyveck_make_hint 5s 1s +400%
polyveck_reduce 5s 3s +67%
polyveck_sub 5s 5s +0%
polyvecl_unpack_eta 5s 3s +67%
shake256_finalize 5s 2s +150%
sign_keypair 5s 5s +0%
sign_signature_pre_hash_internal 5s 5s +0%
sign_verify_extmu 5s 4s +25%
fqscale 4s 4s +0%
mld_prepare_domain_separation_prefix 4s 5s -20%
mld_sample_s1_s2_serial 4s 2s +100%
montgomery_reduce 4s 2s +100%
pack_pk 4s 2s +100%
poly_caddq 4s 3s +33%
poly_ntt_c 4s 3s +33%
poly_power2round 4s 4s +0%
poly_sub 4s 2s +100%
poly_uniform_eta 4s 5s -20%
polyt1_pack 4s 4s +0%
polyveck_caddq 4s 4s +0%
polyveck_ntt 4s 4s +0%
polyveck_pack_eta 4s 4s +0%
polyveck_shiftl 4s 4s +0%
polyveck_unpack_t0 4s 2s +100%
polyvecl_uniform_gamma1_serial 4s 4s +0%
rej_eta_c 4s 4s +0%
shake128_init 4s 1s +300%
shake256 4s 3s +33%
sign_open 4s 4s +0%
sign_signature 4s 3s +33%
sign_signature_extmu 4s 4s +0%
sign_signature_pre_hash_shake256 4s 6s -33%
sign_verify 4s 3s +33%
unpack_sk 4s 2s +100%
use_hint 4s 3s +33%
decompose 3s 2s +50%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
make_hint 3s 2s +50%
mld_ct_sel_int32 3s 2s +50%
mld_value_barrier_i64 3s 3s +0%
ntt_native_x86_64 3s 3s +0%
pack_sk 3s 2s +50%
poly_chknorm_native 3s 2s +50%
poly_invntt_tomont 3s 3s +0%
poly_invntt_tomont_native 3s 2s +50%
poly_ntt 3s 5s -40%
poly_pointwise_montgomery 3s 2s +50%
poly_pointwise_montgomery_native 3s 4s -25%
poly_shiftl 3s 3s +0%
poly_uniform 3s 4s -25%
poly_uniform_gamma1 3s 5s -40%
poly_use_hint_c 3s 5s -40%
poly_use_hint_native 3s 3s +0%
polyeta_pack 3s 3s +0%
polyt0_pack 3s 4s -25%
polyveck_invntt_tomont 3s 5s -40%
polyveck_pack_t0 3s 2s +50%
polyveck_pack_w1 3s 3s +0%
polyveck_pointwise_poly_montgomery 3s 2s +50%
polyveck_unpack_eta 3s 3s +0%
polyvecl_chknorm 3s 4s -25%
polyvecl_pack_eta 3s 2s +50%
polyvecl_uniform_gamma1 3s 3s +0%
polyvecl_unpack_z 3s 4s -25%
polyw1_pack 3s 4s -25%
polyz_pack 3s 2s +50%
polyz_unpack_native 3s 4s -25%
power2round 3s 5s -40%
reduce32 3s 5s -40%
rej_eta_native 3s 6s -50%
shake128_release 3s 4s -25%
shake128_squeeze 3s 2s +50%
shake128x4_absorb_once 3s 1s +200%
shake128x4_squeezeblocks 3s 4s -25%
shake256_init 3s 3s +0%
shake256_release 3s 3s +0%
shake256x4_absorb_once 3s 2s +50%
sign_verify_pre_hash_shake256 3s 4s -25%
unpack_pk 3s 3s +0%
unpack_sig 3s 3s +0%
caddq 2s 2s +0%
keccak_finalize 2s 1s +100%
keccak_init 2s 3s -33%
keccak_squeeze 2s 2s +0%
keccakf1600_extract_bytes (big endian) 2s 2s +0%
keccakf1600_xor_bytes 2s 3s -33%
keccakf1600x4_extract_bytes 2s 3s -33%
keccakf1600x4_permute 2s 3s -33%
mld_ct_abs_i32 2s 3s -33%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_cmask_nonzero_u32 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 3s -33%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_ct_get_optblocker_u8 2s 2s +0%
mld_value_barrier_u32 2s 3s -33%
mld_value_barrier_u8 2s 2s +0%
pack_sig_z 2s 5s -60%
poly_caddq_native 2s 4s -50%
poly_caddq_native_aarch64 2s 4s -50%
poly_chknorm 2s 5s -60%
poly_chknorm_native_aarch64 2s 5s -60%
poly_decompose 2s 3s -33%
poly_decompose_c 2s 2s +0%
poly_decompose_native 2s 3s -33%
poly_make_hint 2s 2s +0%
poly_reduce 2s 2s +0%
poly_use_hint 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 4s -50%
polyvecl_pointwise_acc_montgomery_native 2s 4s -50%
polyz_unpack 2s 3s -33%
rej_eta 2s 4s -50%
shake128_absorb 2s 2s +0%
shake128_finalize 2s 3s -33%
shake256_absorb 2s 3s -33%
shake256_squeeze 2s 4s -50%
shake256x4_squeezeblocks 2s 1s +100%
sign_verify_pre_hash_internal 2s 3s -33%
sys_check_capability 2s 2s +0%
mld_ct_get_optblocker_i64 1s 3s -67%
mld_keccakf1600_extract_bytes 1s 4s -75%
polyt1_unpack 1s 3s -67%

oqs-bot commented Mar 10, 2026

CBMC Results (ML-DSA-65)

Full Results (177 proofs)
Proof Current Previous Change
**TOTAL** 2727s 2340s +16.5%
sign_verify_internal 367s 319s +15%
mld_attempt_signature_generation 299s 255s +17%
polyvecl_pointwise_acc_montgomery_c 233s 170s +37%
poly_pointwise_montgomery_c 190s 144s +32%
rej_uniform_native 159s 136s +17%
polyvec_matrix_expand 137s 121s +13%
mld_invntt_layer 107s 91s +18%
mld_ct_memcmp 84s 69s +22%
polyvec_matrix_expand_serial 71s 66s +8%
mld_ntt_layer 63s 52s +21%
keccak_squeezeblocks_x4 44s 41s +7%
sign_signature_internal 41s 34s +21%
polymat_permute_bitrev_to_custom 32s 31s +3%
mld_compute_t0_t1_tr_from_sk_components 26s 27s -4%
fqmul 23s 19s +21%
poly_chknorm_c 23s 18s +28%
rej_uniform 23s 19s +21%
poly_uniform_eta_4x 18s 17s +6%
poly_uniform_4x 16s 15s +7%
polyt0_unpack 16s 15s +7%
rej_uniform_c 16s 13s +23%
keccakf1600x4_permute_native 15s 14s +7%
mld_ntt_butterfly_block 14s 11s +27%
polyveck_decompose 13s 12s +8%
polyvec_matrix_pointwise_montgomery 12s 12s +0%
polyveck_sub 12s 10s +20%
polyvecl_chknorm 12s 12s +0%
poly_add 11s 10s +10%
polyveck_ntt 11s 13s -15%
keccak_absorb_once_x4 10s 11s -9%
polyveck_invntt_tomont 10s 8s +25%
polyveck_power2round 10s 11s -9%
keccak_absorb 9s 7s +29%
keccakf1600_permute_native 9s 7s +29%
polyveck_shiftl 9s 7s +29%
keccakf1600_permute 8s 8s +0%
mld_check_pct 8s 7s +14%
mld_compute_pack_z 8s 8s +0%
poly_challenge 8s 3s +167%
poly_decompose_c 8s 7s +14%
poly_invntt_tomont_c 8s 6s +33%
polyveck_add 8s 10s -20%
polyveck_reduce 8s 7s +14%
polyveck_use_hint 8s 8s +0%
polyvecl_ntt 8s 7s +14%
sign 8s 6s +33%
sign_pk_from_sk 8s 8s +0%
sign_verify_pre_hash_internal 8s 3s +167%
mld_polyvecl_permute_bitrev_to_custom_native 7s 5s +40%
poly_caddq_c 7s 4s +75%
poly_power2round 7s 5s +40%
polyveck_caddq 7s 9s -22%
polyveck_make_hint 7s 7s +0%
polyveck_pointwise_poly_montgomery 7s 9s -22%
sign_open 7s 3s +133%
mld_ct_abs_i32 6s 1s +500%
mld_prepare_domain_separation_prefix 6s 4s +50%
mld_sample_s1_s2 6s 5s +20%
mld_sample_s1_s2_serial 6s 4s +50%
poly_caddq_native_aarch64 6s 3s +100%
poly_use_hint_c 6s 4s +50%
polyeta_pack 6s 2s +200%
polyeta_unpack 6s 7s -14%
polyt0_pack 6s 4s +50%
polyvecl_permute_bitrev_to_custom 6s 3s +100%
rej_eta_native 6s 4s +50%
sign_keypair_internal 6s 8s -25%
sign_signature_pre_hash_shake256 6s 3s +100%
keccakf1600_xor_bytes (big endian) 5s 3s +67%
poly_chknorm_native_aarch64 5s 4s +25%
poly_ntt 5s 3s +67%
poly_ntt_native 5s 3s +67%
poly_uniform_eta 5s 5s +0%
polyveck_pack_t0 5s 3s +67%
polyvecl_pointwise_acc_montgomery_native 5s 7s -29%
polyvecl_uniform_gamma1 5s 2s +150%
shake128_squeeze 5s 2s +150%
sign_signature 5s 7s -29%
sign_signature_pre_hash_internal 5s 5s +0%
sign_verify 5s 3s +67%
sign_verify_pre_hash_shake256 5s 3s +67%
unpack_hints 5s 5s +0%
keccakf1600_extract_bytes (big endian) 4s 2s +100%
keccakf1600x4_extract_bytes 4s 1s +300%
keccakf1600x4_xor_bytes 4s 4s +0%
mld_h 4s 4s +0%
mld_value_barrier_u8 4s 3s +33%
montgomery_reduce 4s 2s +100%
pack_pk 4s 2s +100%
poly_caddq 4s 3s +33%
poly_chknorm 4s 3s +33%
poly_invntt_tomont 4s 3s +33%
poly_ntt_c 4s 2s +100%
poly_shiftl 4s 2s +100%
poly_sub 4s 2s +100%
poly_uniform_gamma1 4s 5s -20%
poly_uniform_gamma1_4x 4s 4s +0%
poly_use_hint_native 4s 4s +0%
polyveck_pack_eta 4s 4s +0%
polyvecl_unpack_eta 4s 3s +33%
polyz_unpack 4s 4s +0%
polyz_unpack_native 4s 4s +0%
power2round 4s 2s +100%
rej_eta_c 4s 3s +33%
shake128_absorb 4s 3s +33%
shake128_release 4s 1s +300%
shake128x4_squeezeblocks 4s 1s +300%
sign_keypair 4s 5s -20%
sign_signature_extmu 4s 4s +0%
sign_verify_extmu 4s 6s -33%
unpack_sig 4s 2s +100%
unpack_sk 4s 6s -33%
use_hint 4s 4s +0%
decompose 3s 5s -40%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600x4_permute 3s 2s +50%
make_hint 3s 3s +0%
mld_ct_cmask_nonzero_u32 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 2s +50%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 1s +200%
mld_value_barrier_i64 3s 2s +50%
mld_value_barrier_u32 3s 1s +200%
ntt_native_x86_64 3s 5s -40%
pack_sig_c_h 3s 2s +50%
pack_sig_z 3s 4s -25%
poly_caddq_native 3s 3s +0%
poly_make_hint 3s 4s -25%
poly_pointwise_montgomery_native 3s 5s -40%
poly_reduce 3s 2s +50%
poly_uniform 3s 6s -50%
poly_use_hint 3s 4s -25%
polyt1_pack 3s 2s +50%
polyt1_unpack 3s 6s -50%
polyveck_chknorm 3s 6s -50%
polyveck_pack_w1 3s 4s -25%
polyvecl_pack_eta 3s 2s +50%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyz_pack 3s 3s +0%
rej_eta 3s 3s +0%
shake128_finalize 3s 1s +200%
shake128x4_absorb_once 3s 2s +50%
shake256_finalize 3s 2s +50%
shake256_squeeze 3s 4s -25%
sys_check_capability 3s 3s +0%
caddq 2s 3s -33%
fqscale 2s 3s -33%
intt_native_x86_64 2s 3s -33%
keccak_init 2s 3s -33%
keccak_squeeze 2s 4s -50%
mld_ct_cmask_neg_i32 2s 4s -50%
mld_ct_get_optblocker_i64 2s 3s -33%
mld_ct_get_optblocker_u32 2s 1s +100%
ntt_native_aarch64 2s - new
pack_sk 2s 3s -33%
poly_chknorm_native 2s 4s -50%
poly_decompose 2s 2s +0%
poly_decompose_native 2s 2s +0%
poly_invntt_tomont_native 2s 3s -33%
poly_pointwise_montgomery 2s 5s -60%
polyveck_unpack_eta 2s 3s -33%
polyveck_unpack_t0 2s 2s +0%
polyvecl_unpack_z 2s 3s -33%
polyw1_pack 2s 3s -33%
polyz_unpack_c 2s 5s -60%
reduce32 2s 3s -33%
shake256 2s 3s -33%
shake256_absorb 2s 4s -50%
shake256_init 2s 2s +0%
shake256x4_absorb_once 2s 2s +0%
shake256x4_squeezeblocks 2s 2s +0%
unpack_pk 2s 3s -33%
keccak_finalize 1s 2s -50%
mld_ct_sel_int32 1s 1s +0%
shake128_init 1s 3s -67%
shake256_release 1s 2s -50%

oqs-bot commented Mar 10, 2026

CBMC Results (ML-DSA-87)

Full Results (177 proofs)
Proof Current Previous Change
**TOTAL** 2650s 2673s -0.9%
sign_verify_internal 332s 332s +0%
polyvecl_pointwise_acc_montgomery_c 259s 279s -7%
mld_attempt_signature_generation 242s 237s +2%
polyvec_matrix_expand 173s 175s -1%
poly_pointwise_montgomery_c 144s 152s -5%
rej_uniform_native 143s 144s -1%
mld_invntt_layer 97s 95s +2%
polyvec_matrix_expand_serial 84s 80s +5%
mld_ct_memcmp 72s 75s -4%
polyveck_decompose 58s 57s +2%
mld_ntt_layer 55s 54s +2%
sign_signature_internal 53s 54s -2%
polymat_permute_bitrev_to_custom 47s 47s +0%
keccak_squeezeblocks_x4 42s 44s -5%
mld_compute_t0_t1_tr_from_sk_components 24s 24s +0%
rej_uniform 22s 20s +10%
fqmul 20s 19s +5%
poly_chknorm_c 19s 20s -5%
poly_uniform_4x 17s 14s +21%
poly_uniform_eta_4x 16s 15s +7%
polyeta_unpack 16s 17s -6%
rej_uniform_c 15s 15s +0%
keccakf1600x4_permute_native 13s 12s +8%
mld_ntt_butterfly_block 13s 13s +0%
poly_add 13s 11s +18%
polyt0_unpack 13s 16s -19%
mld_polyvecl_permute_bitrev_to_custom_native 12s 12s +0%
polyvec_matrix_pointwise_montgomery 11s 11s +0%
polyveck_use_hint 11s 11s +0%
keccak_absorb_once_x4 10s 12s -17%
poly_decompose_c 10s 8s +25%
polyveck_add 10s 12s -17%
polyveck_power2round 10s 10s +0%
polyvecl_ntt 10s 11s -9%
polyveck_invntt_tomont 9s 8s +12%
polyveck_reduce 9s 8s +12%
sign_pk_from_sk 9s 8s +12%
keccakf1600_permute 8s 7s +14%
keccakf1600_permute_native 8s 9s -11%
poly_invntt_tomont_c 8s 6s +33%
polyveck_pointwise_poly_montgomery 8s 9s -11%
polyz_unpack_c 8s 8s +0%
sign_keypair_internal 8s 5s +60%
mld_check_pct 7s 7s +0%
polyveck_caddq 7s 9s -22%
polyveck_ntt 7s 8s -12%
polyveck_sub 7s 6s +17%
sign_signature 7s 3s +133%
unpack_hints 7s 7s +0%
keccak_absorb 6s 6s +0%
mld_compute_pack_z 6s 6s +0%
mld_h 6s 4s +50%
mld_sample_s1_s2 6s 5s +20%
poly_chknorm 6s 4s +50%
poly_invntt_tomont_native 6s 3s +100%
poly_power2round 6s 5s +20%
poly_uniform 6s 6s +0%
poly_uniform_eta 6s 4s +50%
polyveck_pack_w1 6s 4s +50%
sign_signature_extmu 6s 2s +200%
sign_verify 6s 6s +0%
unpack_pk 6s 4s +50%
mld_ct_cmask_nonzero_u8 5s 4s +25%
mld_sample_s1_s2_serial 5s 6s -17%
ntt_native_x86_64 5s 4s +25%
pack_pk 5s 2s +150%
poly_caddq_c 5s 5s +0%
poly_ntt 5s 4s +25%
poly_sub 5s 3s +67%
polyveck_chknorm 5s 5s +0%
polyveck_make_hint 5s 7s -29%
polyveck_shiftl 5s 9s -44%
polyvecl_pack_eta 5s 4s +25%
polyvecl_unpack_z 5s 3s +67%
reduce32 5s 3s +67%
rej_eta_c 5s 5s +0%
shake256_release 5s 3s +67%
sign 5s 7s -29%
sign_open 5s 4s +25%
sign_verify_pre_hash_shake256 5s 3s +67%
unpack_sig 5s 3s +67%
intt_native_x86_64 4s 3s +33%
keccakf1600x4_extract_bytes 4s 2s +100%
mld_ct_get_optblocker_u32 4s 2s +100%
ntt_native_aarch64 4s - new
pack_sig_z 4s 4s +0%
poly_caddq_native 4s 2s +100%
poly_decompose_native 4s 2s +100%
poly_reduce 4s 3s +33%
poly_use_hint 4s 2s +100%
polyeta_pack 4s 2s +100%
polyt1_pack 4s 2s +100%
polyveck_unpack_eta 4s 3s +33%
polyveck_unpack_t0 4s 3s +33%
polyvecl_chknorm 4s 5s -20%
polyvecl_pointwise_acc_montgomery_native 4s 5s -20%
polyvecl_uniform_gamma1_serial 4s 4s +0%
polyvecl_unpack_eta 4s 4s +0%
polyz_unpack_native 4s 4s +0%
rej_eta 4s 4s +0%
shake128_finalize 4s 3s +33%
shake128_init 4s 4s +0%
shake256_squeeze 4s 2s +100%
sign_keypair 4s 4s +0%
sign_verify_extmu 4s 6s -33%
sign_verify_pre_hash_internal 4s 3s +33%
unpack_sk 4s 4s +0%
decompose 3s 3s +0%
keccak_finalize 3s 1s +200%
keccak_init 3s 3s +0%
keccak_squeeze 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600_xor_bytes 3s 3s +0%
keccakf1600_xor_bytes (big endian) 3s 3s +0%
make_hint 3s 3s +0%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_ct_sel_int32 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 4s -25%
mld_prepare_domain_separation_prefix 3s 5s -40%
mld_value_barrier_i64 3s 4s -25%
poly_chknorm_native 3s 2s +50%
poly_chknorm_native_aarch64 3s 2s +50%
poly_decompose 3s 3s +0%
poly_make_hint 3s 3s +0%
poly_ntt_c 3s 5s -40%
poly_ntt_native 3s 4s -25%
poly_pointwise_montgomery 3s 4s -25%
poly_pointwise_montgomery_native 3s 2s +50%
poly_shiftl 3s 5s -40%
poly_uniform_gamma1 3s 2s +50%
poly_uniform_gamma1_4x 3s 5s -40%
poly_use_hint_c 3s 5s -40%
poly_use_hint_native 3s 4s -25%
polyt0_pack 3s 5s -40%
polyveck_pack_eta 3s 4s -25%
polyvecl_permute_bitrev_to_custom 3s 2s +50%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyw1_pack 3s 6s -50%
polyz_pack 3s 3s +0%
polyz_unpack 3s 3s +0%
shake256 3s 3s +0%
shake256_init 3s 2s +50%
shake256x4_absorb_once 3s 3s +0%
shake256x4_squeezeblocks 3s 3s +0%
sign_signature_pre_hash_internal 3s 3s +0%
use_hint 3s 5s -40%
caddq 2s 4s -50%
fqscale 2s 3s -33%
keccakf1600x4_xor_bytes 2s 3s -33%
mld_ct_abs_i32 2s 1s +100%
mld_ct_cmask_neg_i32 2s 3s -33%
mld_value_barrier_u8 2s 2s +0%
pack_sig_c_h 2s 3s -33%
poly_caddq 2s 4s -50%
poly_caddq_native_aarch64 2s 4s -50%
poly_challenge 2s 3s -33%
poly_invntt_tomont 2s 5s -60%
polyveck_pack_t0 2s 3s -33%
polyvecl_uniform_gamma1 2s 4s -50%
power2round 2s 3s -33%
rej_eta_native 2s 4s -50%
shake128_absorb 2s 4s -50%
shake128_release 2s 3s -33%
shake128_squeeze 2s 1s +100%
shake256_absorb 2s 2s +0%
shake256_finalize 2s 4s -50%
sign_signature_pre_hash_shake256 2s 7s -71%
sys_check_capability 2s 2s +0%
keccakf1600x4_permute 1s 3s -67%
mld_ct_cmask_nonzero_u32 1s 5s -80%
mld_ct_get_optblocker_i64 1s 2s -50%
mld_value_barrier_u32 1s 5s -80%
montgomery_reduce 1s 2s -50%
pack_sk 1s 3s -67%
polyt1_unpack 1s 4s -75%
shake128x4_absorb_once 1s 2s -50%
shake128x4_squeezeblocks 1s 4s -75%

mkannwischer left a comment

Thanks @dkostic! I took the liberty to add the corresponding CBMC contract and proof.

The specs should still be moved into common/ to be shared with the x86 proofs. Rest looks very good to me.

mkannwischer force-pushed the aarch64-ntt-hol branch 2 times, most recently from 6285a26 to 581acd0 on March 15, 2026 07:59

mkannwischer left a comment

Thanks @dkostic for the changes! I took the liberty to revert some of the changes you made to the hol_light workflow as they are not needed.
Rest looks good to me now.

@hanno-becker, could you take another look, please?

hanno-becker left a comment

Overall this looks excellent, thank you @dkostic!

The only minor point is avoiding hardcoded length constants in the scripts. Could you follow the pattern in mlkem-native to avoid hardcoded lengths, and thereby make the proof scripts agnostic to re-running SLOTHY?
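
The idea behind this request can be sketched in Python under stated assumptions. This is not the mlkem-native pattern itself (which lives in HOL Light proof scripts); `layout_from_code` and the returned field names are hypothetical, and the assumption that the routine ends in a single `ret` is only for illustration. The sketch derives every length-dependent quantity from the machine code bytes, so re-optimizing the assembly changes the bytes but not the script.

```python
AARCH64_INSN_SIZE = 4  # every AArch64 instruction is 4 bytes

def layout_from_code(code: bytes, insn_size: int = AARCH64_INSN_SIZE) -> dict:
    """Derive length-dependent offsets from the code itself (hypothetical helper)."""
    if len(code) % insn_size != 0:
        raise ValueError("code length is not a multiple of the instruction size")
    length = len(code)
    return {
        "length": length,                   # total size in bytes
        "num_insns": length // insn_size,   # instruction count
        "ret_offset": length - insn_size,   # assumes the last instruction is `ret`
    }

# Two differently scheduled versions of a routine: same script, different lengths.
NOP = bytes.fromhex("1f2003d5")  # nop, little-endian encoding of 0xd503201f
RET = bytes.fromhex("c0035fd6")  # ret, little-endian encoding of 0xd65f03c0
before = NOP * 3 + RET
after = NOP * 5 + RET
```

With a hardcoded constant such as `0x0c` for the return offset, the second version would silently break; computing it from `len(code)` keeps the script valid for both.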

hanno-becker left a comment

LGTM. Thanks a lot @dkostic!

dkostic and others added 3 commits on March 20, 2026 14:28

Port the ML-DSA Forward NTT implementation and its HOL Light proof of
correctness from s2n-bignum to mldsa-native. The proof verifies the
AArch64 NEON NTT implementation at the object-code level, showing that
the output coefficients are congruent to the forward NTT of the input
modulo 8380417, with bounded output coefficients. A constant-time and
memory safety proof is also included.

New files:
- aarch64/mldsa/mldsa_ntt.S: Assembly derived from dev/aarch64_opt/src/ntt.S
- aarch64/proofs/mldsa_ntt.ml: HOL Light proof (MLDSA_NTT_CORRECT,
MLDSA_NTT_SUBROUTINE_CORRECT, and MLDSA_NTT_SUBROUTINE_SAFE theorems)
- aarch64/proofs/mldsa_specs.ml: Self-contained ML-DSA specifications and
congruence/bounds propagation infrastructure, ported from s2n-bignum's
common/mlkem_mldsa.ml with only ARM ML-DSA relevant definitions
- aarch64/proofs/subroutine_signatures.ml: ML-DSA subroutine signatures
for the safety proof infrastructure

Modified files:
- aarch64/proofs/aarch64_utils.ml: Add MEMORY_128_FROM_32_TAC
- aarch64/Makefile: Add mldsa_ntt to build targets

Signed-off-by: dkostic <dkostic@amazon.com>

Add CBMC contract for mld_ntt_asm matching the bounds from the
AArch64 NTT HOL Light proof (input: abs <= 8380416, output: abs <= 75423752).
Add corresponding CBMC proof for mld_ntt_native using the contract.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

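
The two bounds in this contract are related by simple arithmetic: 8380416 is q - 1 and 75423752 is 9q - 1 for q = 8380417. The Python sketch below shows one way such an output bound can arise, assuming each of the 8 NTT layers (256 = 2^8) adds a twiddle product of magnitude at most q to every coefficient. Whether the HOL Light proof derives the bound this way is not stated in this PR, and the function names are hypothetical.

```python
Q = 8380417
INPUT_BOUND = 8380416     # contract precondition: abs(input) <= Q - 1
OUTPUT_BOUND = 75423752   # contract postcondition: abs(output) <= 9*Q - 1
LAYERS = 8                # log2(256) butterfly layers

def propagate_bound(input_bound: int, layers: int = LAYERS, product_bound: int = Q) -> int:
    """Worst-case growth if each layer computes a +/- t with abs(t) <= product_bound."""
    bound = input_bound
    for _ in range(layers):
        bound += product_bound
    return bound

def within(coeffs, bound: int) -> bool:
    """Check an array against an absolute-value bound, as a contract would."""
    return all(abs(c) <= bound for c in coeffs)
```

Under these assumptions the worst case stays far below 2^31, so the coefficients fit in signed 32-bit lanes without any intermediate reduction.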
Consolidate the shared ML-DSA specifications and congruence/bounds
propagation infrastructure into a single file at
proofs/hol_light/common/mldsa_specs.ml, used by both AArch64 and
x86_64 proofs.

The merged file contains:
- Shared: bitreverse8, reorder, BITREVERSE8_CLAUSES, congruence/bounds
  infrastructure (CONGBOUND_WORD_*, ASM_CONGBOUND_RULE, etc.), SIMD
  simplification tactics
- x86_64-specific: AVX2 NTT ordering, mldsa_montred/barred/montmul
- AArch64-specific: arm_mldsa_forward_ntt, arm_mldsa_barmul

ASM_CONGBOUND_RULE now handles both arm_mldsa_barmul and
mldsa_montred/barred/montmul cases.

Signed-off-by: dkostic <dkostic@amazon.com>
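
As an illustration of what "congruence/bounds propagation" means for one of the primitives named above, the following Python sketch models a textbook signed Montgomery reduction with R = 2^32 and checks both a congruence and a bound for each result. It is not the HOL Light definition of mldsa_montred and makes no claim about how ASM_CONGBOUND_RULE works internally; all names here are hypothetical.

```python
Q = 8380417
R_BITS = 32
QINV = pow(Q, -1, 1 << R_BITS)  # Q^-1 mod 2^32 (requires Python 3.8+)

def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= (1 << R_BITS) - 1
    return x - (1 << R_BITS) if x >= (1 << (R_BITS - 1)) else x

def montgomery_reduce(a: int) -> int:
    """Return r with r * 2^32 congruent to a mod Q, for abs(a) <= 2^31 * Q."""
    t = to_signed32(a * QINV)      # t = a * Q^-1 mod 2^32, centered
    return (a - t * Q) >> R_BITS   # exact division: a - t*Q is a multiple of 2^32

def check_congbound(a: int) -> bool:
    """The pair of facts tracked per value: a congruence and an absolute bound."""
    r = montgomery_reduce(a)
    return (r * (1 << R_BITS) - a) % Q == 0 and abs(r) <= Q
```

Tracking a (congruence, bound) pair per intermediate value is what lets a proof conclude both functional correctness and absence of overflow from the same pass over the code.
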

mkannwischer commented

After the s2n-bignum update, the safety proofs seem to fail. Maybe this is related to awslabs/s2n-bignum#355?

@dkostic, could you maybe take a look?

Development

Successfully merging this pull request may close these issues.

HOL-Light: Prove AArch64 NTT

4 participants