00b155f to
3819863
Compare
CBMC Results (ML-DSA-87): Full Results (177 proofs)
CBMC Results (ML-DSA-44): Full Results (177 proofs)
CBMC Results (ML-DSA-65): Full Results (177 proofs)
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 45681 cycles | 45685 cycles | 1.00 |
| ML-DSA-44 sign | 131153 cycles | 131164 cycles | 1.00 |
| ML-DSA-44 verify | 47527 cycles | 47530 cycles | 1.00 |
| ML-DSA-65 keypair | 80457 cycles | 80479 cycles | 1.00 |
| ML-DSA-65 sign | 215715 cycles | 215740 cycles | 1.00 |
| ML-DSA-65 verify | 79737 cycles | 79735 cycles | 1.00 |
| ML-DSA-87 keypair | 131177 cycles | 131175 cycles | 1.00 |
| ML-DSA-87 sign | 277048 cycles | 277004 cycles | 1.00 |
| ML-DSA-87 verify | 130004 cycles | 129971 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 111983 cycles | 111979 cycles | 1.00 |
| ML-DSA-44 sign | 403592 cycles | 403622 cycles | 1.00 |
| ML-DSA-44 verify | 119886 cycles | 119876 cycles | 1.00 |
| ML-DSA-65 keypair | 192137 cycles | 192166 cycles | 1.00 |
| ML-DSA-65 sign | 657120 cycles | 657078 cycles | 1.00 |
| ML-DSA-65 verify | 193900 cycles | 193891 cycles | 1.00 |
| ML-DSA-87 keypair | 317930 cycles | 318010 cycles | 1.00 |
| ML-DSA-87 sign | 836905 cycles | 836903 cycles | 1.00 |
| ML-DSA-87 verify | 322922 cycles | 322994 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34477 cycles | 34381 cycles | 1.00 |
| ML-DSA-44 sign | 120394 cycles | 120118 cycles | 1.00 |
| ML-DSA-44 verify | 38039 cycles | 38106 cycles | 1.00 |
| ML-DSA-65 keypair | 60486 cycles | 61325 cycles | 0.99 |
| ML-DSA-65 sign | 201395 cycles | 201746 cycles | 1.00 |
| ML-DSA-65 verify | 62527 cycles | 62841 cycles | 1.00 |
| ML-DSA-87 keypair | 94845 cycles | 92915 cycles | 1.02 |
| ML-DSA-87 sign | 239636 cycles | 231813 cycles | 1.03 |
| ML-DSA-87 verify | 95988 cycles | 94836 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 229063 cycles | 232745 cycles | 0.98 |
| ML-DSA-44 sign | 628858 cycles | 629812 cycles | 1.00 |
| ML-DSA-44 verify | 229339 cycles | 229277 cycles | 1.00 |
| ML-DSA-65 keypair | 378941 cycles | 422090 cycles | 0.90 |
| ML-DSA-65 sign | 1007370 cycles | 1067756 cycles | 0.94 |
| ML-DSA-65 verify | 376246 cycles | 393848 cycles | 0.96 |
| ML-DSA-87 keypair | 690237 cycles | 673725 cycles | 1.02 |
| ML-DSA-87 sign | 1396068 cycles | 1405386 cycles | 0.99 |
| ML-DSA-87 verify | 663094 cycles | 657567 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 93636 cycles | 94097 cycles | 1.00 |
| ML-DSA-44 sign | 332371 cycles | 333264 cycles | 1.00 |
| ML-DSA-44 verify | 99653 cycles | 99803 cycles | 1.00 |
| ML-DSA-65 keypair | 159756 cycles | 160115 cycles | 1.00 |
| ML-DSA-65 sign | 544298 cycles | 544184 cycles | 1.00 |
| ML-DSA-65 verify | 160693 cycles | 160692 cycles | 1.00 |
| ML-DSA-87 keypair | 266718 cycles | 267433 cycles | 1.00 |
| ML-DSA-87 sign | 705724 cycles | 707379 cycles | 1.00 |
| ML-DSA-87 verify | 269841 cycles | 270279 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69099 cycles | 68974 cycles | 1.00 |
| ML-DSA-44 sign | 187200 cycles | 187318 cycles | 1.00 |
| ML-DSA-44 verify | 68987 cycles | 69050 cycles | 1.00 |
| ML-DSA-65 keypair | 119190 cycles | 119428 cycles | 1.00 |
| ML-DSA-65 sign | 299797 cycles | 300617 cycles | 1.00 |
| ML-DSA-65 verify | 115518 cycles | 115643 cycles | 1.00 |
| ML-DSA-87 keypair | 203389 cycles | 203571 cycles | 1.00 |
| ML-DSA-87 sign | 394191 cycles | 394649 cycles | 1.00 |
| ML-DSA-87 verify | 195428 cycles | 195659 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 56776 cycles | 56817 cycles | 1.00 |
| ML-DSA-44 sign | 180517 cycles | 182410 cycles | 0.99 |
| ML-DSA-44 verify | 60909 cycles | 61615 cycles | 0.99 |
| ML-DSA-65 keypair | 98542 cycles | 98729 cycles | 1.00 |
| ML-DSA-65 sign | 298159 cycles | 298290 cycles | 1.00 |
| ML-DSA-65 verify | 100252 cycles | 100286 cycles | 1.00 |
| ML-DSA-87 keypair | 153194 cycles | 152586 cycles | 1.00 |
| ML-DSA-87 sign | 355896 cycles | 355720 cycles | 1.00 |
| ML-DSA-87 verify | 153887 cycles | 153499 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 41111 cycles | 42279 cycles | 0.97 |
| ML-DSA-44 sign | 132463 cycles | 132300 cycles | 1.00 |
| ML-DSA-44 verify | 43376 cycles | 43971 cycles | 0.99 |
| ML-DSA-65 keypair | 71836 cycles | 76769 cycles | 0.94 |
| ML-DSA-65 sign | 214168 cycles | 217452 cycles | 0.98 |
| ML-DSA-65 verify | 72274 cycles | 73895 cycles | 0.98 |
| ML-DSA-87 keypair | 109272 cycles | 108025 cycles | 1.01 |
| ML-DSA-87 sign | 250093 cycles | 252354 cycles | 0.99 |
| ML-DSA-87 verify | 110204 cycles | 109188 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 9146fd9 | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 46019 cycles | 43971 cycles | 1.05 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135196 cycles | 134983 cycles | 1.00 |
| ML-DSA-44 sign | 524574 cycles | 524482 cycles | 1.00 |
| ML-DSA-44 verify | 147495 cycles | 147385 cycles | 1.00 |
| ML-DSA-65 keypair | 228079 cycles | 228309 cycles | 1.00 |
| ML-DSA-65 sign | 865741 cycles | 864340 cycles | 1.00 |
| ML-DSA-65 verify | 236319 cycles | 236413 cycles | 1.00 |
| ML-DSA-87 keypair | 370971 cycles | 370688 cycles | 1.00 |
| ML-DSA-87 sign | 1080314 cycles | 1079564 cycles | 1.00 |
| ML-DSA-87 verify | 382962 cycles | 383220 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157458 cycles | 157614 cycles | 1.00 |
| ML-DSA-44 sign | 549359 cycles | 551534 cycles | 1.00 |
| ML-DSA-44 verify | 169292 cycles | 169123 cycles | 1.00 |
| ML-DSA-65 keypair | 268056 cycles | 267907 cycles | 1.00 |
| ML-DSA-65 sign | 904109 cycles | 904333 cycles | 1.00 |
| ML-DSA-65 verify | 275154 cycles | 275011 cycles | 1.00 |
| ML-DSA-87 keypair | 448822 cycles | 448619 cycles | 1.00 |
| ML-DSA-87 sign | 1157710 cycles | 1157905 cycles | 1.00 |
| ML-DSA-87 verify | 458420 cycles | 458683 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68120 cycles | 68090 cycles | 1.00 |
| ML-DSA-44 sign | 202529 cycles | 202380 cycles | 1.00 |
| ML-DSA-44 verify | 70991 cycles | 70623 cycles | 1.01 |
| ML-DSA-65 keypair | 121071 cycles | 121010 cycles | 1.00 |
| ML-DSA-65 sign | 331858 cycles | 332267 cycles | 1.00 |
| ML-DSA-65 verify | 118015 cycles | 117974 cycles | 1.00 |
| ML-DSA-87 keypair | 198147 cycles | 198259 cycles | 1.00 |
| ML-DSA-87 sign | 427693 cycles | 428218 cycles | 1.00 |
| ML-DSA-87 verify | 194725 cycles | 194635 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 72367 cycles | 72253 cycles | 1.00 |
| ML-DSA-44 sign | 212483 cycles | 212376 cycles | 1.00 |
| ML-DSA-44 verify | 75753 cycles | 75747 cycles | 1.00 |
| ML-DSA-65 keypair | 127620 cycles | 127630 cycles | 1.00 |
| ML-DSA-65 sign | 351072 cycles | 350882 cycles | 1.00 |
| ML-DSA-65 verify | 125639 cycles | 125712 cycles | 1.00 |
| ML-DSA-87 keypair | 205890 cycles | 208495 cycles | 0.99 |
| ML-DSA-87 sign | 444711 cycles | 450030 cycles | 0.99 |
| ML-DSA-87 verify | 205611 cycles | 205745 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120296 cycles | 120340 cycles | 1.00 |
| ML-DSA-44 sign | 447350 cycles | 447581 cycles | 1.00 |
| ML-DSA-44 verify | 130039 cycles | 130373 cycles | 1.00 |
| ML-DSA-65 keypair | 205264 cycles | 204354 cycles | 1.00 |
| ML-DSA-65 sign | 728856 cycles | 728319 cycles | 1.00 |
| ML-DSA-65 verify | 211012 cycles | 209199 cycles | 1.01 |
| ML-DSA-87 keypair | 338105 cycles | 338993 cycles | 1.00 |
| ML-DSA-87 sign | 924981 cycles | 921541 cycles | 1.00 |
| ML-DSA-87 verify | 347678 cycles | 348601 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4 (no-opt)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128366 cycles | 128240 cycles | 1.00 |
| ML-DSA-44 sign | 447669 cycles | 447597 cycles | 1.00 |
| ML-DSA-44 verify | 138229 cycles | 144662 cycles | 0.96 |
| ML-DSA-65 keypair | 220626 cycles | 220500 cycles | 1.00 |
| ML-DSA-65 sign | 727046 cycles | 727093 cycles | 1.00 |
| ML-DSA-65 verify | 222599 cycles | 223077 cycles | 1.00 |
| ML-DSA-87 keypair | 364591 cycles | 365045 cycles | 1.00 |
| ML-DSA-87 sign | 925963 cycles | 925847 cycles | 1.00 |
| ML-DSA-87 verify | 372876 cycles | 372789 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3 (no-opt)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138665 cycles | 138463 cycles | 1.00 |
| ML-DSA-44 sign | 483863 cycles | 483929 cycles | 1.00 |
| ML-DSA-44 verify | 148471 cycles | 162291 cycles | 0.91 |
| ML-DSA-65 keypair | 241346 cycles | 241435 cycles | 1.00 |
| ML-DSA-65 sign | 792690 cycles | 792312 cycles | 1.00 |
| ML-DSA-65 verify | 240750 cycles | 241250 cycles | 1.00 |
| ML-DSA-87 keypair | 395603 cycles | 396566 cycles | 1.00 |
| ML-DSA-87 sign | 1013151 cycles | 1012538 cycles | 1.00 |
| ML-DSA-87 verify | 402960 cycles | 402623 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113469 cycles | 113410 cycles | 1.00 |
| ML-DSA-44 sign | 355767 cycles | 355818 cycles | 1.00 |
| ML-DSA-44 verify | 118208 cycles | 118279 cycles | 1.00 |
| ML-DSA-65 keypair | 197272 cycles | 196486 cycles | 1.00 |
| ML-DSA-65 sign | 590719 cycles | 588672 cycles | 1.00 |
| ML-DSA-65 verify | 195355 cycles | 194830 cycles | 1.00 |
| ML-DSA-87 keypair | 323052 cycles | 323043 cycles | 1.00 |
| ML-DSA-87 sign | 754236 cycles | 753644 cycles | 1.00 |
| ML-DSA-87 verify | 320544 cycles | 320341 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 827476 cycles | 828088 cycles | 1.00 |
| ML-DSA-44 sign | 3238353 cycles | 3233170 cycles | 1.00 |
| ML-DSA-44 verify | 921919 cycles | 920794 cycles | 1.00 |
| ML-DSA-65 keypair | 1413613 cycles | 1413452 cycles | 1.00 |
| ML-DSA-65 sign | 5340696 cycles | 5347688 cycles | 1.00 |
| ML-DSA-65 verify | 1477470 cycles | 1477937 cycles | 1.00 |
| ML-DSA-87 keypair | 2311391 cycles | 2312894 cycles | 1.00 |
| ML-DSA-87 sign | 6659117 cycles | 6665352 cycles | 1.00 |
| ML-DSA-87 verify | 2409640 cycles | 2411069 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2 (no-opt)
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213353 cycles | 213406 cycles | 1.00 |
| ML-DSA-44 sign | 760604 cycles | 762744 cycles | 1.00 |
| ML-DSA-44 verify | 241487 cycles | 235007 cycles | 1.03 |
| ML-DSA-65 keypair | 380880 cycles | 380391 cycles | 1.00 |
| ML-DSA-65 sign | 1252441 cycles | 1253555 cycles | 1.00 |
| ML-DSA-65 verify | 372539 cycles | 371798 cycles | 1.00 |
| ML-DSA-87 keypair | 606311 cycles | 604988 cycles | 1.00 |
| ML-DSA-87 sign | 1593094 cycles | 1596422 cycles | 1.00 |
| ML-DSA-87 verify | 618250 cycles | 619153 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 311698 cycles | 306606 cycles | 1.02 |
| ML-DSA-44 sign | 1174058 cycles | 1166146 cycles | 1.01 |
| ML-DSA-44 verify | 333560 cycles | 335430 cycles | 0.99 |
| ML-DSA-65 keypair | 550737 cycles | 562274 cycles | 0.98 |
| ML-DSA-65 sign | 1894590 cycles | 1916493 cycles | 0.99 |
| ML-DSA-65 verify | 529438 cycles | 533535 cycles | 0.99 |
| ML-DSA-87 keypair | 872695 cycles | 865006 cycles | 1.01 |
| ML-DSA-87 sign | 2468410 cycles | 2417913 cycles | 1.02 |
| ML-DSA-87 verify | 900121 cycles | 884966 cycles | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 309195 cycles | 299195 cycles | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
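Some alerts in this thread show a ratio of 1.03 against a threshold of 1.03 (for example 309195 vs. 299195 cycles), while the Graviton2 (no-opt) ML-DSA-44 verify row also shows 1.03 (241487 vs. 235007 cycles) without an alert. A minimal sketch of how that can happen, assuming the alert compares the unrounded ratio against the threshold and the table only displays two decimals. This is an assumption based on the numbers above, not on the tool's source, and the helper names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not github-action-benchmark code. */

/* Ratio of current to previous cycle counts (larger means slower). */
static double bench_ratio(uint64_t current_cycles, uint64_t previous_cycles)
{
  return (double)current_cycles / (double)previous_cycles;
}

/* Assumed alert rule: strict comparison of the unrounded ratio. */
static int exceeds_threshold(uint64_t current_cycles, uint64_t previous_cycles,
                             double threshold)
{
  return bench_ratio(current_cycles, previous_cycles) > threshold;
}

/* Ratio rounded to two decimals, returned in hundredths (1.03 -> 103),
 * mimicking what the table column displays. */
static int ratio_hundredths(uint64_t current_cycles, uint64_t previous_cycles)
{
  return (int)(bench_ratio(current_cycles, previous_cycles) * 100.0 + 0.5);
}
```

With these definitions, 309195 / 299195 is about 1.0334, which displays as 1.03 but still exceeds the threshold, whereas 241487 / 235007 is about 1.0276, which also displays as 1.03 but stays below it.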
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 277182 cycles | 278160 cycles | 1.00 |
| ML-DSA-44 sign | 816109 cycles | 822535 cycles | 0.99 |
| ML-DSA-44 verify | 280990 cycles | 278070 cycles | 1.01 |
| ML-DSA-65 keypair | 477648 cycles | 476503 cycles | 1.00 |
| ML-DSA-65 sign | 1398700 cycles | 1347085 cycles | 1.04 |
| ML-DSA-65 verify | 461181 cycles | 456015 cycles | 1.01 |
| ML-DSA-87 keypair | 825204 cycles | 796551 cycles | 1.04 |
| ML-DSA-87 sign | 1886968 cycles | 1773335 cycles | 1.06 |
| ML-DSA-87 verify | 803609 cycles | 772360 cycles | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
72bc3f8 to
d186f5e
Compare
7dc5f6f to
6761759
Compare
6761759 to
14097e6
Compare
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-65 sign | 1398700 cycles | 1347085 cycles | 1.04 |
| ML-DSA-87 keypair | 825204 cycles | 796551 cycles | 1.04 |
| ML-DSA-87 sign | 1886968 cycles | 1773335 cycles | 1.06 |
| ML-DSA-87 verify | 803609 cycles | 772360 cycles | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
14097e6 to
16cbdfe
Compare
16cbdfe to
8c17221
Compare
Hello @mkannwischer, @jakemas, thank you for helping with the review.
This commit adds mld_poly_caddq to the benchmark components to evaluate the performance impact of replacing the caddq AVX2 intrinsics with x86_64 assembly code.
Signed-off-by: willieyz <willie.zhao@chelpis.com>
This commit replaces the current caddq AVX2 intrinsics implementation with x86_64 assembly to enable formal verification using HOL-Light in a follow-up PR.
Signed-off-by: willieyz <willie.zhao@chelpis.com>
e663c5f to
9146fd9
Compare
oqs-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 9146fd9 | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-87 sign | 241094 cycles | 231813 cycles | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
9146fd9 to
e663c5f
Compare
oqs-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: e663c5f | Previous: db65535 | Ratio |
|---|---|---|---|
| ML-DSA-87 sign | 239636 cycles | 231813 cycles | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
mkannwischer left a comment
Thanks @willieyz - this looks good.
Yesterday I tried not loading from \offset(%rdi) twice in the caddq macro, doing a single load up front instead, but that resulted in much worse performance.
I think the current code is fine.
@hanno-becker, could you take another look, please?
I too played around with some implementations, but the performance I was able to get was nowhere near this. It did get me familiar with the function, the setup, and the plumbing. I have reviewed and I am happy. Once merged, I'll write the HOL-Light proof!
poly_caddq with assembly #491
In this PR, we replace the AVX2 intrinsics implementation of poly_caddq with an x86_64 assembly version.
To estimate the performance impact, we compare the results shown in the two tables below.
Overall, for keypair, sign, and verify (opt), the performance difference is below 1%, which is consistent with the no-opt case.
In the component-level benchmark for mld_poly_caddq, the observed performance differences are at least 17%. After unrolling the loop by a factor of 4, the differences are reduced to approximately 10%.
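For readers unfamiliar with the operation, here is a portable scalar sketch of what a conditional add of q computes, plus a loop unrolled by a factor of 4. It only illustrates the semantics; it is not the AVX2 or assembly code from this PR. The names `caddq_scalar`, `poly_caddq_ref`, `poly_caddq_unrolled4`, and the `SKETCH_` constants are hypothetical, and the parameters are the standard ML-DSA values (q = 8380417, 256 coefficients).

```c
#include <stdint.h>

#define SKETCH_Q 8380417 /* ML-DSA modulus q = 2^23 - 2^13 + 1 */
#define SKETCH_N 256     /* coefficients per polynomial */

/* Add q to a if a is negative, without a source-level branch. */
static int32_t caddq_scalar(int32_t a)
{
  /* mask is all-ones (-1) when a < 0 and zero otherwise. */
  int32_t mask = -(int32_t)(a < 0);
  return a + (mask & SKETCH_Q);
}

/* Straightforward loop: one coefficient per iteration. */
static void poly_caddq_ref(int32_t coeffs[SKETCH_N])
{
  unsigned i;
  for (i = 0; i < SKETCH_N; i++)
  {
    coeffs[i] = caddq_scalar(coeffs[i]);
  }
}

/* Same computation with the loop body unrolled by a factor of 4.
 * SKETCH_N is a multiple of 4, so no tail handling is needed. */
static void poly_caddq_unrolled4(int32_t coeffs[SKETCH_N])
{
  unsigned i;
  for (i = 0; i < SKETCH_N; i += 4)
  {
    coeffs[i + 0] = caddq_scalar(coeffs[i + 0]);
    coeffs[i + 1] = caddq_scalar(coeffs[i + 1]);
    coeffs[i + 2] = caddq_scalar(coeffs[i + 2]);
    coeffs[i + 3] = caddq_scalar(coeffs[i + 3]);
  }
}
```

Both loops produce identical results. Unrolling changes only how much work each iteration does, which is why it can affect a component-level benchmark without changing the output.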