CBMC Results (ML-DSA-87): Full Results (179 proofs)
CBMC Results (ML-DSA-44): Full Results (179 proofs)
CBMC Results (ML-DSA-65): Full Results (179 proofs)
1ea9d5f to 8a19e9a
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: 8a19e9a | Previous: 41da557 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46205 cycles | 46203 cycles | 1.00 |
| ML-DSA-44 sign | 131278 cycles | 131278 cycles | 1 |
| ML-DSA-44 verify | 47765 cycles | 47768 cycles | 1.00 |
| ML-DSA-65 keypair | 81014 cycles | 81024 cycles | 1.00 |
| ML-DSA-65 sign | 215785 cycles | 215787 cycles | 1.00 |
| ML-DSA-65 verify | 80057 cycles | 80052 cycles | 1.00 |
| ML-DSA-87 keypair | 132158 cycles | 132151 cycles | 1.00 |
| ML-DSA-87 sign | 276862 cycles | 276816 cycles | 1.00 |
| ML-DSA-87 verify | 130418 cycles | 130384 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: 8a19e9a | Previous: 41da557 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 114213 cycles | 114155 cycles | 1.00 |
| ML-DSA-44 sign | 418158 cycles | 417994 cycles | 1.00 |
| ML-DSA-44 verify | 122319 cycles | 122262 cycles | 1.00 |
| ML-DSA-65 keypair | 195508 cycles | 195499 cycles | 1.00 |
| ML-DSA-65 sign | 682497 cycles | 682470 cycles | 1.00 |
| ML-DSA-65 verify | 197760 cycles | 197741 cycles | 1.00 |
| ML-DSA-87 keypair | 322642 cycles | 322656 cycles | 1.00 |
| ML-DSA-87 sign | 864585 cycles | 864584 cycles | 1.00 |
| ML-DSA-87 verify | 328628 cycles | 328653 cycles | 1.00 |
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34909 cycles | 34607 cycles | 1.01 |
| ML-DSA-44 sign | 120375 cycles | 120704 cycles | 1.00 |
| ML-DSA-44 verify | 38205 cycles | 38101 cycles | 1.00 |
| ML-DSA-65 keypair | 60968 cycles | 61787 cycles | 0.99 |
| ML-DSA-65 sign | 202493 cycles | 204750 cycles | 0.99 |
| ML-DSA-65 verify | 62726 cycles | 62947 cycles | 1.00 |
| ML-DSA-87 keypair | 94450 cycles | 94143 cycles | 1.00 |
| ML-DSA-87 sign | 241633 cycles | 240274 cycles | 1.01 |
| ML-DSA-87 verify | 96451 cycles | 95109 cycles | 1.01 |
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 94555 cycles | 94592 cycles | 1.00 |
| ML-DSA-44 sign | 333735 cycles | 333857 cycles | 1.00 |
| ML-DSA-44 verify | 99826 cycles | 99864 cycles | 1.00 |
| ML-DSA-65 keypair | 159716 cycles | 159928 cycles | 1.00 |
| ML-DSA-65 sign | 544638 cycles | 544846 cycles | 1.00 |
| ML-DSA-65 verify | 160752 cycles | 160968 cycles | 1.00 |
| ML-DSA-87 keypair | 267459 cycles | 267912 cycles | 1.00 |
| ML-DSA-87 sign | 709420 cycles | 709152 cycles | 1.00 |
| ML-DSA-87 verify | 270024 cycles | 270923 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: 8a19e9a | Previous: 41da557 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 276468 cycles | 277102 cycles | 1.00 |
| ML-DSA-44 sign | 818650 cycles | 810656 cycles | 1.01 |
| ML-DSA-44 verify | 276672 cycles | 278882 cycles | 0.99 |
| ML-DSA-65 keypair | 475323 cycles | 478906 cycles | 0.99 |
| ML-DSA-65 sign | 1367640 cycles | 1360800 cycles | 1.01 |
| ML-DSA-65 verify | 459822 cycles | 466415 cycles | 0.99 |
| ML-DSA-87 keypair | 825623 cycles | 818822 cycles | 1.01 |
| ML-DSA-87 sign | 1873209 cycles | 1878770 cycles | 1.00 |
| ML-DSA-87 verify | 800938 cycles | 794467 cycles | 1.01 |
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69188 cycles | 69341 cycles | 1.00 |
| ML-DSA-44 sign | 188711 cycles | 188628 cycles | 1.00 |
| ML-DSA-44 verify | 69609 cycles | 69167 cycles | 1.01 |
| ML-DSA-65 keypair | 119110 cycles | 119048 cycles | 1.00 |
| ML-DSA-65 sign | 301012 cycles | 300972 cycles | 1.00 |
| ML-DSA-65 verify | 115433 cycles | 115129 cycles | 1.00 |
| ML-DSA-87 keypair | 202783 cycles | 202705 cycles | 1.00 |
| ML-DSA-87 sign | 393591 cycles | 393401 cycles | 1.00 |
| ML-DSA-87 verify | 194240 cycles | 194477 cycles | 1.00 |
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 56699 cycles | 57040 cycles | 0.99 |
| ML-DSA-44 sign | 182012 cycles | 183077 cycles | 0.99 |
| ML-DSA-44 verify | 61080 cycles | 61515 cycles | 0.99 |
| ML-DSA-65 keypair | 99136 cycles | 98855 cycles | 1.00 |
| ML-DSA-65 sign | 302786 cycles | 300890 cycles | 1.01 |
| ML-DSA-65 verify | 101006 cycles | 100170 cycles | 1.01 |
| ML-DSA-87 keypair | 154691 cycles | 153387 cycles | 1.01 |
| ML-DSA-87 sign | 357607 cycles | 356600 cycles | 1.00 |
| ML-DSA-87 verify | 155516 cycles | 153458 cycles | 1.01 |
Graviton4
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68195 cycles | 68176 cycles | 1.00 |
| ML-DSA-44 sign | 203778 cycles | 203661 cycles | 1.00 |
| ML-DSA-44 verify | 70887 cycles | 70749 cycles | 1.00 |
| ML-DSA-65 keypair | 120728 cycles | 120835 cycles | 1.00 |
| ML-DSA-65 sign | 334605 cycles | 334759 cycles | 1.00 |
| ML-DSA-65 verify | 117912 cycles | 118016 cycles | 1.00 |
| ML-DSA-87 keypair | 198206 cycles | 198256 cycles | 1.00 |
| ML-DSA-87 sign | 431061 cycles | 431078 cycles | 1.00 |
| ML-DSA-87 verify | 194668 cycles | 194587 cycles | 1.00 |
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135143 cycles | 136158 cycles | 0.99 |
| ML-DSA-44 sign | 526833 cycles | 531116 cycles | 0.99 |
| ML-DSA-44 verify | 147425 cycles | 148648 cycles | 0.99 |
| ML-DSA-65 keypair | 226627 cycles | 226842 cycles | 1.00 |
| ML-DSA-65 sign | 859933 cycles | 861270 cycles | 1.00 |
| ML-DSA-65 verify | 234683 cycles | 235270 cycles | 1.00 |
| ML-DSA-87 keypair | 370822 cycles | 370874 cycles | 1.00 |
| ML-DSA-87 sign | 1078880 cycles | 1077097 cycles | 1.00 |
| ML-DSA-87 verify | 383211 cycles | 382857 cycles | 1.00 |
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 40627 cycles | 40587 cycles | 1.00 |
| ML-DSA-44 sign | 133192 cycles | 136713 cycles | 0.97 |
| ML-DSA-44 verify | 43499 cycles | 43374 cycles | 1.00 |
| ML-DSA-65 keypair | 72359 cycles | 71982 cycles | 1.01 |
| ML-DSA-65 sign | 214367 cycles | 214626 cycles | 1.00 |
| ML-DSA-65 verify | 73011 cycles | 73104 cycles | 1.00 |
| ML-DSA-87 keypair | 108917 cycles | 108890 cycles | 1.00 |
| ML-DSA-87 sign | 254496 cycles | 253022 cycles | 1.01 |
| ML-DSA-87 verify | 114127 cycles | 110459 cycles | 1.03 |
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157441 cycles | 157064 cycles | 1.00 |
| ML-DSA-44 sign | 552409 cycles | 549451 cycles | 1.01 |
| ML-DSA-44 verify | 169081 cycles | 168897 cycles | 1.00 |
| ML-DSA-65 keypair | 269113 cycles | 268697 cycles | 1.00 |
| ML-DSA-65 sign | 905233 cycles | 905890 cycles | 1.00 |
| ML-DSA-65 verify | 274920 cycles | 274888 cycles | 1.00 |
| ML-DSA-87 keypair | 448473 cycles | 448496 cycles | 1.00 |
| ML-DSA-87 sign | 1160497 cycles | 1159979 cycles | 1.00 |
| ML-DSA-87 verify | 458580 cycles | 458091 cycles | 1.00 |
Graviton3
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 72272 cycles | 72243 cycles | 1.00 |
| ML-DSA-44 sign | 213477 cycles | 213451 cycles | 1.00 |
| ML-DSA-44 verify | 75713 cycles | 75744 cycles | 1.00 |
| ML-DSA-65 keypair | 127604 cycles | 127603 cycles | 1.00 |
| ML-DSA-65 sign | 353407 cycles | 353426 cycles | 1.00 |
| ML-DSA-65 verify | 125750 cycles | 125745 cycles | 1.00 |
| ML-DSA-87 keypair | 208441 cycles | 208481 cycles | 1.00 |
| ML-DSA-87 sign | 452287 cycles | 452641 cycles | 1.00 |
| ML-DSA-87 verify | 205856 cycles | 205909 cycles | 1.00 |
Graviton4 (no-opt)
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128174 cycles | 128347 cycles | 1.00 |
| ML-DSA-44 sign | 448073 cycles | 448111 cycles | 1.00 |
| ML-DSA-44 verify | 138265 cycles | 144871 cycles | 0.95 |
| ML-DSA-65 keypair | 220367 cycles | 220834 cycles | 1.00 |
| ML-DSA-65 sign | 729443 cycles | 729991 cycles | 1.00 |
| ML-DSA-65 verify | 223253 cycles | 223754 cycles | 1.00 |
| ML-DSA-87 keypair | 366585 cycles | 367262 cycles | 1.00 |
| ML-DSA-87 sign | 928832 cycles | 929744 cycles | 1.00 |
| ML-DSA-87 verify | 373916 cycles | 374445 cycles | 1.00 |
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120691 cycles | 120833 cycles | 1.00 |
| ML-DSA-44 sign | 449407 cycles | 449229 cycles | 1.00 |
| ML-DSA-44 verify | 130264 cycles | 130297 cycles | 1.00 |
| ML-DSA-65 keypair | 204729 cycles | 204649 cycles | 1.00 |
| ML-DSA-65 sign | 730192 cycles | 731243 cycles | 1.00 |
| ML-DSA-65 verify | 210457 cycles | 210085 cycles | 1.00 |
| ML-DSA-87 keypair | 338245 cycles | 337488 cycles | 1.00 |
| ML-DSA-87 sign | 925719 cycles | 929314 cycles | 1.00 |
| ML-DSA-87 verify | 346809 cycles | 346839 cycles | 1.00 |
f820653 to 91cddd6
d3f2de3 to e5ad167
e5ad167 to d278675
@willieyz, can you rebase this PR, please?
d278675 to 726fc9d
Hello @mkannwischer, thank you for reviewing. I have rebased it on top of main.
726fc9d to 1c31329
oqs-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 1c31329 | Previous: 0f8b8e0 | Ratio |
|---|---|---|---|
| ML-DSA-87 verify | 114127 cycles | 110459 cycles | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
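As a rough illustration of how the alert above relates to the table, the sketch below recomputes the ratio from the two cycle counts. It assumes the check compares the unrounded ratio `current / previous` against the threshold with a strict "greater than"; the function names are hypothetical and are not part of github-action-benchmark.

```python
# Minimal sketch, assuming ratio = current / previous and a strict ">" check.
# regression_ratio and is_alert are illustrative names, not a real API.

def regression_ratio(current_cycles, previous_cycles):
    """Ratio of the current cycle count to the previous one (higher is slower)."""
    return current_cycles / previous_cycles


def is_alert(current_cycles, previous_cycles, threshold=1.03):
    """True when the unrounded ratio exceeds the configured threshold."""
    return regression_ratio(current_cycles, previous_cycles) > threshold


# ML-DSA-87 verify on AMD EPYC 4th gen (c7a): 114127 vs 110459 cycles.
ratio = regression_ratio(114127, 110459)
print(f"{ratio:.4f}", is_alert(114127, 110459))
```

Under that assumption, the ratio is displayed as 1.03 after rounding but sits slightly above the threshold before rounding, which would explain why a row showing exactly 1.03 still triggers the alert.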
1c31329 to 5437035
This commit adds poly_use_hint to bench --components for benchmarking the performance impact of the changes to:
- poly_use_hint_32
- poly_use_hint_88

Signed-off-by: willieyz <willie.zhao@chelpis.com>
This commit replaces the AVX2 intrinsics implementation of poly_use_hint_32 and poly_use_hint_88 with an x86_64 assembly version. This is part of the effort to enable HOL-Light proofs.

Signed-off-by: willieyz <willie.zhao@chelpis.com>
5437035 to c53c97f
mkannwischer left a comment
Thanks @willieyz. I made a couple of small changes to the comments (to align them with the intrinsics), and I am now happy with the changes. The performance degradation is unfortunate, but we can revisit that in a follow-up.
@hanno-becker, @jakemas, could you also take a look?
poly_use_hint with assembly #484
In this PR, we replace the AVX2 intrinsics implementation of poly_use_hint_32 and poly_use_hint_88 with an x86_64 assembly version. This is part of the effort to enable HOL-Light proofs.
We also tried unrolling the loops mld_poly_use_hint_88_avx2_loop and mld_poly_use_hint_32_avx2_loop in both files. However, the benchmark results showed that unrolling did not provide any performance benefit, so we decided to keep the non-unrolled version.
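For readers unfamiliar with the routine being ported, here is a scalar sketch of the per-coefficient UseHint step as specified in FIPS 204, which poly_use_hint applies to each coefficient of a polynomial. It is not the assembly or intrinsics code from this PR. It assumes, based on the function names, that the `_32` and `_88` variants correspond to γ2 = (q−1)/32 and γ2 = (q−1)/88; the Python names are illustrative only.

```python
# Scalar reference sketch of Decompose and UseHint from FIPS 204.
# Not the PR's code; names and the _32/_88 mapping are assumptions.

Q = 8380417                  # ML-DSA modulus q
GAMMA2_32 = (Q - 1) // 32    # assumed to match poly_use_hint_32
GAMMA2_88 = (Q - 1) // 88    # assumed to match poly_use_hint_88


def decompose(r, gamma2):
    """Split r into (r1, r0) with r = r1 * 2*gamma2 + r0 (mod q)."""
    r_plus = r % Q
    r0 = r_plus % (2 * gamma2)
    if r0 > gamma2:
        r0 -= 2 * gamma2     # centered remainder in (-gamma2, gamma2]
    if r_plus - r0 == Q - 1:
        return 0, r0 - 1     # wrap-around case at the top of the range
    return (r_plus - r0) // (2 * gamma2), r0


def use_hint(h, r, gamma2):
    """Return the high bits of r, adjusted by the one-bit hint h."""
    m = (Q - 1) // (2 * gamma2)   # 16 for gamma2 = (q-1)/32, 44 for (q-1)/88
    r1, r0 = decompose(r, gamma2)
    if h == 1:
        return (r1 + 1) % m if r0 > 0 else (r1 - 1) % m
    return r1


def poly_use_hint(hints, coeffs, gamma2):
    """Apply use_hint coefficient-wise, as a vectorized version would."""
    return [use_hint(h, r, gamma2) for h, r in zip(hints, coeffs)]
```

A vectorized AVX2 or assembly implementation would compute the same function on several coefficients at once, so a scalar version like this can serve as a reference when comparing outputs.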