Feat: chacha20 c impl. incl. current mlir output #886
VeIR Benchmarks
| Benchmark suite | Current: 3f64d75 | Previous: 936c95c | Ratio |
|---|---|---|---|
| add-fold-worklist/create | 2288000 ns (± 83096) | 2348000 ns (± 97457) | 0.97 |
| add-fold-worklist/rewrite | 3903000 ns (± 138156) | 4141000 ns (± 38823) | 0.94 |
| add-fold-worklist-local/create | 2172000 ns (± 78684) | 2291000 ns (± 44019) | 0.95 |
| add-fold-worklist-local/rewrite | 3334000 ns (± 7603) | 3404000 ns (± 88004) | 0.98 |
| add-zero-worklist/create | 2205000 ns (± 54765) | 2372500 ns (± 138531) | 0.93 |
| add-zero-worklist/rewrite | 2692000 ns (± 43195) | 2552000 ns (± 62280) | 1.05 |
| add-zero-reuse-worklist/create | 1929000 ns (± 71279) | 2026500 ns (± 127621) | 0.95 |
| add-zero-reuse-worklist/rewrite | 2173000 ns (± 32886) | 2122000 ns (± 69455) | 1.02 |
| mul-two-worklist/create | 2231000 ns (± 102175) | 2292000 ns (± 111491) | 0.97 |
| mul-two-worklist/rewrite | 5698000 ns (± 126431) | 5580000 ns (± 86102) | 1.02 |
| add-fold-forwards/create | 2238000 ns (± 107297) | 2422000 ns (± 118219) | 0.92 |
| add-fold-forwards/rewrite | 2973000 ns (± 69651) | 2986000 ns (± 45152) | 1.00 |
| add-zero-forwards/create | 2281000 ns (± 112412) | 2488000 ns (± 113288) | 0.92 |
| add-zero-forwards/rewrite | 1978000 ns (± 29985) | 1997000 ns (± 35374) | 0.99 |
| add-zero-reuse-forwards/create | 1879000 ns (± 90814) | 1865000 ns (± 10756) | 1.01 |
| add-zero-reuse-forwards/rewrite | 1541000 ns (± 48004) | 1537000 ns (± 16432) | 1.00 |
| mul-two-forwards/create | 2149000 ns (± 82461) | 2287000 ns (± 42759) | 0.94 |
| mul-two-forwards/rewrite | 3695000 ns (± 87191) | 3584000 ns (± 30971) | 1.03 |
| add-zero-reuse-first/create | 1894500 ns (± 112455) | 1983000 ns (± 107910) | 0.96 |
| add-zero-reuse-first/rewrite | 8000 ns (± 1584) | 8000 ns (± 1840) | 1 |
| add-zero-lots-of-reuse-first/create | 1956000 ns (± 90884) | 1920500 ns (± 103233) | 1.02 |
| add-zero-lots-of-reuse-first/rewrite | 828000 ns (± 20132) | 776000 ns (± 30418) | 1.07 |
This comment was automatically generated by workflow using github-action-benchmark.
Thank you for the PR :) Could you look at #735?
return ((int)ciphertext[0] << 24) | ((int)ciphertext[1] << 16) | ((int)ciphertext[2] << 8) | (int)ciphertext[3];
}

// CHECK: Program output: #[0x6e2e359a#32] (CURRENTLY NOT USED BUT WOULD BE THE VALUE TO CHECK AGAINST THE RFC EXPECTED OUTPUT WHEN INTERPRETATION WORKS)
to check the interpretation we can also add a test in test/interpreter :)
…t since working on 14 byte msg
}
}

int main() {
Could you remove the main? I suspect it is the reason we still have alloca in the MLIR (similar to what happened for fastntt). An instruction-selected kernel for the algorithm is already enough (see #890 for reference).
Thank you! Left one more comment. After that is addressed, we should check whether we can instruction-select it (as we did in #890 for fastntt). Also, could you remove the |
This PR adds a test case for ChaCha20 (RFC 8439), similar in spirit to fastntt.c.
The implementation is structured in three inlined layers (quarter_round, chacha20_block, chacha20_xor), following RFC 8439, and was cross-checked against the reference ph4r05/py-chacha20poly1305 chacha.py.
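For readers unfamiliar with the innermost layer, here is a minimal sketch of a ChaCha quarter round as RFC 8439 (section 2.1) defines it. This is not the code from this PR: the `rotl32` helper and the pointer-based signature of `quarter_round` are assumptions made for illustration, and the functions are deliberately not marked `inline` so the snippet compiles on its own.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit word left by n bits (0 < n < 32). */
uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32u - n));
}

/* One quarter round on four state words: add, xor, rotate,
 * with the rotation amounts 16, 12, 8, 7 from RFC 8439. */
void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

With the RFC 8439 section 2.1.1 test inputs (0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567), this should produce 0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb, which makes it a convenient first check before testing the block function or the full cipher.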
This is still a draft, since I am looking for direction before I finalize it (a full lowering to RISC-V? how to create the generic CHECK layout automatically?).