Feat: chacha20 c impl. incl. current mlir output #886

Draft
salinhkuhn wants to merge 3 commits into main from feat-chacha20

@salinhkuhn

This PR adds a test case for ChaCha20 (RFC 8439), similar in spirit to fastntt.c.
The implementation is structured in three inlined layers (quarter_round, chacha20_block, chacha20_xor), following RFC 8439, and was cross-checked against the reference implementation chacha.py in ph4r05/py-chacha20poly1305.

This is still a draft because I am looking for direction before finalizing: should it be fully lowered to RISC-V, and how can the generic CHECK layout be generated automatically?
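
For readers unfamiliar with the innermost layer, here is a minimal sketch of a quarter round as RFC 8439 (section 2.1) describes it, checked against the test vector from section 2.1.1. It is not the code from Test/Vcc/chacha20.c: the signatures and the helper names `rotl32` and `quarter_round_matches_rfc8439` are assumptions made for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; n must satisfy 0 < n < 32. */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32u - n));
}

/* One ChaCha quarter round on four state words (RFC 8439, section 2.1).
   The pointer-based signature is an assumption, not the PR's signature. */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Hypothetical self-check: returns 1 if the quarter round reproduces the
   test vector from RFC 8439, section 2.1.1, and 0 otherwise. */
int quarter_round_matches_rfc8439(void) {
    uint32_t a = 0x11111111u, b = 0x01020304u;
    uint32_t c = 0x9b8d6f43u, d = 0x01234567u;
    quarter_round(&a, &b, &c, &d);
    return a == 0xea2a92f4u && b == 0xcb1cf8ceu &&
           c == 0x4581472eu && d == 0x5881c4bbu;
}
```

A check like this exercises the lowest layer in isolation, which may help when narrowing down which ops fail to parse or lower.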

@github-actions github-actions Bot left a comment

VeIR Benchmarks

| Benchmark suite | Current: 3f64d75 | Previous: 936c95c | Ratio |
| --- | --- | --- | --- |
| add-fold-worklist/create | 2288000 ns (± 83096) | 2348000 ns (± 97457) | 0.97 |
| add-fold-worklist/rewrite | 3903000 ns (± 138156) | 4141000 ns (± 38823) | 0.94 |
| add-fold-worklist-local/create | 2172000 ns (± 78684) | 2291000 ns (± 44019) | 0.95 |
| add-fold-worklist-local/rewrite | 3334000 ns (± 7603) | 3404000 ns (± 88004) | 0.98 |
| add-zero-worklist/create | 2205000 ns (± 54765) | 2372500 ns (± 138531) | 0.93 |
| add-zero-worklist/rewrite | 2692000 ns (± 43195) | 2552000 ns (± 62280) | 1.05 |
| add-zero-reuse-worklist/create | 1929000 ns (± 71279) | 2026500 ns (± 127621) | 0.95 |
| add-zero-reuse-worklist/rewrite | 2173000 ns (± 32886) | 2122000 ns (± 69455) | 1.02 |
| mul-two-worklist/create | 2231000 ns (± 102175) | 2292000 ns (± 111491) | 0.97 |
| mul-two-worklist/rewrite | 5698000 ns (± 126431) | 5580000 ns (± 86102) | 1.02 |
| add-fold-forwards/create | 2238000 ns (± 107297) | 2422000 ns (± 118219) | 0.92 |
| add-fold-forwards/rewrite | 2973000 ns (± 69651) | 2986000 ns (± 45152) | 1.00 |
| add-zero-forwards/create | 2281000 ns (± 112412) | 2488000 ns (± 113288) | 0.92 |
| add-zero-forwards/rewrite | 1978000 ns (± 29985) | 1997000 ns (± 35374) | 0.99 |
| add-zero-reuse-forwards/create | 1879000 ns (± 90814) | 1865000 ns (± 10756) | 1.01 |
| add-zero-reuse-forwards/rewrite | 1541000 ns (± 48004) | 1537000 ns (± 16432) | 1.00 |
| mul-two-forwards/create | 2149000 ns (± 82461) | 2287000 ns (± 42759) | 0.94 |
| mul-two-forwards/rewrite | 3695000 ns (± 87191) | 3584000 ns (± 30971) | 1.03 |
| add-zero-reuse-first/create | 1894500 ns (± 112455) | 1983000 ns (± 107910) | 0.96 |
| add-zero-reuse-first/rewrite | 8000 ns (± 1584) | 8000 ns (± 1840) | 1 |
| add-zero-lots-of-reuse-first/create | 1956000 ns (± 90884) | 1920500 ns (± 103233) | 1.02 |
| add-zero-lots-of-reuse-first/rewrite | 828000 ns (± 20132) | 776000 ns (± 30418) | 1.07 |

This comment was automatically generated by workflow using github-action-benchmark.

Comment thread Test/Vcc/chacha20.c
@luisacicolini

luisacicolini commented Jun 16, 2026

Thank you for the PR :)

Could you take a look at #735?
Ideally, we would like to have a C file compiled with vcc and also add the result of this compilation (which should be LLVM IR) to the LLVM test cases, so that we can test it with veir-opt :) If some ops are not parsed correctly, please do open an issue!

Comment thread Test/Vcc/chacha20.c Outdated
Comment thread Test/Vcc/chacha20.c Outdated
return ((int)ciphertext[0] << 24) | ((int)ciphertext[1] << 16) | ((int)ciphertext[2] << 8) | (int)ciphertext[3];
}

// CHECK: Program output: #[0x6e2e359a#32] (CURRENTLY NOT USED BUT WOULD BE THE VALUE TO CHECK AGAINST THE RFC EXPECTED OUTPUT WHEN INTERPRETATION WORKS)

To check the interpretation, we can also add a test in test/interpreter :)
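
As a side note on the return expression in this thread, the sketch below packs the first four ciphertext bytes big-endian using unsigned arithmetic. The helper name `pack_be32` and its `uint32_t` return type are assumptions, not part of the PR. The motivation is that `(int)byte << 24` is undefined behavior in C whenever the byte is 0x80 or larger, since the result is not representable in a 32-bit `int`; the expected value 0x6e2e359a starts with 0x6e, so the original expression is fine for this particular vector.

```c
#include <stdint.h>

/* Hypothetical helper: pack four bytes into one 32-bit word, most
   significant byte first. Casting to uint32_t before shifting keeps the
   shift well defined even when bytes[0] has its top bit set. */
uint32_t pack_be32(const uint8_t bytes[4]) {
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}
```

If the test has to keep returning `int`, converting the packed `uint32_t` at the very end keeps the shifts themselves well defined.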

Comment thread Test/Vcc/chacha20.c
}
}

int main() {

Could you remove the main? I suspect it might be the reason we still have alloca in the MLIR (similar to what happened for fastntt). It is already enough to have an instruction-selected kernel for the algorithm (see #890 for reference).
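
One possible shape for such a main-free kernel is sketched below, assuming the caller passes the 16-word input state and an output buffer. The names `chacha20_block_kernel`, `quarter_round_at`, and `rotate_left` are hypothetical and the signature is not taken from the PR. The rounds run directly in the caller's output buffer so that the function declares no local arrays; whether this actually removes the alloca from the vcc output has not been verified here.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; n must satisfy 0 < n < 32. */
static uint32_t rotate_left(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32u - n));
}

/* Quarter round applied in place to state words s[a], s[b], s[c], s[d]. */
static void quarter_round_at(uint32_t s[16], int a, int b, int c, int d) {
    s[a] += s[b]; s[d] ^= s[a]; s[d] = rotate_left(s[d], 16);
    s[c] += s[d]; s[b] ^= s[c]; s[b] = rotate_left(s[b], 12);
    s[a] += s[b]; s[d] ^= s[a]; s[d] = rotate_left(s[d], 8);
    s[c] += s[d]; s[b] ^= s[c]; s[b] = rotate_left(s[b], 7);
}

/* Hypothetical kernel entry point: computes one ChaCha20 block
   (RFC 8439, section 2.3) from the input state `in` into `out`.
   `in` and `out` must not alias. */
void chacha20_block_kernel(const uint32_t in[16], uint32_t out[16]) {
    for (int i = 0; i < 16; ++i) out[i] = in[i];

    /* 20 rounds: 10 iterations of a column round plus a diagonal round. */
    for (int i = 0; i < 10; ++i) {
        quarter_round_at(out, 0, 4, 8, 12);
        quarter_round_at(out, 1, 5, 9, 13);
        quarter_round_at(out, 2, 6, 10, 14);
        quarter_round_at(out, 3, 7, 11, 15);
        quarter_round_at(out, 0, 5, 10, 15);
        quarter_round_at(out, 1, 6, 11, 12);
        quarter_round_at(out, 2, 7, 8, 13);
        quarter_round_at(out, 3, 4, 9, 14);
    }

    /* Feed-forward: add the original input state word by word. */
    for (int i = 0; i < 16; ++i) out[i] += in[i];
}
```

With this layout, key, nonce, and counter setup stays on the caller's side, so a test can check the kernel against the block test vector from RFC 8439, section 2.3.2, without a `main` in the compiled file.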

@luisacicolini

Thank you! I left one more comment. After that is addressed, we should check whether we can instruction-select it (as we did in #890 for fastntt). Also, could you remove the empty .md file? :)
