-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
perf (tsort) : avoid reading the whole input into memory and intern strings #9872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CodSpeed Performance ReportMerging #9872 will degrade performance by 5.29%Comparing Summary
Benchmarks breakdown
Footnotes
|
|
GNU testsuite comparison: |
81ece9b to
0a56fe6
Compare
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
ddb2361 to
b98950c
Compare
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
any idea why the perf regressed here? |
|
Oh, @sylvestre , I just added a few changes :) |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
@sylvestre I believe the regression noticed are due to cache misses because the metadata that goes with the token is just slightly bigger than usize. Here's another benchmark that maybe of interest: graph.tsort, 1 Gb random graph
Massif output:
Gnu tsort:
MB (Memory)
248.6^ #
| ::::::::::@::#
| ::::::::::::::::::: ::: ::::@: #
| :::::@:::: ::::: : ::::: ::: ::: ::::@: #
| @@:: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| ::::@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| @@::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| ::::::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :::::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :::: :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :::: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :::: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| : :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :: :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| :: :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| @@:: :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
| @ :: :: :: : :::: ::@ ::: :@ :: : :@:: : ::::: : ::::: ::: ::: ::::@: #
0 +----------------------------------------------------------------------->Gi (Giga Instructions)
0 23.47
Number of snapshots: 56
Detailed snapshots: [1, 2, 16, 21, 26, 52, 55 (peak)]
uutils tsort
MB (Memory)
200.0^ #
| @::::@:::::::::::@::::@::::@::::@:::#:
| :::::::::::@::::::@::::@::: :::::: @::::@::::@::::@:::#:
| ::::::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:
| :::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#::
| ::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#::
| @:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| @::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| :@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| ::@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| ::@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
| ::@::@:::::: :::::::::::@:::: :@::::@::: :::::: @::::@::::@::::@:::#:::
0 +----------------------------------------------------------------------->Gi (Giga Instructions)
0 35.83
Number of snapshots: 98
Detailed snapshots: [3, 6, 24, 31, 37, 50, 51, 52, 62, 72, 82, 91 (peak)]
Hyperfine :
Benchmark 1: target/release/coreutils tsort /home/bench_vm/graph.tsort
Time (mean ± σ): 6.044 s ± 0.071 s [User: 5.602 s, System: 0.394 s]
Range (min … max): 5.974 s … 6.168 s 10 runs
Benchmark 2: tsort /home/bench_vm/graph.tsort
Time (mean ± σ): 7.115 s ± 0.147 s [User: 6.738 s, System: 0.328 s]
Range (min … max): 6.910 s … 7.390 s 10 runs
`
Summary: 'target/release/coreutils tsort /home/bench_vm/graph.tsort'
ran 1.18 ± 0.03 times faster than 'tsort /home/bench_vm/graph.tsort'
` |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
7e7fa49 to
8c49b1a
Compare
|
GNU testsuite comparison: |
756b865 to
b9604fe
Compare
|
GNU testsuite comparison: |
8ac2d2a to
51b542f
Compare
|
windows fails with: |
3cd87f8 to
a8a1860
Compare
|
@sylvestre fixed, the rest of the work will be put in another PR. |
|
GNU testsuite comparison: |
1 similar comment
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
@sylvestre done :) |
|
The performance regression is inevitable here if we want to support the same input range (2^64 -1 unique input tokens). There's also a slight buffering cost at the given bench graph size. We are still way more memory efficient, so I'd still consider it a net positive. |
This PR is a WIP to get more performance out of tsort.
Move to usize is complete. Next step is to move to vec instead of hashmap