Quicker mzid parser by julianu · Pull Request #119 · CompOmics/psm_utils

julianu · 2025-03-17T08:43:46Z

Hi,

I rewrote the mzIdentML reader and made it much faster, albeit possibly a bit less complete.
For now, I added the new reader alongside the old one. It does not use Pyteomics but parses the XML structure more directly. Hence, it is less complete for complicated files, but it should work well for most "normal" ones that originate from a single search engine and contain only one search run.
I tested the conversion to TSV on some larger files from MS-GF+ and Comet (2-20 GB), and the output was identical to the files created by the original reader. The conversion, however, took only about a tenth of the time (with equal memory consumption).
It would be great if you could add this new reader, if you like it, as the conversion of bigger files (such as a combination of timsTOF files and proteogenomics databases) otherwise takes days :)
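
To illustrate what parsing the structure "more directly" can look like, here is a minimal sketch using only the Python standard library. It is not the reader added in this PR: the function names `local_name` and `iter_psms`, the returned dictionary keys, and the `SAMPLE` document are all made up for illustration. It assumes the simple case described above (one search run, `Peptide` elements appearing before the identification results) and ignores modifications, protein evidence, and multiple spectra data sources.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up mzIdentML fragment, for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SequenceCollection>
    <Peptide id="pep1"><PeptideSequence>PEPTIDEK</PeptideSequence></Peptide>
  </SequenceCollection>
  <DataCollection><AnalysisData><SpectrumIdentificationList id="SIL_1">
    <SpectrumIdentificationResult id="SIR_1" spectrumID="index=0" spectraData_ref="SD_1">
      <SpectrumIdentificationItem id="SII_1" peptide_ref="pep1" chargeState="2" rank="1"
          experimentalMassToCharge="464.74" calculatedMassToCharge="464.74" passThreshold="true">
        <cvParam cvRef="PSI-MS" accession="MS:1002049" name="MS-GF:RawScore" value="42"/>
      </SpectrumIdentificationItem>
    </SpectrumIdentificationResult>
  </SpectrumIdentificationList></AnalysisData></DataCollection>
</MzIdentML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so different schema versions are handled alike."""
    return tag.rsplit("}", 1)[-1]


def iter_psms(source):
    """Yield one dict per SpectrumIdentificationItem while streaming the file.

    Hypothetical helper: `source` is a path or a binary file object.
    """
    peptides = {}  # Peptide id -> sequence, collected from <SequenceCollection>
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "Peptide":
            sequence = next(
                (c.text for c in elem if local_name(c.tag) == "PeptideSequence"),
                None,
            )
            peptides[elem.get("id")] = sequence
            elem.clear()  # drop children we no longer need
        elif name == "SpectrumIdentificationResult":
            for item in elem:
                if local_name(item.tag) != "SpectrumIdentificationItem":
                    continue
                # Search-engine scores are stored as cvParam name/value pairs.
                scores = {
                    c.get("name"): c.get("value")
                    for c in item
                    if local_name(c.tag) == "cvParam"
                }
                yield {
                    "spectrum_id": elem.get("spectrumID"),
                    "peptide": peptides.get(item.get("peptide_ref")),
                    "charge": int(item.get("chargeState")),
                    "rank": int(item.get("rank")),
                    "scores": scores,
                }
            elem.clear()  # release the processed result before reading on


if __name__ == "__main__":
    for psm in iter_psms(io.BytesIO(SAMPLE)):
        print(psm)
```

Because each result element is cleared right after it is processed, only the peptide lookup table grows with file size. The emptied elements remain attached to their parents in this sketch, so a production reader would likely also detach them.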

Cheers,
Julian

paretje · 2025-04-16T13:36:30Z

How much work would it be for you to list the limitations of your parser, especially those that are actually relevant in the context of psm_utils? It would probably be useful to have this as part of the documentation so people can make an informed decision when selecting a parser. On top of that, it might also be interesting to see how computationally expensive it would be to implement any relevant missing features, and then use your parser as the default.

julianu · 2025-04-17T14:04:51Z

I will look it over and check what information is actually missing or could be missing.
This might take some time due to other things on my list; I will get back to you when I am done.

julianu added 4 commits March 13, 2025 09:14

working first quick mzid version

366f732

some improvements

759ff1c

fixing linter messages

e05466a

fix to get param information of peptides

a5be04c

RalfG requested a review from paretje April 15, 2025 16:14

RalfG added the "enhancement" (Improvement of an existing feature) label Apr 15, 2025

julianu wants to merge 4 commits into CompOmics:main from julianu:quick-mzid
