WIP: Add MSA homologue search and RAG-augmented E1 embedding/scoring pipelines #25
nrafaili wants to merge 2 commits into Synthyra:main
Hi @nrafaili, thanks for the PR! Just for clarity: to stay within the goals of this project, the entire model should be a single object that's loaded via AutoModel in Hugging Face. These options to import from E1 FastPLMs aren't going to be merged and are not supported, because the expectation is that users do not have to clone this repository to use the classes. We are open to solutions for RAG and homolog search; that sounds great, but they have to exist within the main class inherited in the E1 model. These have to be easy-to-use functions within the base class, like PPLL and embed, etc. Also, because the embed mixin is already inherited, the .embed function needs a different name if it's going to function differently from the base workflow of pooling the last hidden state.
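To make the requested layout concrete, here is a minimal sketch, not the repository's actual code. Every name in it (`ToyEmbeddingMixin`, `ToyE1Model`, `search_homologues`, `embed_with_homologues`) is hypothetical, and the "hidden state" is a toy stand-in for a real forward pass. It only illustrates the two constraints above: retrieval and RAG live as methods on the single model class, and the context-aware embedding gets its own name so the inherited `embed` keeps its base behaviour.

```python
from typing import List, Optional, Sequence


def _mean_pool(hidden: List[List[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-size vector."""
    dim = len(hidden[0])
    return [sum(row[i] for row in hidden) / len(hidden) for i in range(dim)]


class ToyEmbeddingMixin:
    """Hypothetical stand-in for the shared embedding mixin."""

    def embed(self, sequence: str) -> List[float]:
        # Base workflow: last hidden state, then pooling. No retrieval.
        return _mean_pool(self._last_hidden_state(sequence))


class ToyE1Model(ToyEmbeddingMixin):
    """Hypothetical single model object exposing every feature as a method."""

    def _last_hidden_state(
        self, sequence: str, context: Sequence[str] = ()
    ) -> List[List[float]]:
        # Toy featurizer (assumption): residue index plus context size.
        return [[float(ord(aa) - ord("A")), float(len(context))] for aa in sequence]

    def search_homologues(self, sequence: str) -> List[str]:
        # Placeholder: a real version would query MMseqs2 or ColabFold here.
        return []

    def embed_with_homologues(
        self, sequence: str, homologues: Optional[Sequence[str]] = None
    ) -> List[float]:
        # Distinct name, so the inherited `embed` is left untouched.
        if homologues is None:
            homologues = self.search_homologues(sequence)
        return _mean_pool(self._last_hidden_state(sequence, homologues))


model = ToyE1Model()
base = model.embed("AC")
augmented = model.embed_with_homologues("AC", ["AD", "AE"])
```

With this shape, a user who loads the one model object gets both the plain and the retrieval-augmented path without importing helper modules from a cloned repository.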
Summary
homologue_search.py: Homologue retrieval modules using MMseqs2 via Docker or the ColabFold API
rag_e1.py: Retrieval-augmented prediction with E1 models using MSA-derived context
e1_utils.py: Unmodified contents of io.py, msa_sampling.py, and predictor.py from the E1 repository
Usage
Homologue search
RAG-augmented scoring and embedding
Test plan