87 changes: 87 additions & 0 deletions .codeboarding/Core_Model.md
@@ -0,0 +1,87 @@
```mermaid

graph LR

Core_Model["Core Model"]

General_Utilities["General Utilities"]

Inference_Engine["Inference Engine"]

Core_Model -- "incorporates functions from" --> General_Utilities

Core_Model -- "feeds data to" --> Inference_Engine

click Core_Model href "https://github.com/Genentech/equifold/blob/main/.codeboarding//Core_Model.md" "Details"

```



[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)



## Details



The Core Model component encapsulates the neural network architecture that predicts protein structures. It keeps the model definition separate from the utility functions it calls and from the inference engine that consumes it.



### Core Model [[Expand]](./Core_Model.md)

This component defines the neural network architecture, including its layers, modules, and forward pass logic. It is responsible for learning and predicting protein structures from input features. It uses PyTorch and e3nn to build equivariant neural networks, whose outputs transform consistently when the 3D input is rotated or translated.





**Related Classes/Methods**:



- `MLP` (43:43)

- `BesselBasis` (70:70)

- `RadialNN` (93:93)

- `LayerNorm` (139:139)

- `Emb` (172:172)
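
As an illustration of what a radial basis such as `BesselBasis` might compute, the sketch below evaluates a sine-based Bessel basis often used in equivariant networks, e_n(r) = sqrt(2/c) * sin(n*pi*r/c) / r. This is an assumption: the source does not state which formula `BesselBasis` implements, and the function name `bessel_basis` and the parameters `num_basis` and `cutoff` are hypothetical. Plain Python is used so the example runs without PyTorch.

```python
import math


def bessel_basis(r, num_basis=8, cutoff=5.0):
    """Expand a distance r into num_basis radial features.

    Hypothetical helper, not the repository's BesselBasis class.
    Uses e_n(r) = sqrt(2 / cutoff) * sin(n * pi * r / cutoff) / r.
    """
    if r <= 0.0:
        raise ValueError("r must be positive (the formula divides by r)")
    prefactor = math.sqrt(2.0 / cutoff)
    return [
        prefactor * math.sin(n * math.pi * r / cutoff) / r
        for n in range(1, num_basis + 1)
    ]


# Every basis function vanishes (up to rounding) at the cutoff distance.
features = bessel_basis(2.5, num_basis=4, cutoff=5.0)
```

A basis of this kind turns a single interatomic distance into a small feature vector that a radial network (such as the listed `RadialNN`, presumably) could consume.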





### General Utilities

Provides essential utility functions for calculations within the model's forward pass, such as computing structural metrics and loss functions.





**Related Classes/Methods**: _None_



### Inference Engine

Responsible for loading the trained Core Model and feeding it input data to predict protein structures.





**Related Classes/Methods**: _None_







### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
109 changes: 109 additions & 0 deletions .codeboarding/Data_Ingestion.md
@@ -0,0 +1,109 @@
```mermaid

graph LR

Data_Ingestion["Data Ingestion"]

openfold_light_mmcif_parsing["openfold_light.mmcif_parsing"]

openfold_light_parsers["openfold_light.parsers"]

DataHandler["DataHandler"]

CoreModel["CoreModel"]

Data_Ingestion -- "comprises" --> openfold_light_mmcif_parsing

Data_Ingestion -- "comprises" --> openfold_light_parsers

Data_Ingestion -- "provides processed data to" --> DataHandler

openfold_light_mmcif_parsing -- "provides parsed structural data to" --> DataHandler

openfold_light_parsers -- "supplies parsed sequence and alignment data to" --> DataHandler

DataHandler -- "feeds evolutionary features to" --> CoreModel

click Data_Ingestion href "https://github.com/Genentech/equifold/blob/main/.codeboarding//Data_Ingestion.md" "Details"

```



[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)



## Details



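This graph shows how raw biological files become model inputs. Data Ingestion groups two parsing modules: `openfold_light.mmcif_parsing` supplies parsed structural data, and `openfold_light.parsers` supplies parsed sequence and alignment data. Both feed DataHandler, which passes evolutionary features on to CoreModel.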



### Data Ingestion [[Expand]](./Data_Ingestion.md)

Responsible for the initial processing of raw biological data, involving parsing various file formats to extract essential information for downstream feature generation.





**Related Classes/Methods**: _None_



### openfold_light.mmcif_parsing

This module parses macromolecular Crystallographic Information File (mmCIF) data. It handles the nested structure of mmCIF files to extract atomic coordinates, chain identifiers, and other structural details of proteins, which is the basis for processing experimentally determined protein structures.





**Related Classes/Methods**: _None_
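
To make the kind of extraction described for `openfold_light.mmcif_parsing` concrete, here is a minimal sketch that reads the `_atom_site` loop from mmCIF text. It is not the module's API: `parse_atom_site` is a hypothetical helper, and it assumes whitespace-separated values with no quoted or multi-line fields, which real mmCIF files do not guarantee.

```python
def parse_atom_site(cif_text):
    """Return the rows of the first `_atom_site` loop as a list of dicts.

    Hypothetical helper; assumes simple whitespace-separated values.
    """
    prefix = "_atom_site."
    columns, rows = [], []
    in_loop = False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line == "loop_":
            if rows:  # the atom_site table has ended
                break
            in_loop, columns = True, []
        elif not in_loop:
            continue
        elif not line or line.startswith("#") or line.startswith("_"):
            if rows:  # anything but a data row closes the table
                break
            if line.startswith("_"):
                columns.append(line)  # column header of the current loop
        elif columns and columns[0].startswith(prefix):
            values = line.split()
            if len(values) == len(columns):
                rows.append({c[len(prefix):]: v for c, v in zip(columns, values)})
    return rows


SAMPLE = """data_demo
loop_
_atom_site.group_PDB
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM N MET A 1.0 2.0 3.0
ATOM CA MET A 2.5 2.0 3.0
#
"""

atoms = parse_atom_site(SAMPLE)
ca_xyz = [float(atoms[1][k]) for k in ("Cartn_x", "Cartn_y", "Cartn_z")]
```

Each row keeps the chain identifier (`label_asym_id`) next to the Cartesian coordinates, which are the two pieces of information the description above singles out.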



### openfold_light.parsers

This module provides general parsing for sequence and alignment formats such as FASTA, A3M, and Stockholm. It extracts sequence information, multiple sequence alignments (MSAs), and template hit data, which are needed to generate evolutionary features for protein folding models. It also includes functionality to convert Stockholm format to A3M.





**Related Classes/Methods**: _None_
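
The sequence parsing described for `openfold_light.parsers` can be illustrated with a small FASTA reader, plus the A3M convention that lowercase letters mark insertions relative to the query. Both functions below are hypothetical stand-ins written for this example; their names and return shapes are assumptions, not the module's actual signatures.

```python
def parse_fasta(text):
    """Split FASTA text into parallel lists of sequences and descriptions."""
    sequences, descriptions = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            descriptions.append(line[1:])
            sequences.append("")
        elif sequences:
            # Sequence data may be wrapped across several lines.
            sequences[-1] += line
    return sequences, descriptions


def strip_a3m_insertions(row):
    """Drop lowercase insertion columns so the row aligns with the query."""
    return "".join(ch for ch in row if not ch.islower())


seqs, descs = parse_fasta(">query demo\nMKV\nLL\n\n>hit1\nMK-vlL\n")
aligned = [strip_a3m_insertions(s) for s in seqs]
```

After stripping insertions, every row of an A3M alignment has the same number of columns as the query, which is what makes it usable as an MSA matrix.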



### DataHandler

Component responsible for further processing of data, such as feature generation or protein object creation, after initial ingestion.





**Related Classes/Methods**: _None_



### CoreModel

Component that receives the evolutionary features prepared by DataHandler as input for protein structure prediction.





**Related Classes/Methods**: _None_







### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
97 changes: 97 additions & 0 deletions .codeboarding/Feature_Engineering.md
@@ -0,0 +1,97 @@
```mermaid

graph LR

Feature_Engineering["Feature Engineering"]

Data_Ingestion_and_Parsing["Data Ingestion and Parsing"]

InferenceEngine["InferenceEngine"]

Data_Ingestion_and_Parsing -- "provides parsed data to" --> Feature_Engineering

Feature_Engineering -- "outputs processed features to" --> InferenceEngine

click Feature_Engineering href "https://github.com/Genentech/equifold/blob/main/.codeboarding//Feature_Engineering.md" "Details"

```



[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)



## Details



This analysis details the 'Feature Engineering' component, primarily implemented by `openfold_light.data_pipeline`, which transforms parsed data into standardized numerical features. The component depends on 'Data Ingestion and Parsing' for pre-processed data and provides input to the 'InferenceEngine'.



### Feature Engineering [[Expand]](./Feature_Engineering.md)

This component transforms the raw data ingested by the `Data Ingestion and Parsing` module into a standardized set of numerical features suitable for the machine learning model. It generates sequence-based features, template features, protein features from structural inputs, and Multiple Sequence Alignment (MSA) features, preparing these as input tensors for the model.





**Related Classes/Methods**:



- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L38-L62" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_template_features` (38:62)</a>

- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L65-L84" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_sequence_features` (65:84)</a>

- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L87-L120" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_mmcif_features` (87:120)</a>

- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L130-L157" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_protein_features` (130:157)</a>

- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L160-L177" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_pdb_features` (160:177)</a>

- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L180-L213" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_msa_features` (180:213)</a>
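
As a rough picture of what "sequence-based features" can look like, the sketch below one-hot encodes an amino-acid sequence over the 20 standard residues plus an unknown class. It is an illustration only: `one_hot_sequence`, the residue ordering in `RESTYPES`, and the 21-column layout are assumptions made for this example and are not taken from `make_sequence_features`.

```python
# 20 standard amino acids; the ordering here is illustrative.
RESTYPES = "ARNDCQEGHILKMFPSTWYV"


def one_hot_sequence(sequence):
    """Encode a sequence as rows of length 21 (20 residues + unknown)."""
    index = {aa: i for i, aa in enumerate(RESTYPES)}
    unknown = len(RESTYPES)  # last column collects non-standard residues
    features = []
    for aa in sequence.upper():
        row = [0] * (len(RESTYPES) + 1)
        row[index.get(aa, unknown)] = 1
        features.append(row)
    return features


encoded = one_hot_sequence("MKXV")
```

The result has one row per residue, so a sequence of length L yields an L x 21 matrix that can be converted to a tensor before being passed to the model.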





### Data Ingestion and Parsing

Handles the initial reading and parsing of raw data formats (e.g., FASTA, A3M, mmCIF files) and provides pre-processed data structures to other components.





**Related Classes/Methods**:



- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/parsers.py" target="_blank" rel="noopener noreferrer">`openfold_light.parsers`</a>

- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/mmcif_parsing.py" target="_blank" rel="noopener noreferrer">`openfold_light.mmcif_parsing`</a>





### InferenceEngine

Consumes the processed feature dictionaries from the Feature Engineering component to perform model predictions or further processing.





**Related Classes/Methods**: _None_







### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)