Genentech · ivanmilevtues · Jul 4, 2025
diff --git a/.codeboarding/Core_Model.md b/.codeboarding/Core_Model.md
@@ -0,0 +1,87 @@
+```mermaid
+
+graph LR
+
+    Core_Model["Core Model"]
+
+    General_Utilities["General Utilities"]
+
+    Inference_Engine["Inference Engine"]
+
+    Core_Model -- "incorporates functions from" --> General_Utilities
+
+    Core_Model -- "feeds data to" --> Inference_Engine
+
+    click Core_Model href "https://github.com/Genentech/equifold/blob/main/.codeboarding/Core_Model.md" "Details"
+
+```
+
+
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
+
+
+
+## Details
+
+
+
+The Core Model component encapsulates the neural network architecture that predicts protein structures. It follows the "Machine Learning Model Development and Inference" pattern: the model definition is kept separate from other concerns, such as general utilities and inference.
+
+
+
+### Core Model [[Expand]](./Core_Model.md)
+
+This component defines the neural network architecture: its layers, modules, and forward-pass logic. It learns to predict protein structures from input features. It is built on PyTorch and e3nn; e3nn supplies the equivariant layers, whose outputs transform consistently when the input 3D structure is rotated or translated.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `MLP` (43:43)
+
+- `BesselBasis` (70:70)
+
+- `RadialNN` (93:93)
+
+- `LayerNorm` (139:139)
+
+- `Emb` (172:172)
+
+
+
+
+
+### General Utilities
+
+Provides utility functions used in the model's forward pass, such as computing structural metrics and loss functions.
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+### Inference Engine
+
+Loads the trained Core Model and feeds it input data to predict protein structures.
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
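For the `BesselBasis` entry listed under Core Model above: the file names the class but does not say what it computes. The following is a minimal sketch, assuming the sine-based radial Bessel basis commonly used in equivariant networks, `sqrt(2/c) * sin(n*pi*r/c) / r`. Whether equifold's `BesselBasis` uses exactly this form is an assumption, and the function name, `num_basis`, and `cutoff` are hypothetical. Pure Python is used instead of PyTorch to keep the example dependency-free.

```python
import math


def bessel_basis(r: float, num_basis: int = 8, cutoff: float = 10.0) -> list[float]:
    """Expand a distance r into num_basis radial features.

    Hypothetical helper, not the equifold API. Assumes the form
    sqrt(2 / cutoff) * sin(n * pi * r / cutoff) / r for n = 1..num_basis.
    """
    if not 0.0 < r <= cutoff:
        raise ValueError("r must lie in the interval (0, cutoff]")
    prefactor = math.sqrt(2.0 / cutoff)
    return [
        prefactor * math.sin(n * math.pi * r / cutoff) / r
        for n in range(1, num_basis + 1)
    ]


if __name__ == "__main__":
    # Features for a 5 Angstrom distance with a 10 Angstrom cutoff.
    print(bessel_basis(5.0, num_basis=4, cutoff=10.0))
```

Every basis function in this form vanishes at `r == cutoff`, so features decay smoothly to zero at the edge of the interaction radius.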
diff --git a/.codeboarding/Data_Ingestion.md b/.codeboarding/Data_Ingestion.md
@@ -0,0 +1,109 @@
+```mermaid
+
+graph LR
+
+    Data_Ingestion["Data Ingestion"]
+
+    openfold_light_mmcif_parsing["openfold_light.mmcif_parsing"]
+
+    openfold_light_parsers["openfold_light.parsers"]
+
+    DataHandler["DataHandler"]
+
+    CoreModel["CoreModel"]
+
+    Data_Ingestion -- "comprises" --> openfold_light_mmcif_parsing
+
+    Data_Ingestion -- "comprises" --> openfold_light_parsers
+
+    Data_Ingestion -- "provides processed data to" --> DataHandler
+
+    openfold_light_mmcif_parsing -- "provides parsed structural data to" --> DataHandler
+
+    openfold_light_parsers -- "supplies parsed sequence and alignment data to" --> DataHandler
+
+    DataHandler -- "feeds evolutionary features to" --> CoreModel
+
+    click Data_Ingestion href "https://github.com/Genentech/equifold/blob/main/.codeboarding/Data_Ingestion.md" "Details"
+
+```
+
+
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
+
+
+
+## Details
+
+
+
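+The graph shows how raw biological data reaches the model. Data Ingestion comprises two parsing modules: `openfold_light.mmcif_parsing` supplies parsed structural data, and `openfold_light.parsers` supplies parsed sequence and alignment data. Both hand their output to the DataHandler, which processes it further and feeds evolutionary features to the CoreModel.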
+
+
+
+### Data Ingestion [[Expand]](./Data_Ingestion.md)
+
+Performs the initial processing of raw biological data: it parses several file formats and extracts the information needed for downstream feature generation.
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+### openfold_light.mmcif_parsing
+
+Parses macromolecular Crystallographic Information File (mmCIF) data. It handles the complex structure of mmCIF files to extract atomic coordinates, chain identifiers, and other structural details of proteins, which is the basis for processing experimental protein structures.
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+### openfold_light.parsers
+
+Provides general parsers for sequence and alignment formats such as FASTA, A3M, and Stockholm. It extracts sequences, multiple sequence alignments (MSAs), and template hit data, all of which are needed to generate evolutionary features for protein folding models. It can also convert Stockholm format to A3M.
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+### DataHandler
+
+Processes the parsed data further after initial ingestion, for example by generating features or creating protein objects.
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+### CoreModel
+
+Receives the evolutionary features produced by the DataHandler and uses them as input to the protein folding model.
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
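For the FASTA parsing that `Data_Ingestion.md` attributes to `openfold_light.parsers`: a minimal sketch of what such a parser does, written from the FASTA format itself rather than from the module's source. The name `parse_fasta_sketch` and its return shape are hypothetical and are not claimed to match the real API.

```python
def parse_fasta_sketch(fasta_text: str) -> tuple[list[str], list[str]]:
    """Split FASTA text into parallel lists of sequences and descriptions.

    Hypothetical helper for illustration, not the openfold_light API.
    A line starting with '>' opens a new record; every following
    non-empty line is appended to that record's sequence.
    """
    sequences: list[str] = []
    descriptions: list[str] = []
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            descriptions.append(line[1:])
            sequences.append("")
        elif sequences:
            sequences[-1] += line  # sequence may wrap over several lines
    return sequences, descriptions


if __name__ == "__main__":
    example = ">seq1 demo\nMKT\nAYI\n\n>seq2\nGG\n"
    print(parse_fasta_sketch(example))
```

Sequence lines that appear before the first `>` header are ignored in this sketch; a stricter parser might raise an error instead.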
diff --git a/.codeboarding/Feature_Engineering.md b/.codeboarding/Feature_Engineering.md
@@ -0,0 +1,97 @@
+```mermaid
+
+graph LR
+
+    Feature_Engineering["Feature Engineering"]
+
+    Data_Ingestion_and_Parsing["Data Ingestion and Parsing"]
+
+    InferenceEngine["InferenceEngine"]
+
+    Data_Ingestion_and_Parsing -- "provides parsed data to" --> Feature_Engineering
+
+    Feature_Engineering -- "outputs processed features to" --> InferenceEngine
+
+    click Feature_Engineering href "https://github.com/Genentech/equifold/blob/main/.codeboarding/Feature_Engineering.md" "Details"
+
+```
+
+
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
+
+
+
+## Details
+
+
+
+This analysis covers the Feature Engineering component, implemented primarily by `openfold_light.data_pipeline`, which transforms raw data into standardized numerical features. The component depends on Data Ingestion and Parsing for pre-processed data and provides input to the InferenceEngine.
+
+
+
+### Feature Engineering [[Expand]](./Feature_Engineering.md)
+
+This component transforms the raw data ingested by the `Data Ingestion and Parsing` module into a standardized set of numerical features suitable for the machine learning model. It generates sequence-based features, template features, protein features from structural inputs, and Multiple Sequence Alignment (MSA) features, preparing these as input tensors for the model.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L38-L62" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_template_features` (38:62)</a>
+
+- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L65-L84" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_sequence_features` (65:84)</a>
+
+- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L87-L120" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_mmcif_features` (87:120)</a>
+
+- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L130-L157" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_protein_features` (130:157)</a>
+
+- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L160-L177" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_pdb_features` (160:177)</a>
+
+- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/data_pipeline.py#L180-L213" target="_blank" rel="noopener noreferrer">`openfold_light.data_pipeline:make_msa_features` (180:213)</a>
+
+
+
+
+
+### Data Ingestion and Parsing
+
+Handles the initial reading and parsing of raw data formats (e.g., FASTA, A3M, mmCIF files) and provides pre-processed data structures to other components.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/parsers.py" target="_blank" rel="noopener noreferrer">`openfold_light.parsers`</a>
+
+- <a href="https://github.com/genentech/equifold/blob/main/openfold_light/mmcif_parsing.py" target="_blank" rel="noopener noreferrer">`openfold_light.mmcif_parsing`</a>
+
+
+
+
+
+### InferenceEngine
+
+Consumes the processed feature dictionaries from the Feature Engineering component to perform model predictions or further processing.
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
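For the sequence-based features that `Feature_Engineering.md` attributes to `make_sequence_features`: a minimal sketch of turning an amino-acid sequence into a numerical feature dictionary. The residue ordering, the extra "unknown" class, the dictionary keys, and the function names are all assumptions made for illustration; the real function in `openfold_light.data_pipeline` may differ in each of them.

```python
# 20 standard amino acids; this ordering is an illustrative assumption.
RESIDUES = "ARNDCQEGHILKMFPSTWYV"
RESIDUE_INDEX = {residue: i for i, residue in enumerate(RESIDUES)}
UNKNOWN_INDEX = len(RESIDUES)  # one extra class for any other symbol


def one_hot_sequence(sequence: str) -> list[list[int]]:
    """Encode each residue as a 21-dimensional one-hot row."""
    rows = []
    for residue in sequence.upper():
        row = [0] * (len(RESIDUES) + 1)
        row[RESIDUE_INDEX.get(residue, UNKNOWN_INDEX)] = 1
        rows.append(row)
    return rows


def make_sequence_features_sketch(sequence: str, description: str) -> dict:
    """Build a small feature dictionary (hypothetical keys, not the real API)."""
    return {
        "aatype": one_hot_sequence(sequence),
        "seq_length": len(sequence),
        "description": description,
    }


if __name__ == "__main__":
    features = make_sequence_features_sketch("ACX", "demo")
    print(features["seq_length"], features["aatype"][0])
```

In a real pipeline these lists would typically be converted to arrays or tensors before being passed to the model.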