Materialize ancestor information#2408
Conversation
Reviewer's GuideMaterializes cross-SBOM ancestor and describing-CPE data into dedicated tables, refactors CPE resolution to use those tables instead of runtime graph walks, and introduces read-only transactional endpoints and tests to ensure performant, connection-efficient analysis queries. Sequence diagram for resolve_sbom_cpes using materialized ancestorssequenceDiagram
participant Endpoint as AnalysisEndpoint
participant Service as RankService
participant DB as Database
Endpoint->>Service: resolve_sbom_cpes(cpe_search, rows, connection)
alt cpe_search == true
Service->>DB: batch_resolve_direct_cpe_matches(rows)
DB-->>Service: Vec<RankedSbom> (direct CPEs)
end
Service->>DB: batch_resolve_ancestor_cpes(sbom_ids)
Note over Service,DB: Phase 2a: direct ancestors via sbom_ancestor + sbom_describing_cpe
DB-->>Service: phase2a: Vec<AncestorCpeRow>
Note over Service,DB: Phase 2b: recursive ancestors via sbom_ancestor + sbom_describing_cpe
DB-->>Service: phase2b: Vec<AncestorCpeRow>
Service->>Service: build rows_by_sbom
Service->>Service: map phase2a + phase2b to RankedSbom (with dedup)
Service-->>Endpoint: Vec<RankedSbom>
Entity relationship diagram for materialized sbom_ancestor and describing CPEserDiagram
SBOM {
uuid sbom_id PK
}
SBOM_ANCESTOR {
uuid sbom_id PK
uuid ancestor_sbom_id PK
}
SBOM_DESCRIBING_CPE {
uuid sbom_id FK
uuid cpe_id
}
SBOM_EXTERNAL_NODE {
uuid sbom_id FK
text external_node_ref
}
SBOM_NODE_CHECKSUM {
uuid sbom_id FK
text node_id
text value
}
SBOM ||--o{ SBOM_ANCESTOR : has_ancestors
SBOM ||--o{ SBOM_DESCRIBING_CPE : has_cpes
SBOM ||--o{ SBOM_EXTERNAL_NODE : has_external_nodes
SBOM ||--o{ SBOM_NODE_CHECKSUM : has_checksums
SBOM_ANCESTOR }o--|| SBOM : ancestor_sbom
SBOM_EXTERNAL_NODE ||--o{ SBOM_NODE_CHECKSUM : matches_by_node
SBOM_NODE_CHECKSUM ||--o{ SBOM_NODE_CHECKSUM : shares_value
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
93b20d5 to
d4c51fa
Compare
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
d4c51fa to
5745bdc
Compare
Traversing the nodes and chasing external documents might require loading those documents. So far, this used on DB connection each, as we've been passing in the connection pool. This may lead to pool exhaustion and the API call failing. We now pass in a read-only transaction, which will lead to the DB connection acting as a mutex, only consuming one connection. It also adds a test ensuring that's the case. Co-authored-by: Claude <noreply@anthropic.com>
5745bdc to
eb9d047
Compare
Co-authored-by: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
resolve_sbom_cpesthetop_ancestor_sbomfor ancestor-derived CPEs is always set toUuid::nil(), which drops information that previously came from the resolved ancestor SBOM; consider threading the actual ancestor SBOM id throughbatch_resolve_ancestor_cpesand intoRankedSbomso callers can still distinguish which SBOM provided the CPE. - The recursive ancestor CPE CTE in
batch_resolve_ancestor_cpesuses a hardcodeddepth < 10limit; it would be safer to either document this bound clearly and centralize it as a constant/config or make it configurable so future changes to graph depth expectations don’t require hunting for magic numbers in SQL strings.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `resolve_sbom_cpes` the `top_ancestor_sbom` for ancestor-derived CPEs is always set to `Uuid::nil()`, which drops information that previously came from the resolved ancestor SBOM; consider threading the actual ancestor SBOM id through `batch_resolve_ancestor_cpes` and into `RankedSbom` so callers can still distinguish which SBOM provided the CPE.
- The recursive ancestor CPE CTE in `batch_resolve_ancestor_cpes` uses a hardcoded `depth < 10` limit; it would be safer to either document this bound clearly and centralize it as a constant/config or make it configurable so future changes to graph depth expectations don’t require hunting for magic numbers in SQL strings.
## Individual Comments
### Comment 1
<location path="modules/fundamental/src/sbom/endpoints/mod.rs" line_range="95" />
<code_context>
web::Query(paginated): web::Query<Paginated>,
_: Require<ReadSbom>,
-) -> actix_web::Result<impl Responder> {
+) -> actix_web::Result<impl Responder, Error> {
let query = OwnedComponentReference::try_from(key.as_str())?;
+ let tx = db.begin_read().await?;
</code_context>
<issue_to_address>
**issue (bug_risk):** Endpoint return types use `actix_web::Result<_, Error>`, which does not match Actix's `Result` alias and likely won't compile.
`actix_web::Result` is defined as `type Result<T> = std::result::Result<T, actix_web::Error>`, so it only accepts one type parameter. Writing `actix_web::Result<impl Responder, Error>` adds a second parameter and will not compile.
This pattern shows up across many endpoints (e.g. `get_unique_licenses`, `get_license_export`, `all`, `all_related`, `count_related`, `get`, `get_sbom_advisories`, `delete`, `packages`, `related`, `upload`, `download`, and similar advisory/product/purl/vulnerability/organization/analysis handlers).
To keep using your explicit `Error` type, either:
- change signatures to `-> Result<impl Responder, Error>` (without the `actix_web::Result` alias), or
- revert to `-> actix_web::Result<impl Responder>` and rely on `Error: ResponseError` to convert into `actix_web::Error`.
As written, the mixed form will fail to type-check.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| id: web::Path<String>, | ||
| _: Require<ReadSbom>, | ||
| ) -> actix_web::Result<impl Responder> { | ||
| ) -> actix_web::Result<impl Responder, Error> { |
There was a problem hiding this comment.
issue (bug_risk): Endpoint return types use actix_web::Result<_, Error>, which does not match Actix's Result alias and likely won't compile.
actix_web::Result is defined as type Result<T> = std::result::Result<T, actix_web::Error>, so it only accepts one type parameter. Writing actix_web::Result<impl Responder, Error> adds a second parameter and will not compile.
This pattern shows up across many endpoints (e.g. get_unique_licenses, get_license_export, all, all_related, count_related, get, get_sbom_advisories, delete, packages, related, upload, download, and similar advisory/product/purl/vulnerability/organization/analysis handlers).
To keep using your explicit Error type, either:
- change signatures to
-> Result<impl Responder, Error>(without theactix_web::Resultalias), or - revert to
-> actix_web::Result<impl Responder>and rely onError: ResponseErrorto convert intoactix_web::Error.
As written, the mixed form will fail to type-check.
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
| use trustify_test_context::TrustifyContext; | ||
|
|
||
| /// Compute the maximum depth of the descendant tree. | ||
| fn max_descendant_depth(nodes: &[model::Node]) -> u64 { |
rh-jfuller
left a comment
There was a problem hiding this comment.
LGTM ... there are some minor corner cases (that became apparent during review) related to dedup and cycle detection but they have always been there and I do not think impacts.
|
/scale-test |
|
🛠️ Scale test has started! Follow the progress here: Workflow Run |
|
note: scale-tests use datasets that are prob tied to main (and will not work against release branch) |
The node_map in batch_resolve_direct_cpe_matches was keyed by node_id alone, causing silent HashMap collisions when multiple SBOMs shared the same node_id string (e.g. common bom-refs like "shared-ref"). Collect all matching sbom_node entries per node_id and prefer a match from a non-source SBOM to avoid picking up the source's own external-node entry. Co-authored-by: Claude <noreply@anthropic.com>
The field is only used in tests for debugging and assertions, not by production code. Gate it behind #[cfg(test)] to avoid carrying unused data in release builds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
actix_web::Result is just an alias for std::result::Result with a default error type. Using it with an explicit error argument is redundant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fa0f7f5 to
97c4294
Compare
|
The idea is to merge this onto main. Including the changes from #2407 ... I'll leave this PR open, and create a new one, targeting main with the other changes as well. |
Note: this PR builds on top of #2407, only the last few commits are relevant.It creates a table materializing the ancestor information of the RH model. Thus saving us from re-exploring that over and over again.
This brings down the "pypi request" query from days to ~13 mins:
However, this is a draft and it may break stuff. So let's be optimistic, but cautious.
Summary by Sourcery
Materialize SBOM ancestor and describing-CPE relationships to avoid expensive runtime graph walks and improve analysis query performance.
New Features:
Enhancements:
Tests:
Summary by Sourcery
Materialize cross-SBOM ancestor relationships and tighten read-only transaction handling to speed up analysis queries and avoid connection pool exhaustion.
New Features:
Enhancements:
Tests: