Materialize ancestor information by ctron · Pull Request #2408 · guacsec/trustify

ctron · 2026-06-19T08:43:40Z

Note: this PR builds on top of #2407, only the last few commits are relevant.

It creates a table materializing the ancestor information of the RH model. Thus saving us from re-exploring that over and over again.

This brings down the "pypi request" query from days to ~13 mins:

➜  trustify git:(feature/atlas_perf_1_mat) time xh GET localhost:8080/api/v2/analysis/latest/component?q=purl=pkg%3Apypi%2Frequests (oidc token -Hf trusty)
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: content-type
Content-Encoding: br
Content-Type: application/json
Date: Fri, 19 Jun 2026 06:49:53 GMT
Transfer-Encoding: chunked
Vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
Vary: accept-encoding

{
    "items": [],
    "total": 0
}



________________________________________________________
Executed in  747.02 secs      fish           external
   usr time   24.16 millis    0.00 micros   24.16 millis
   sys time   20.16 millis  929.00 micros   19.23 millis

However, this is a draft and it may break stuff. So let's be optimistic, but cautious.

Summary by Sourcery

Materialize SBOM ancestor and describing-CPE relationships to avoid expensive runtime graph walks and improve analysis query performance.

New Features:

Introduce materialized sbom_describing_cpe and sbom_ancestor tables plus corresponding ORM entities to store SBOM describing CPEs and cross-SBOM ancestor links.
Populate describing CPEs and ancestor links during SBOM ingestion for CycloneDX and SPDX documents.
Add a Containerfile and .containerignore for building and packaging the trustd binary into a container image.

Enhancements:

Refactor SBOM CPE and ancestor resolution to use batched queries and the new materialized tables instead of recursive in-memory graph traversal.
Optimize product advisory and context CPE queries to read from sbom_describing_cpe, simplifying SQL and reducing join cost.
Improve logging with truncated iterable formatting and richer instrumentation for graph loading and ranking.
Add a helper to migrate the database up to a specific migration and extend the trustd db CLI to support targeted migrations.
Update several dependencies (moka, petgraph, sea-orm, tokio, uuid) and align uuid usage in the storage module.
Adjust tests and entities to reflect the new ancestor and describing-CPE materialization behavior.

Tests:

Add tests verifying sbom_describing_cpe semantics and that ingesting linked SBOMs correctly populates the sbom_ancestor table.

Summary by Sourcery

Materialize cross-SBOM ancestor relationships and tighten read-only transaction handling to speed up analysis queries and avoid connection pool exhaustion.

New Features:

Introduce a materialized sbom_ancestor table and ORM entity to store cross-SBOM ancestor links between SBOMs.
Populate sbom_ancestor entries during CycloneDX and SPDX SBOM ingestion based on checksum-linked external nodes.

Enhancements:

Refactor SBOM CPE resolution to read ancestor CPEs via sbom_ancestor and sbom_describing_cpe instead of on-the-fly graph walking and checksum-based DFS.
Centralize the begin_read helper in the common db module and adopt consistent read-only transactions across analysis, SBOM, advisory, product, purl, vulnerability, organization, and license endpoints.
Improve graph loading logging with truncated iterator formatting and minor utility cleanups (e.g., timestamp helper, error wiring).

Tests:

Add tests ensuring sbom_ancestor entries are populated for linked SBOMs and that deep descendant queries only use a single lazy pool connection.
Add a LazyPool test context and new SPDX fixtures to simulate many descendant SBOMs for pool exhaustion testing.

sourcery-ai · 2026-06-19T08:44:15Z

Reviewer's Guide

Materializes cross-SBOM ancestor and describing-CPE data into dedicated tables, refactors CPE resolution to use those tables instead of runtime graph walks, and introduces read-only transactional endpoints and tests to ensure performant, connection-efficient analysis queries.

Sequence diagram for resolve_sbom_cpes using materialized ancestors

sequenceDiagram
    participant Endpoint as AnalysisEndpoint
    participant Service as RankService
    participant DB as Database

    Endpoint->>Service: resolve_sbom_cpes(cpe_search, rows, connection)
    alt cpe_search == true
        Service->>DB: batch_resolve_direct_cpe_matches(rows)
        DB-->>Service: Vec<RankedSbom> (direct CPEs)
    end

    Service->>DB: batch_resolve_ancestor_cpes(sbom_ids)
    Note over Service,DB: Phase 2a: direct ancestors via sbom_ancestor + sbom_describing_cpe
    DB-->>Service: phase2a: Vec<AncestorCpeRow>

    Note over Service,DB: Phase 2b: recursive ancestors via sbom_ancestor + sbom_describing_cpe
    DB-->>Service: phase2b: Vec<AncestorCpeRow>

    Service->>Service: build rows_by_sbom
    Service->>Service: map phase2a + phase2b to RankedSbom (with dedup)

    Service-->>Endpoint: Vec<RankedSbom>

Entity relationship diagram for materialized sbom_ancestor and describing CPEs

erDiagram
    SBOM {
        uuid sbom_id PK
    }

    SBOM_ANCESTOR {
        uuid sbom_id PK
        uuid ancestor_sbom_id PK
    }

    SBOM_DESCRIBING_CPE {
        uuid sbom_id FK
        uuid cpe_id
    }

    SBOM_EXTERNAL_NODE {
        uuid sbom_id FK
        text external_node_ref
    }

    SBOM_NODE_CHECKSUM {
        uuid sbom_id FK
        text node_id
        text value
    }

    SBOM ||--o{ SBOM_ANCESTOR : has_ancestors
    SBOM ||--o{ SBOM_DESCRIBING_CPE : has_cpes
    SBOM ||--o{ SBOM_EXTERNAL_NODE : has_external_nodes
    SBOM ||--o{ SBOM_NODE_CHECKSUM : has_checksums

    SBOM_ANCESTOR }o--|| SBOM : ancestor_sbom
    SBOM_EXTERNAL_NODE ||--o{ SBOM_NODE_CHECKSUM : matches_by_node
    SBOM_NODE_CHECKSUM ||--o{ SBOM_NODE_CHECKSUM : shares_value

File-Level Changes

Change	Details	Files
Replace recursive ancestor/external SBOM resolution in analysis ranking with queries over a new materialized sbom_ancestor table plus existing sbom_describing_cpe data.	Remove recursive CTE-based ancestor walking, checksum-based external SBOM DFS, and their caching context from the ranking loader. Introduce AncestorCpeRow and batch_resolve_ancestor_cpes to fetch direct and recursive ancestor CPEs via sbom_ancestor and sbom_describing_cpe. Adjust resolve_sbom_cpes to run only batched direct CPE matching plus phase-2 ancestor CPE resolution, deduping recursive results against direct ones. Simplify graph loading logs to use TruncatedIter and clean up unused debug helpers and cache structs.	`modules/analysis/src/service/load/rank.rs` `modules/analysis/src/service/load/mod.rs`
Materialize cross-SBOM ancestor relationships during SBOM ingestion and expose them via new ORM entities and migrations.	Add sbom_ancestor entity model and relation from sbom to sbom_ancestor in the entity crate. Create migration m0002130_sbom_ancestor to define sbom_ancestor table, indexes, and backfill data using sbom_external_node and sbom_node_checksum. Implement SbomContext::populate_ancestors that inserts (child, ancestor) rows for the current SBOM using checksum matches around sbom_external_node references. Call populate_ancestors from both CycloneDX and SPDX SBOM ingestion paths. Add tests ensuring ingesting linked SBOMs populates sbom_ancestor with a single unidirectional child→ancestor link.	`entity/src/lib.rs` `entity/src/sbom.rs` `entity/src/sbom_ancestor.rs` `migration/src/lib.rs` `migration/src/m0002130_sbom_ancestor.rs` `modules/ingestor/src/graph/sbom/mod.rs` `modules/ingestor/src/graph/sbom/cyclonedx.rs` `modules/ingestor/src/graph/sbom/spdx.rs` `modules/analysis/src/service/load/rank.rs`
Introduce a shared DatabaseExt::begin_read helper and adopt read-only repeatable-read transactions across many HTTP endpoints.	Move DatabaseExt::begin_read trait into common db module returning DbErr and re-export from fundamental db module. Update numerous analysis, SBOM, advisory, product, purl, vulnerability, organization, license, and label endpoints to use begin_read() and return the module Error type from handlers. Wire ExternalReferenceQueryParseError and authorization errors into the fundamental Error enum and ResponseError implementation so handler signatures can use Error. Add a test using a LazyPool context to assert deep descendant queries use a single pooled connection when using read-only transactions.	`common/src/db/mod.rs` `modules/fundamental/src/db.rs` `modules/analysis/src/endpoints/mod.rs` `modules/analysis/src/endpoints/tests/mod.rs` `modules/fundamental/src/sbom/endpoints/mod.rs` `modules/fundamental/src/sbom/endpoints/query.rs` `modules/fundamental/src/error.rs` `modules/fundamental/src/advisory/endpoints/mod.rs` `modules/fundamental/src/advisory/endpoints/label.rs` `modules/fundamental/src/product/endpoints/mod.rs` `modules/fundamental/src/purl/endpoints/mod.rs` `modules/fundamental/src/purl/endpoints/base.rs` `modules/fundamental/src/sbom/endpoints/label.rs` `modules/fundamental/src/vulnerability/endpoints/mod.rs` `modules/fundamental/src/organization/endpoints/mod.rs` `modules/fundamental/src/license/endpoints/mod.rs`
Extend test infrastructure to support lazy connection pools and add data to simulate pool exhaustion scenarios.	Add LazyPool test wrapper that creates a separate Database with min_conn=0 and small max_conn and delegates teardown. Expose TrustifyContext::lazy_pool constructor to build a new Database pointing to the same Postgres instance/database with adjusted pool settings. Enable reqwest's stream feature in test-context to support new tests. Add SPDX test data under etc/test-data/spdx/pool-exhaustion/* used by deep_descendants_lazy_pool test.	`test-context/Cargo.toml` `test-context/src/ctx/mod.rs` `test-context/src/ctx/lazy_pool.rs` `test-context/src/lib.rs` `etc/test-data/spdx/pool-exhaustion/main.json` `etc/test-data/spdx/pool-exhaustion/leaf-1.json` `etc/test-data/spdx/pool-exhaustion/leaf-2.json` `etc/test-data/spdx/pool-exhaustion/leaf-3.json` `etc/test-data/spdx/pool-exhaustion/leaf-4.json` `etc/test-data/spdx/pool-exhaustion/leaf-5.json` `etc/test-data/spdx/pool-exhaustion/leaf-6.json` `etc/test-data/spdx/pool-exhaustion/leaf-7.json` `etc/test-data/spdx/pool-exhaustion/leaf-8.json` `etc/test-data/spdx/pool-exhaustion/leaf-9.json` `etc/test-data/spdx/pool-exhaustion/leaf-10.json` `etc/test-data/spdx/pool-exhaustion/leaf-11.json` `etc/test-data/spdx/pool-exhaustion/leaf-12.json` `etc/test-data/spdx/pool-exhaustion/leaf-13.json` `etc/test-data/spdx/pool-exhaustion/leaf-14.json` `etc/test-data/spdx/pool-exhaustion/leaf-15.json` `etc/test-data/spdx/pool-exhaustion/leaf-16.json` `etc/test-data/spdx/pool-exhaustion/leaf-17.json` `etc/test-data/spdx/pool-exhaustion/leaf-18.json` `etc/test-data/spdx/pool-exhaustion/leaf-19.json` `etc/test-data/spdx/pool-exhaustion/leaf-20.json`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

Co-authored-by: Claude <noreply@anthropic.com>

Traversing the nodes and chasing external documents might require loading those documents. So far, this used on DB connection each, as we've been passing in the connection pool. This may lead to pool exhaustion and the API call failing. We now pass in a read-only transaction, which will lead to the DB connection acting as a mutex, only consuming one connection. It also adds a test ensuring that's the case. Co-authored-by: Claude <noreply@anthropic.com>

Co-authored-by: Claude <noreply@anthropic.com>

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

In resolve_sbom_cpes the top_ancestor_sbom for ancestor-derived CPEs is always set to Uuid::nil(), which drops information that previously came from the resolved ancestor SBOM; consider threading the actual ancestor SBOM id through batch_resolve_ancestor_cpes and into RankedSbom so callers can still distinguish which SBOM provided the CPE.
The recursive ancestor CPE CTE in batch_resolve_ancestor_cpes uses a hardcoded depth < 10 limit; it would be safer to either document this bound clearly and centralize it as a constant/config or make it configurable so future changes to graph depth expectations don’t require hunting for magic numbers in SQL strings.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `resolve_sbom_cpes` the `top_ancestor_sbom` for ancestor-derived CPEs is always set to `Uuid::nil()`, which drops information that previously came from the resolved ancestor SBOM; consider threading the actual ancestor SBOM id through `batch_resolve_ancestor_cpes` and into `RankedSbom` so callers can still distinguish which SBOM provided the CPE.
- The recursive ancestor CPE CTE in `batch_resolve_ancestor_cpes` uses a hardcoded `depth < 10` limit; it would be safer to either document this bound clearly and centralize it as a constant/config or make it configurable so future changes to graph depth expectations don’t require hunting for magic numbers in SQL strings.

## Individual Comments

### Comment 1
<location path="modules/fundamental/src/sbom/endpoints/mod.rs" line_range="95" />
<code_context>
     web::Query(paginated): web::Query<Paginated>,
     _: Require<ReadSbom>,
-) -> actix_web::Result<impl Responder> {
+) -> actix_web::Result<impl Responder, Error> {
     let query = OwnedComponentReference::try_from(key.as_str())?;
+    let tx = db.begin_read().await?;
</code_context>
<issue_to_address>
**issue (bug_risk):** Endpoint return types use `actix_web::Result<_, Error>`, which does not match Actix's `Result` alias and likely won't compile.

`actix_web::Result` is defined as `type Result<T> = std::result::Result<T, actix_web::Error>`, so it only accepts one type parameter. Writing `actix_web::Result<impl Responder, Error>` adds a second parameter and will not compile.

This pattern shows up across many endpoints (e.g. `get_unique_licenses`, `get_license_export`, `all`, `all_related`, `count_related`, `get`, `get_sbom_advisories`, `delete`, `packages`, `related`, `upload`, `download`, and similar advisory/product/purl/vulnerability/organization/analysis handlers).

To keep using your explicit `Error` type, either:
- change signatures to `-> Result<impl Responder, Error>` (without the `actix_web::Result` alias), or
- revert to `-> actix_web::Result<impl Responder>` and rely on `Error: ResponseError` to convert into `actix_web::Error`.

As written, the mixed form will fail to type-check.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-06-23T13:28:10Z

    id: web::Path<String>,
    _: Require<ReadSbom>,
-) -> actix_web::Result<impl Responder> {
+) -> actix_web::Result<impl Responder, Error> {


issue (bug_risk): Endpoint return types use actix_web::Result<_, Error>, which does not match Actix's Result alias and likely won't compile.

actix_web::Result is defined as type Result<T> = std::result::Result<T, actix_web::Error>, so it only accepts one type parameter. Writing actix_web::Result<impl Responder, Error> adds a second parameter and will not compile.

This pattern shows up across many endpoints (e.g. get_unique_licenses, get_license_export, all, all_related, count_related, get, get_sbom_advisories, delete, packages, related, upload, download, and similar advisory/product/purl/vulnerability/organization/analysis handlers).

To keep using your explicit Error type, either:

change signatures to -> Result<impl Responder, Error> (without the actix_web::Result alias), or

revert to -> actix_web::Result<impl Responder> and rely on Error: ResponseError to convert into actix_web::Error.

As written, the mixed form will fail to type-check.

Co-authored-by: Claude <noreply@anthropic.com>

rh-jfuller · 2026-06-23T14:07:11Z

+use trustify_test_context::TrustifyContext;
+
+/// Compute the maximum depth of the descendant tree.
+fn max_descendant_depth(nodes: &[model::Node]) -> u64 {


rh-jfuller

LGTM ... there are some minor corner cases (that became apparent during review) related to dedup and cycle detection but they have always been there and I do not think impacts.

rh-jfuller · 2026-06-24T04:23:26Z

/scale-test

github-actions · 2026-06-24T04:23:36Z

🛠️ Scale test has started! Follow the progress here: Workflow Run

rh-jfuller · 2026-06-24T05:48:11Z

note: scale-tests use datasets that are prob tied to main (and will not work against release branch)

The node_map in batch_resolve_direct_cpe_matches was keyed by node_id alone, causing silent HashMap collisions when multiple SBOMs shared the same node_id string (e.g. common bom-refs like "shared-ref"). Collect all matching sbom_node entries per node_id and prefer a match from a non-source SBOM to avoid picking up the source's own external-node entry. Co-authored-by: Claude <noreply@anthropic.com>

The field is only used in tests for debugging and assertions, not by production code. Gate it behind #[cfg(test)] to avoid carrying unused data in release builds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

actix_web::Result is just an alias for std::result::Result with a default error type. Using it with an explicit error argument is redundant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ctron · 2026-06-24T08:11:31Z

The idea is to merge this onto main. Including the changes from #2407 ... I'll leave this PR open, and create a new one, targeting main with the other changes as well.

github-project-automation Bot added this to Trustify Jun 19, 2026

ctron force-pushed the feature/atlas_perf_1_mat branch 5 times, most recently from 93b20d5 to d4c51fa Compare June 23, 2026 13:01

ctron and others added 7 commits June 23, 2026 15:02

feat: materialize ancestor relationships

c9ac984

Co-authored-by: Claude <noreply@anthropic.com>

chore: fix an issue with performance on the ancestor materialization

21a4864

Co-authored-by: Claude <noreply@anthropic.com>

chore: work on ancestor materialization

f43e3c1

Co-authored-by: Claude <noreply@anthropic.com>

chore: add a limit

9204685

Co-authored-by: Claude <noreply@anthropic.com>

chore: prevent the generation of excessive log lines

85c697c

Co-authored-by: Claude <noreply@anthropic.com>

chore: reduce log noise

342eaab

chore: fix up AI mess

32ae461

Co-authored-by: Claude <noreply@anthropic.com>

ctron force-pushed the feature/atlas_perf_1_mat branch from d4c51fa to 5745bdc Compare June 23, 2026 13:02

ctron force-pushed the feature/atlas_perf_1_mat branch from 5745bdc to eb9d047 Compare June 23, 2026 13:09

chore: fix a clippy warning

0b4bd94

Co-authored-by: Claude <noreply@anthropic.com>

ctron marked this pull request as ready for review June 23, 2026 13:26

rh-jfuller self-requested a review June 23, 2026 13:27

sourcery-ai Bot reviewed Jun 23, 2026

View reviewed changes

ctron and others added 2 commits June 23, 2026 15:49

fix: ensure depth is honored for external references

e011ee4

Co-authored-by: Claude <noreply@anthropic.com>

docs: correct source code documentation

258bdd6

Co-authored-by: Claude <noreply@anthropic.com>

rh-jfuller reviewed Jun 23, 2026

View reviewed changes

rh-jfuller approved these changes Jun 23, 2026

View reviewed changes

ctron and others added 2 commits June 24, 2026 08:28

refactor(analysis): restrict top_ancestor_sbom field to test builds

1d39afe

The field is only used in tests for debugging and assertions, not by production code. Gate it behind #[cfg(test)] to avoid carrying unused data in release builds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: replace redundant actix_web::Result with std Result

97c4294

actix_web::Result is just an alias for std::result::Result with a default error type. Using it with an explicit error argument is redundant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ctron force-pushed the feature/atlas_perf_1_mat branch from fa0f7f5 to 97c4294 Compare June 24, 2026 08:07

ctron marked this pull request as draft June 24, 2026 08:10

ctron mentioned this pull request Jun 24, 2026

Materialize ancestor information #2412

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Materialize ancestor information#2408

Materialize ancestor information#2408
ctron wants to merge 14 commits into
guacsec:release/0.4.zfrom
ctron:feature/atlas_perf_1_mat

ctron commented Jun 19, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

sourcery-ai Bot commented Jun 19, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

sourcery-ai Bot Jun 23, 2026

Uh oh!

rh-jfuller Jun 23, 2026

Uh oh!

rh-jfuller left a comment

Uh oh!

rh-jfuller commented Jun 24, 2026

Uh oh!

github-actions Bot commented Jun 24, 2026

Uh oh!

rh-jfuller commented Jun 24, 2026

Uh oh!

ctron commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ctron commented Jun 19, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Summary by Sourcery

Uh oh!

sourcery-ai Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for resolve_sbom_cpes using materialized ancestors

Entity relationship diagram for materialized sbom_ancestor and describing CPEs

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

rh-jfuller Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

rh-jfuller left a comment

Choose a reason for hiding this comment

Uh oh!

rh-jfuller commented Jun 24, 2026

Uh oh!

github-actions Bot commented Jun 24, 2026

Uh oh!

rh-jfuller commented Jun 24, 2026

Uh oh!

ctron commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ctron commented Jun 19, 2026 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented Jun 19, 2026 •

edited

Loading