[POC] Performance test: anti-join (200k ignored owners against 80M Amulets)#5775

Draft
julientinguely-da wants to merge 6 commits into main from julien/poc/anti-join-amulet-ignore

@julientinguely-da julientinguely-da commented Jun 2, 2026

Contributor

POC

Before starting with the implementation of #5019, we wanted to verify that the proposed query scales efficiently with about 90M rows. Specifically, we isolated and tested the anti-join behavior between amulet_owners_to_ignore and dso_acs_store, as outlined in poc_parties_amulet_anti_join.sql.

Tested query

EXPLAIN ANALYZE
SELECT *
FROM dso_acs_store d
WHERE d.store_id = 5
  AND d.migration_id = 0
  AND d.package_name = 'splice-amulet'
  AND d.template_id_qualified_name = 'Splice.Amulet:Amulet'
  AND d.amulet_round_of_expiry < 900
  AND NOT EXISTS (
    SELECT 1
    FROM amulet_owners_to_ignore t
    WHERE t.party_id = d.create_arguments->>'owner'
  )
ORDER BY d.amulet_round_of_expiry
LIMIT 1000;
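
To make the semantics of the NOT EXISTS anti-join concrete, here is a minimal sketch on toy data. It is not the benchmark itself: SQLite stands in for PostgreSQL, the owner is a plain `owner` column rather than `create_arguments->>'owner'`, the other filter columns are dropped, and the rows, the `contract_id` tie-breaker, and the helper name `amulets_expiring_before` are all assumptions made for illustration. Plans and timings from this sketch say nothing about the real setup.

```python
import sqlite3


def amulets_expiring_before(conn, max_round, limit):
    """Return rows below max_round whose owner is NOT in the ignore table."""
    return conn.execute(
        """
        SELECT d.contract_id, d.owner, d.amulet_round_of_expiry
        FROM dso_acs_store d
        WHERE d.amulet_round_of_expiry < ?
          AND NOT EXISTS (
            SELECT 1
            FROM amulet_owners_to_ignore t
            WHERE t.party_id = d.owner
          )
        ORDER BY d.amulet_round_of_expiry, d.contract_id
        LIMIT ?
        """,
        (max_round, limit),
    ).fetchall()


# Toy schema (hypothetical, heavily simplified compared to the real tables).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dso_acs_store (
        contract_id INTEGER PRIMARY KEY,
        owner TEXT NOT NULL,
        amulet_round_of_expiry INTEGER NOT NULL
    );
    CREATE TABLE amulet_owners_to_ignore (party_id TEXT PRIMARY KEY);
    """
)
conn.executemany(
    "INSERT INTO dso_acs_store VALUES (?, ?, ?)",
    [
        (1, "alice", 10),
        (2, "bob", 20),
        (3, "carol", 30),
        (4, "bob", 5),
        (5, "dave", 950),  # filtered out by the round-of-expiry bound
    ],
)
conn.execute("INSERT INTO amulet_owners_to_ignore VALUES ('bob')")

rows = amulets_expiring_before(conn, max_round=900, limit=1000)
for row in rows:
    print(row)
```

Both of bob's rows are dropped by the anti-join and dave's row by the round bound, so only alice's and carol's rows remain, ordered by round of expiry.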

Results

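The experiment indicates that the query stays fast at this scale (about 90M rows in dso_acs_store, 200k ignored owners):

  • Execution Time: ~200ms (181 ms, 242 ms and 197 ms across the three runs listed under Raw outcomes)
  • Scan Method: No sequential scans are performed; the plan uses a Parallel Index Scan combined with a Nested Loop Anti Join.
  • Index Utilization: Uses the defined indices (dso_acs_store_sid_pn_tid_croe and idx_test_owners_party_id); the Index Only Scan on idx_test_owners_party_id reports 0 heap fetches.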

Raw outcomes

Limit  (cost=1001.01..7428.75 rows=1000 width=1616) (actual time=35.185..180.929 rows=1000 loops=1)
  ->  Gather Merge  (cost=1001.01..180452891.99 rows=28073938 width=1616) (actual time=35.183..180.794 rows=1000 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Nested Loop Anti Join  (cost=0.99..177211462.94 rows=11697474 width=1616) (actual time=1.844..62.531 rows=397 loops=3)
              ->  Parallel Index Scan using dso_acs_store_sid_pn_tid_croe on dso_acs_store d  (cost=0.56..95011434.71 rows=23394949 width=1616) (actual time=0.612..44.004 rows=410 loops=3)
                    Index Cond: ((store_id = 5) AND (migration_id = 0) AND (package_name = 'splice-amulet'::text) AND (template_id_qualified_name = 'Splice.Amulet:Amulet'::text) AND (amulet_round_of_expiry < 900))
              ->  Index Only Scan using idx_test_owners_party_id on test_owners t  (cost=0.42..4.13 rows=1 width=63) (actual time=0.044..0.044 rows=0 loops=1230)
                    Index Cond: (party_id = (d.create_arguments ->> 'owner'::text))
                    Heap Fetches: 0
Planning Time: 3.174 ms
Execution Time: 181.113 ms

Limit  (cost=1001.01..7428.75 rows=1000 width=1616) (actual time=32.892..241.479 rows=1000 loops=1)
  ->  Gather Merge  (cost=1001.01..180452891.99 rows=28073938 width=1616) (actual time=32.891..240.889 rows=1000 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Nested Loop Anti Join  (cost=0.99..177211462.94 rows=11697474 width=1616) (actual time=1.880..83.175 rows=397 loops=3)
              ->  Parallel Index Scan using dso_acs_store_sid_pn_tid_croe on dso_acs_store d  (cost=0.56..95011434.71 rows=23394949 width=1616) (actual time=0.584..62.323 rows=410 loops=3)
                    Index Cond: ((store_id = 5) AND (migration_id = 0) AND (package_name = 'splice-amulet'::text) AND (template_id_qualified_name = 'Splice.Amulet:Amulet'::text) AND (amulet_round_of_expiry < 900))
              ->  Index Only Scan using idx_test_owners_party_id on test_owners t  (cost=0.42..4.13 rows=1 width=63) (actual time=0.048..0.048 rows=0 loops=1230)
                    Index Cond: (party_id = (d.create_arguments ->> 'owner'::text))
                    Heap Fetches: 0
Planning Time: 0.449 ms
Execution Time: 241.734 ms

Limit  (cost=1001.01..7428.75 rows=1000 width=1616) (actual time=31.204..196.814 rows=1000 loops=1)
  ->  Gather Merge  (cost=1001.01..180452803.55 rows=28073926 width=1616) (actual time=31.203..196.530 rows=1000 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Nested Loop Anti Join  (cost=0.99..177211375.88 rows=11697469 width=1616) (actual time=2.010..68.609 rows=397 loops=3)
              ->  Parallel Index Scan using dso_acs_store_sid_pn_tid_croe on dso_acs_store d  (cost=0.56..95011386.96 rows=23394938 width=1616) (actual time=0.476..47.684 rows=410 loops=3)
                    Index Cond: ((store_id = 5) AND (migration_id = 0) AND (package_name = 'splice-amulet'::text) AND (template_id_qualified_name = 'Splice.Amulet:Amulet'::text) AND (amulet_round_of_expiry < 900))
              ->  Index Only Scan using idx_test_owners_party_id on test_owners t  (cost=0.42..4.13 rows=1 width=63) (actual time=0.049..0.049 rows=0 loops=1230)
                    Index Cond: (party_id = (d.create_arguments ->> 'owner'::text))
                    Heap Fetches: 0
Planning Time: 0.350 ms
Execution Time: 197.060 ms
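
A note on reading these plans: in EXPLAIN ANALYZE output, the actual rows and time of a node that runs more than once are averages per loop, so totals are roughly rows × loops (approximate, because the averages are rounded). The small sketch below redoes that arithmetic with numbers copied from the three plans; the variable names are only for illustration, and the probe time uses the first run's figure.

```python
from statistics import mean

# Execution times of the three runs, in ms (copied from the plans).
execution_ms = [181.113, 241.734, 197.060]

# Per-loop averages reported below Gather Merge (leader + 2 workers = 3 loops).
loops = 3
outer_rows_per_loop = 410      # Parallel Index Scan on dso_acs_store
anti_join_rows_per_loop = 397  # Nested Loop Anti Join output

# Inner Index Only Scan: one probe per outer row.
inner_probe_loops = 1230
inner_ms_per_probe = 0.044     # first run

outer_total = outer_rows_per_loop * loops
anti_join_total = anti_join_rows_per_loop * loops
approx_rows_filtered = outer_total - anti_join_total
approx_probe_ms = inner_probe_loops * inner_ms_per_probe
mean_ms = mean(execution_ms)

print(f"outer rows scanned:       {outer_total}")
print(f"rows surviving anti-join: ~{anti_join_total}")
print(f"rows filtered out:        ~{approx_rows_filtered}")
print(f"summed probe time:        ~{approx_probe_ms:.1f} ms (across all processes)")
print(f"mean execution time:      {mean_ms:.1f} ms")
```

The outer total of 1230 matches loops=1230 on the inner scan, and the roughly 1191 surviving rows are just enough to satisfy LIMIT 1000, which is why only about 1.2k index entries are visited out of the estimated tens of millions. The mean over the three runs is about 206.6 ms.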

Pull Request Checklist

Cluster Testing

  • If a cluster test is required, comment /cluster_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.
  • If a hard-migration test is required (from the latest release), comment /hdm_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.
  • If a logical synchronizer upgrade test is required (from canton-3.5), comment /lsu_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.

PR Guidelines

  • Include any change that might be observable by our partners or affect their deployment in the release notes.
  • Specify fixed issues with Fixes #n, and mention issues worked on using #n
  • Include a screenshot for frontend-related PRs - see README or use your favorite screenshot tool

Merge Guidelines

  • Make the git commit message look sensible when squash-merging on GitHub (most likely: just copy your PR description).

Signed-off-by: Julien Tinguely <julien.tinguely@digitalasset.com>
@julientinguely-da julientinguely-da changed the title from "[POC" to "[POC] Anti-join between 100k amulet owners and dso_acs_store" Jun 2, 2026
julientinguely-da commented Jun 3, 2026

Contributor Author

Idea to try out bloom filters if it scales badly: https://www.postgresql.org/docs/current/bloom.html

@julientinguely-da julientinguely-da changed the title from "[POC] Anti-join between 100k amulet owners and dso_acs_store" to "[POC] Anti-join between 200k amulet owners and dso_acs_store with 90M rows including 80M Amulets" Jun 11, 2026
@julientinguely-da julientinguely-da changed the title from "[POC] Anti-join between 200k amulet owners and dso_acs_store with 90M rows including 80M Amulets" to "POC] Performance test: anti-join (200k ignored owners against 80M Amulets)" Jun 11, 2026
@julientinguely-da julientinguely-da changed the title from "POC] Performance test: anti-join (200k ignored owners against 80M Amulets)" to "[POC] Performance test: anti-join (200k ignored owners against 80M Amulets)" Jun 11, 2026

@moritzkiefer-da moritzkiefer-da left a comment

Contributor

thx looks good to me

'sv', 'sv_' || lpad((row_num % 50)::text, 3, '0') || '::1220' || md5('sv_' || (row_num % 50)::text),
'round', (row_num * 11 + 3) % 3000,
'weight', 1 + (row_num % 20),
'beneficiary', 'user_' || lpad((row_num % 10000)::text, 5, '0') || '__wallet__user::1220' || md5('owner_' || (row_num % 10000)::text)

Contributor

why do we only do 10000 here? 10k parties seems not very much. Are you assuming that the number of reward recipients is much smaller than the number of coin holders? Maybe not that unreasonable but at least for app rewards 10k seems very low.

Contributor Author

> Are you assuming that the number of reward recipients is much smaller than the number of coin holders?

yep

> Maybe not that unreasonable but at least for app rewards 10k seems very low.

Only tested against Amulet here, but yes makes sense


-- 4. create index and redo anti-join in point 4.

CREATE INDEX idx_dso_acs_pn_tid_croe_owner

Contributor

Do you expect that we keep the index on the JSON field? Usually our indices work by creating a separate column and indexing on that.

Contributor Author

The query plan uses idx_dso_acs_pn_tid_croe instead of this one, cheaper apparently.
