feat(enrichment tables): add cuckoo filter to memory table #25143
esensar wants to merge 5 commits into vectordotdev:master
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c4e16f8ff3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4fe95ea743
    .and_then(|p| value.get(p))
    .and_then(|v| v.as_integer())
    .and_then(|v| i32::try_from(v).ok());
let _ = self.filter.insert_if_not_present_with_update(
Handle failed cuckoo inserts before acking events
handle_value discards the return value of insert_if_not_present_with_update/insert_if_not_present, so the sink marks events as delivered and emits memory_enrichment_table_insertions_total even when the filter cannot accept a key (for example once max_entries is reached without LRU). This creates silent data loss and misleading internal metrics under normal high-cardinality workloads; the insert result should be checked and failures surfaced (and counted) before acknowledging success.
The insertion always succeeds. It may evict another item (which is returned, but we can't do anything useful with it), so the new item is inserted successfully. Some other item may now be missing, which is expected behavior for a cuckoo filter.
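To illustrate the "insert always succeeds, but may evict" behavior described above, here is a minimal sketch of a toy cuckoo-style filter. It is not the filter used in this PR and not the API of any library: `ToyCuckooFilter`, its methods, the 16-bit fingerprints, and the single relocation attempt are all assumptions made for illustration (a real cuckoo filter would typically try a longer chain of relocations before giving up on a victim).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const SLOTS_PER_BUCKET: usize = 4;

/// Toy filter (hypothetical): stores 16-bit fingerprints, never the keys.
/// A fingerprint of 0 marks an empty slot.
struct ToyCuckooFilter {
    buckets: Vec<[u16; SLOTS_PER_BUCKET]>,
    mask: usize, // bucket count - 1; the bucket count is a power of two
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

impl ToyCuckooFilter {
    fn new(bucket_count: usize) -> Self {
        let n = bucket_count.next_power_of_two();
        Self { buckets: vec![[0; SLOTS_PER_BUCKET]; n], mask: n - 1 }
    }

    /// Returns the key's fingerprint (never 0) and its first bucket index.
    fn fingerprint_and_index(&self, key: &str) -> (u16, usize) {
        let h = hash_of(&key);
        (((h >> 48) as u16).max(1), (h as usize) & self.mask)
    }

    /// Partial-key cuckoo hashing: the other bucket depends only on the
    /// current bucket and the fingerprint, so applying it twice returns
    /// the original bucket.
    fn alt_index(&self, index: usize, fp: u16) -> usize {
        (index ^ (hash_of(&fp) as usize)) & self.mask
    }

    /// Writes `fp` into the first empty slot of the bucket, if there is one.
    fn try_place(&mut self, index: usize, fp: u16) -> bool {
        for slot in self.buckets[index].iter_mut() {
            if *slot == 0 {
                *slot = fp;
                return true;
            }
        }
        false
    }

    /// Always stores the new key's fingerprint. If both candidate buckets
    /// are full, one resident fingerprint is displaced; if it cannot be
    /// relocated to its other bucket, it is dropped and returned.
    fn insert(&mut self, key: &str) -> Option<u16> {
        let (fp, i1) = self.fingerprint_and_index(key);
        let i2 = self.alt_index(i1, fp);
        if self.try_place(i1, fp) || self.try_place(i2, fp) {
            return None;
        }
        let slot = fp as usize % SLOTS_PER_BUCKET;
        let victim = std::mem::replace(&mut self.buckets[i1][slot], fp);
        let victim_alt = self.alt_index(i1, victim);
        if self.try_place(victim_alt, victim) { None } else { Some(victim) }
    }

    /// May report false positives; a key whose fingerprint was evicted
    /// may be reported as absent.
    fn contains(&self, key: &str) -> bool {
        let (fp, i1) = self.fingerprint_and_index(key);
        let i2 = self.alt_index(i1, fp);
        self.buckets[i1].contains(&fp) || self.buckets[i2].contains(&fp)
    }
}
```

In this sketch the caller only learns which fingerprint was dropped, not which key it belonged to, which is why the returned value cannot be turned into a meaningful per-event failure.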
Summary
This PR adds support for cuckoo filters in memory enrichment tables, for use cases where only the presence of a key needs to be checked and false positives are acceptable. In those cases it greatly reduces memory usage compared to regular memory tables.
Bloom filters should be fairly easy to add as well (and will be done in a separate PR), but a cuckoo filter is a better fit because it supports deletion. In particular, the cuckoo-clock lib extends the cuckoo filter with TTL and more, which suits memory enrichment tables.
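As a rough illustration of why stored fingerprints make deletion and TTL possible, here is a minimal sketch of a fingerprint table with per-entry expiry. It does not show the cuckoo-clock API: `TtlFingerprintTable`, `Entry`, the explicit `now` and `ttl` tick arguments, and the one-slot-per-bucket layout are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One stored entry (hypothetical layout): a short fingerprint plus the
/// tick at which it stops being considered present.
#[derive(Clone, Copy)]
struct Entry {
    fingerprint: u16,
    expires_at: u64,
}

/// Toy table with two candidate slots per key.
struct TtlFingerprintTable {
    slots: Vec<Option<Entry>>,
    mask: usize, // slot count - 1; the slot count is a power of two
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

impl TtlFingerprintTable {
    fn new(slot_count: usize) -> Self {
        let n = slot_count.next_power_of_two();
        Self { slots: vec![None; n], mask: n - 1 }
    }

    /// Returns the key's fingerprint and both candidate slot indexes.
    fn locate(&self, key: &str) -> (u16, [usize; 2]) {
        let h = hash_of(&key);
        let fp = (h >> 48) as u16;
        let i1 = (h as usize) & self.mask;
        let i2 = (i1 ^ (hash_of(&fp) as usize)) & self.mask;
        (fp, [i1, i2])
    }

    /// Refreshes a slot already holding this fingerprint, otherwise uses
    /// an empty or expired slot. Returns false if neither is available.
    fn insert(&mut self, key: &str, now: u64, ttl: u64) -> bool {
        let (fingerprint, candidates) = self.locate(key);
        let target = candidates
            .iter()
            .copied()
            .find(|&i| matches!(self.slots[i], Some(e) if e.fingerprint == fingerprint))
            .or_else(|| {
                candidates
                    .iter()
                    .copied()
                    .find(|&i| self.slots[i].map_or(true, |e| e.expires_at <= now))
            });
        match target {
            Some(i) => {
                self.slots[i] = Some(Entry { fingerprint, expires_at: now + ttl });
                true
            }
            None => false,
        }
    }

    /// True if a matching, unexpired fingerprint is stored (false
    /// positives are possible when fingerprints collide).
    fn contains(&self, key: &str, now: u64) -> bool {
        let (fingerprint, candidates) = self.locate(key);
        candidates.iter().any(|&i| {
            matches!(self.slots[i], Some(e) if e.fingerprint == fingerprint && e.expires_at > now)
        })
    }

    /// Deletion clears one matching fingerprint. A Bloom filter cannot do
    /// this safely because its bits are shared between keys.
    fn remove(&mut self, key: &str) -> bool {
        let (fingerprint, candidates) = self.locate(key);
        for i in candidates {
            if matches!(self.slots[i], Some(e) if e.fingerprint == fingerprint) {
                self.slots[i] = None;
                return true;
            }
        }
        false
    }
}
```

Passing `now` explicitly keeps the sketch deterministic; a real implementation would more likely read a clock internally.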
Vector configuration
How did you test this PR?
Ran the above configuration and looked up the keys using the stdin source, by entering the keys to look up. Some unit tests were added as well.
Change Type
Is this a breaking change?
Does this PR include user facing changes?
References