Skip to content

feat(theta): introduce intersection theta set operation#100

Open
ZENOTME wants to merge 11 commits intoapache:mainfrom
ZENOTME:refine_theta_sketch
Open

feat(theta): introduce intersection theta set operation#100
ZENOTME wants to merge 11 commits intoapache:mainfrom
ZENOTME:refine_theta_sketch

Conversation

@ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Feb 19, 2026

This PR:

  • introduce ThetaSketchView trait and enhance ThetaHashTable used for set operation
  • introduce intersection theta set operation

@ZENOTME ZENOTME force-pushed the refine_theta_sketch branch 3 times, most recently from bb8b9fd to 9f17971 Compare February 19, 2026 12:53
@ZENOTME
Copy link
Contributor Author

ZENOTME commented Feb 19, 2026

cc @tisonkun @PsiACE @notfilippo

@ZENOTME ZENOTME force-pushed the refine_theta_sketch branch from 9f17971 to 054107c Compare February 19, 2026 12:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the ThetaHashTable and introduces the ThetaSketchView trait to prepare for implementing set operations (union, intersection, difference) on theta sketches. The changes improve the API design by separating hash computation from insertion logic and clarifying the semantics of emptiness.

Changes:

  • Introduced ThetaSketchView trait providing a unified read-only interface for both mutable ThetaSketch and immutable CompactThetaSketch
  • Refactored ThetaHashTable to separate hash computation (hash) from insertion with theta screening (try_insert_hash)
  • Changed is_empty semantics to track logical emptiness (whether updates were attempted) rather than physical emptiness (whether entries exist)
  • Renamed num_entries to num_retained for clarity
  • Added new_with_state constructor and from_parts factory method for creating tables/sketches with explicit state

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
datasketches/src/theta/sketch.rs Introduced ThetaSketchView trait, implemented it for ThetaSketch and CompactThetaSketch, added seed_hash() method, refactored update() to use try_insert(), added from_parts() constructor
datasketches/src/theta/mod.rs Exported ThetaSketchView trait
datasketches/src/theta/hash_table.rs Split hash_and_screen into hash() and try_insert_hash(), added is_empty field for logical emptiness, renamed num_entries to num_retained, added new_with_state() constructor, made REBUILD_THRESHOLD pub(crate), updated all tests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ordered,
empty,
}
CompactThetaSketch::from_parts(entries, theta, self.table.seed_hash(), ordered, empty)
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactoring to use from_parts is good, but note that the empty parameter passed to from_parts is computed from entries.is_empty() (line 239), which checks if there are retained hashes. However, with the new is_empty semantics in ThetaHashTable that tracks logical emptiness (whether any updates were attempted), there's now a potential inconsistency: if a ThetaSketch has been updated but all values were screened out, self.table.is_empty() would return false, but the CompactThetaSketch created here would have empty = true. This may be intentional for CompactThetaSketch, but it creates a semantic difference between sketch.is_empty() and sketch.compact(false).is_empty().

Copilot uses AI. Check for mistakes.
Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you directly implement all the requirements (set ops).

These abstractions look intermediate - I can't judge whether they are correct isolated.

@ZENOTME ZENOTME changed the title refactor(theta): introduce ThetaSketchView trait and enhance ThetaHashTable used for set operation feat(theta): introduce intersection theta set operation Feb 22, 2026
@ZENOTME
Copy link
Contributor Author

ZENOTME commented Feb 22, 2026

I suggest you directly implement all the requirements (set ops).

These abstractions look intermediate - I can't judge whether they are correct isolated.

Have push commit for intersection op

ZENOTME and others added 6 commits February 25, 2026 09:28
Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: tison <wander4096@gmail.com>
Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @ZENOTME! LGTM.

Add commits to seal the ThetaSketchView trait.

Signed-off-by: tison <wander4096@gmail.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants