Skip to content

feat: implement DuckDB filesystem integration for Vortex file handling#6198

Open
iceboundrock wants to merge 10 commits intovortex-data:developfrom
Lychee-Technology:develop
Open

feat: implement DuckDB filesystem integration for Vortex file handling#6198
iceboundrock wants to merge 10 commits intovortex-data:developfrom
Lychee-Technology:develop

Conversation

@iceboundrock
Copy link

@iceboundrock iceboundrock commented Jan 29, 2026

@joseph-isaacs joseph-isaacs added the action/benchmark Trigger full benchmarks to run on this PR label Jan 29, 2026
@codspeed-hq
Copy link

codspeed-hq bot commented Jan 29, 2026

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing Lychee-Technology:develop (a131078) with develop (469e31f)

Summary

✅ 1138 untouched benchmarks
⏩ 1384 skipped benchmarks1

Footnotes

  1. 1384 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@0ax1 0ax1 self-requested a review January 29, 2026 10:40
@0ax1 0ax1 added action/benchmark-sql Trigger SQL benchmarks to run on this PR changelog/feature A new feature and removed action/benchmark Trigger full benchmarks to run on this PR labels Jan 29, 2026
@joseph-isaacs joseph-isaacs added action/benchmark-sql Trigger SQL benchmarks to run on this PR and removed action/benchmark-sql Trigger SQL benchmarks to run on this PR labels Jan 29, 2026
Copy link
Contributor

@0ax1 0ax1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to look into this. There's a couple of issues around thread safety / locking, error handling and memory leaks.

If you used AI to assist in writing the code for your PR please mention this in your PR description (see: https://github.com/vortex-data/vortex/blob/develop/CONTRIBUTING.md).

@joseph-isaacs joseph-isaacs added action/benchmark-sql Trigger SQL benchmarks to run on this PR and removed action/benchmark-sql Trigger SQL benchmarks to run on this PR labels Feb 2, 2026
|_| {}
);

#[derive(Clone, Copy)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a separate type (SendableClientContext) given that Send + Sync is set on ClientContext directly now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it separates ownership/lifetime semantics from thread-safe handle passing.

ClientContext is a wrapper! type with owned state and Drop logic. It is not Copy and should not be casually moved across async boundaries or spawned tasks.

On the other hand, SendableClientContext is a tiny, Copy wrapper around the raw duckdb_vx_client_context pointer with no destructor. It’s just a handle, which is ideal for passing into async or spawn_blocking code.

vortex_bail!("GCS glob expansion not yet implemented")
}
_ => {
let paths = glob(file_glob.as_ref())
Copy link
Contributor

@0ax1 0ax1 Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we also hook into the duckdb_fs_glob for local paths? Or asked differently, should we make any assumptions at all here about globbing details of duckdb_fs_glob but rather unconditionally delegate to duckdb? Whether or not gcs is supported is now a Duckdb implementation detail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is to just do this let file_urls = duckdb_fs_glob(..).

Copy link
Contributor

@0ax1 0ax1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, thanks for taking the time to address the comments!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

action/benchmark-sql Trigger SQL benchmarks to run on this PR changelog/feature A new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants