Add streaming read/write API for Sandbox #135
Pull request overview
This PR adds a streaming file I/O API to the unstable Sandbox filesystem via box.fs.open(...), enabling chunked reads/writes for binary and text data (async + sync), and switches uploads to a bounded-memory tar+gzip streaming encoder with transfer validation and configurable request timeouts.
Changes:
- Introduces concrete streaming file-handle types (async + sync) with context-manager semantics and open() overloads.
- Implements streaming tar+gzip archive generation for uploads plus size/mode/chunk validation and new transfer-focused error types.
- Adds unit/live tests and updates unstable docs + example demonstrating streaming transfers.
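For readers unfamiliar with the bounded-memory tar+gzip approach, here is a minimal sketch of the general technique using only the standard library. It is not the PR's encoder: `stream_tar_gz` and its parameters are hypothetical names, and it handles a single file only. It also shows why the size has to be known up front: the tar header carries the size before any data bytes follow.

```python
import tarfile
import zlib
from typing import Iterable, Iterator

BLOCK = 512  # tar block size in bytes


def stream_tar_gz(
    name: str, size: int, chunks: Iterable[bytes], mode: int = 0o644
) -> Iterator[bytes]:
    """Yield a gzip-compressed tar archive holding one file, chunk by chunk.

    Hypothetical helper: only one input chunk is held in memory at a time.
    """
    info = tarfile.TarInfo(name=name)
    info.size = size  # the header records the size before the data is sent
    info.mode = mode
    # wbits = 16 + MAX_WBITS asks zlib for gzip framing instead of raw zlib.
    gz = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)

    def compress(data: bytes) -> Iterator[bytes]:
        out = gz.compress(data)
        if out:  # the compressor may buffer and return nothing yet
            yield out

    yield from compress(info.tobuf(format=tarfile.GNU_FORMAT))

    written = 0
    for chunk in chunks:
        written += len(chunk)
        if written > size:
            raise ValueError("received more data than the declared size")
        yield from compress(chunk)
    if written != size:
        raise ValueError("received less data than the declared size")

    # Pad the file data to a full block, then add the two zero blocks
    # that mark the end of a tar archive.
    padding = (-size) % BLOCK
    yield from compress(b"\0" * (padding + 2 * BLOCK))
    yield gz.flush()
```

The generator can be handed to any HTTP client that accepts an iterable request body; whether the real encoder is structured this way is not something this sketch claims.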
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/unstable/test_sandbox_transfer_validation.py | Adds tests for transfer validation helpers, exported handle/error types, and timeout defaulting. |
| tests/unstable/test_sandbox_streaming_archive.py | Adds coverage for the bounded-memory tar+gzip encoder and async/sync archive iterators. |
| tests/unstable/test_sandbox_filesystem.py | Extends filesystem tests for streaming read/write behavior, response closing, and newline buffering. |
| tests/unstable/test_sandbox_api_client.py | Updates API client construction to include the new file-transfer timeout parameter. |
| tests/live/test_unstable_sandbox_live.py | Adds a live scenario test validating streaming upload/download parity across async/sync drivers. |
| tests/live/_unstable_scenarios.py | Implements streaming transfer scenario helpers using the new fs.open API. |
| src/vercel/unstable/sandbox/sync.py | Exposes new streaming handle types and transfer errors in the sync public unstable API. |
| src/vercel/unstable/sandbox/__init__.py | Exposes new streaming handle types and transfer errors in the async public unstable API. |
| src/vercel/unstable/README.md | Documents the new open() streaming API and provides examples. |
| src/vercel/_internal/unstable/sandbox/sync_runtime.py | Adds sync fs.open, switches reads to streamed responses, and writes to streamed archive uploads. |
| src/vercel/_internal/unstable/sandbox/sync_filesystem_handle.py | Implements sync streaming reader/writer handle classes and publish plumbing. |
| src/vercel/_internal/unstable/sandbox/streaming_archive.py | Adds bounded-memory tar+gzip streaming encoder and async/sync body generators with size checks. |
| src/vercel/_internal/unstable/sandbox/service.py | Adds write_archive + open_read_response service methods and wires file-transfer timeouts into the API client. |
| src/vercel/_internal/unstable/sandbox/runtime_common.py | Introduces upload entry model + tar-path normalization and validation helpers for sizes/modes/chunks. |
| src/vercel/_internal/unstable/sandbox/options.py | Adds file_transfer_timeout to service options with a 5-minute default. |
| src/vercel/_internal/unstable/sandbox/filesystem_handle_common.py | Adds shared handle helpers (option validation, read-size validation, text codecs). |
| src/vercel/_internal/unstable/sandbox/errors.py | Adds transfer-focused filesystem error types (size mismatch, transfer base error). |
| src/vercel/_internal/unstable/sandbox/async_runtime.py | Adds async fs.open, switches reads to streamed responses, and writes to streamed archive uploads. |
| src/vercel/_internal/unstable/sandbox/async_filesystem_handle.py | Implements async streaming reader/writer handle classes and publish plumbing. |
| src/vercel/_internal/unstable/sandbox/api_client.py | Adds streamed read responses, streamed upload bodies, and request timeouts for file transfers. |
| examples/unstable/sandbox_06_streaming_files.py | Adds an end-to-end example demonstrating streaming upload and download. |
    def _validate_read_size(size: int) -> None:
        if not isinstance(size, int):
            raise TypeError("size must be an integer")
        if size < -1:
            raise ValueError("size must be -1 or non-negative")
Counterpoint: io.BytesIO(b"abc").read(True) == b"a". I'm going to allow bool for now unless maybe @elprans wants us to be more strict here too.
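To make the trade-off concrete, here is a small sketch of both behaviours. `validate_read_size` and its `allow_bool` flag are hypothetical and exist only for illustration; the PR's helper has no such flag. The point is that `bool` is a subclass of `int`, so a plain `isinstance(size, int)` check accepts `True`, matching what `io.BytesIO.read` does.

```python
import io


def validate_read_size(size: int, *, allow_bool: bool = True) -> None:
    """Hypothetical variant of the helper with an opt-in strict mode."""
    if not isinstance(size, int):
        raise TypeError("size must be an integer")
    # bool passes the isinstance check above because bool subclasses int.
    if not allow_bool and isinstance(size, bool):
        raise TypeError("size must be an integer, not bool")
    if size < -1:
        raise ValueError("size must be -1 or non-negative")


# Lenient mode mirrors the standard library: True is treated as 1.
validate_read_size(True)
one_byte = io.BytesIO(b"abc").read(True)

# Strict mode rejects bool explicitly.
try:
    validate_read_size(True, allow_bool=False)
    strict_rejected = False
except TypeError:
    strict_rejected = True
```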
    else:
        assert self._send is not None
        try:
            await self._send.send(_EOF)
I think you're missing a check on whether the write buffer was actually fully written (`_written == _size`)?
Good catch, added that. While I was in there, it started to feel like the whole spooled vs direct split was getting a little messy, so I split those implementations up. Eventually, once we have a direct stream endpoint, we can gut the sized implementation down to the bare minimum, or just remove/deprecate it if we decide to do that.
Reduces the amount of branching and makes it easier to later remove or reduce the functionality of the explicit-sized branch once we have a direct streaming backend.
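For context, the kind of check discussed above could look roughly like the following. This is a sketch, not the code that landed: `SizedWriter` and `SizeMismatchError` are hypothetical names, and only the `_written` / `_size` attributes come from the review comment.

```python
class SizeMismatchError(Exception):
    """Hypothetical stand-in for the PR's size-mismatch error type."""


class SizedWriter:
    """Hypothetical writer that was opened with an exact, declared size."""

    def __init__(self, size: int) -> None:
        self._size = size
        self._written = 0
        self.closed = False

    def write(self, data: bytes) -> int:
        # Reject overflow early instead of discovering it at close time.
        if self._written + len(data) > self._size:
            raise SizeMismatchError("wrote more bytes than the declared size")
        self._written += len(data)
        return len(data)

    def close(self) -> None:
        if self.closed:
            return
        self.closed = True
        # The check from the review: signal end-of-stream only after
        # confirming that every declared byte was actually written.
        if self._written != self._size:
            raise SizeMismatchError(
                f"declared {self._size} bytes but wrote {self._written}"
            )
```

Without the check in `close`, a short write would produce an archive whose header promises more data than the body contains.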
9d0d664 to 3c19bbf
This adds streaming file access to the unstable Sandbox filesystem API through box.fs.open. Both async and sync runtimes support reading and writing binary or text data without buffering entire files in memory. The implementation includes bounded-memory tar/gzip encoding for uploads, transfer validation, configurable timeouts and chunk sizes, and concrete file-handle types with familiar context-manager semantics.

Note: since we currently must know the full size of the file before sending the data to the backend, we spool the contents to a TemporaryFile, read the size, and then stream the data from that temp file to the backend. If you know the size of the content, you can skip this spooling by providing the exact byte size in open(..., size=byte_size), which will stream directly to the server. We hope this requirement will go away soon, so we're mostly documenting the size-less API for now.
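The spooling step described in the note can be sketched with the standard library alone. The helper names `spool_to_tempfile` and `iter_file` are hypothetical, and this says nothing about how the PR wires the result into the upload; it only illustrates "write everything to a temp file, read the size, then stream it back".

```python
import tempfile
from typing import BinaryIO, Iterable, Iterator, Tuple


def spool_to_tempfile(chunks: Iterable[bytes]) -> Tuple[BinaryIO, int]:
    """Write chunks to an anonymous temp file and return (file, total size)."""
    spool = tempfile.TemporaryFile()  # opened in binary read/write mode
    try:
        for chunk in chunks:
            spool.write(chunk)
        size = spool.tell()  # position after the last write is the byte count
        spool.seek(0)        # rewind so the caller can stream from the start
    except BaseException:
        spool.close()
        raise
    return spool, size


def iter_file(fileobj: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file's contents in fixed-size chunks."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

Memory use stays bounded by the chunk size, at the cost of writing the data to disk once before the upload starts, which is the overhead that passing an explicit size avoids.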