Add streaming read/write API for Sandbox #135

Open
scotttrinh wants to merge 7 commits into unstable/persistent-sandbox from unstable/sandbox-fs-streaming

@scotttrinh

Collaborator

This adds streaming file access to the unstable Sandbox filesystem API through box.fs.open. Both async and sync runtimes support reading and writing binary or text data without buffering entire files in memory. The implementation includes bounded-memory tar/gzip encoding for uploads, transfer validation, configurable timeouts and chunk sizes, and concrete file-handle types with familiar context-manager semantics.

async with sandbox.create_sandbox(
    name="streaming-files-demo",
    runtime="python3.13",
) as box:
    async with (
        await anyio.open_file("input.bin", "rb") as local,
        box.fs.open("workspace/input.bin", "wb") as remote,
    ):
        while chunk := await local.read(64 * 1024):
            await remote.write(chunk)

    async with (
        box.fs.open("workspace/input.bin", "rb") as remote,
        await anyio.open_file("output.bin", "wb") as local,
    ):
        while chunk := await remote.read(64 * 1024):
            await local.write(chunk)

Note: the backend currently needs the full size of the file before it accepts any data, so by default we spool the contents to a TemporaryFile, read the size, and then stream the data from that temp file to the backend. If you already know the size of the content, you can skip the spooling by passing the exact byte size, open(..., size=byte_size), which streams directly to the server. We hope this requirement will go away soon, so we're mostly documenting the size-less API for now.
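To make the spooling step concrete, here is a minimal standalone sketch of the "spool, measure, then stream" pattern described in the note. It is not the code from this PR: the helper names spool_to_measure and stream_from are hypothetical, and it uses only the standard library.

```python
import tempfile
from typing import IO, Iterable, Iterator, Tuple


def spool_to_measure(chunks: Iterable[bytes]) -> Tuple[IO[bytes], int]:
    """Write chunks to a temp file and return (file, total_size).

    Hypothetical helper: memory use stays bounded by the largest chunk,
    because the data lands on disk rather than in a bytes buffer.
    """
    spool = tempfile.TemporaryFile()
    size = 0
    for chunk in chunks:
        spool.write(chunk)
        size += len(chunk)
    spool.seek(0)  # rewind so the upload can read from the start
    return spool, size


def stream_from(spool: IO[bytes], chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the spooled data in fixed-size chunks, then close the temp file."""
    with spool:
        while chunk := spool.read(chunk_size):
            yield chunk
```

With the size known up front, an uploader could announce it before sending the first byte. Passing size= to open() avoids this extra pass over the data entirely.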

@scotttrinh scotttrinh requested review from Copilot, elprans and fantix June 22, 2026 17:12
@vercel

vercel Bot commented Jun 22, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| vercel-py | Ready | Preview | Jun 23, 2026 1:58pm |

Copilot AI left a comment

Pull request overview

This PR adds a streaming file I/O API to the unstable Sandbox filesystem via box.fs.open(...), enabling chunked reads/writes for binary and text data (async + sync), and switches uploads to a bounded-memory tar+gzip streaming encoder with transfer validation and configurable request timeouts.

Changes:

  • Introduces concrete streaming file-handle types (async + sync) with context-manager semantics and open() overloads.
  • Implements streaming tar+gzip archive generation for uploads plus size/mode/chunk validation and new transfer-focused error types.
  • Adds unit/live tests and updates unstable docs + example demonstrating streaming transfers.
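As an illustration of what a bounded-memory tar+gzip encoder with a size check can look like, here is a rough standard-library sketch. It is an assumption-laden stand-in, not the encoder from streaming_archive.py: the function name iter_tar_gz, the single-file archive, the fixed 0o644 mode, and the use of ValueError for size mismatches are all choices made for this example.

```python
import tarfile
import zlib
from typing import BinaryIO, Iterator


def iter_tar_gz(
    name: str, source: BinaryIO, size: int, chunk_size: int = 64 * 1024
) -> Iterator[bytes]:
    """Yield a gzip-compressed tar archive containing one file, chunk by chunk.

    Only one input chunk plus the compressor state is held in memory.
    The tar header must declare the size up front, which is why the
    caller has to know it before streaming starts.
    """
    # wbits = MAX_WBITS | 16 asks zlib for a gzip container.
    compressor = zlib.compressobj(wbits=zlib.MAX_WBITS | 16)

    def emit(data: bytes) -> Iterator[bytes]:
        out = compressor.compress(data)
        if out:  # the compressor may buffer small inputs
            yield out

    info = tarfile.TarInfo(name=name)
    info.size = size
    info.mode = 0o644  # assumed mode for this sketch
    yield from emit(info.tobuf(format=tarfile.USTAR_FORMAT))

    written = 0
    while chunk := source.read(chunk_size):
        written += len(chunk)
        if written > size:
            raise ValueError(f"source is larger than declared size {size}")
        yield from emit(chunk)
    if written != size:
        raise ValueError(f"expected {size} bytes, got {written}")

    # Pad the file data to a 512-byte block, then add the two zero
    # blocks that mark the end of a tar archive.
    padding = -size % tarfile.BLOCKSIZE
    yield from emit(b"\0" * (padding + 2 * tarfile.BLOCKSIZE))
    yield compressor.flush()
```

Because the function is a generator, it can be handed to an HTTP client as a streamed request body. An async variant would follow the same structure with an async source.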

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `tests/unstable/test_sandbox_transfer_validation.py` | Adds tests for transfer validation helpers, exported handle/error types, and timeout defaulting. |
| `tests/unstable/test_sandbox_streaming_archive.py` | Adds coverage for the bounded-memory tar+gzip encoder and async/sync archive iterators. |
| `tests/unstable/test_sandbox_filesystem.py` | Extends filesystem tests for streaming read/write behavior, response closing, and newline buffering. |
| `tests/unstable/test_sandbox_api_client.py` | Updates API client construction to include the new file-transfer timeout parameter. |
| `tests/live/test_unstable_sandbox_live.py` | Adds a live scenario test validating streaming upload/download parity across async/sync drivers. |
| `tests/live/_unstable_scenarios.py` | Implements streaming transfer scenario helpers using the new fs.open API. |
| `src/vercel/unstable/sandbox/sync.py` | Exposes new streaming handle types and transfer errors in the sync public unstable API. |
| `src/vercel/unstable/sandbox/__init__.py` | Exposes new streaming handle types and transfer errors in the async public unstable API. |
| `src/vercel/unstable/README.md` | Documents the new open() streaming API and provides examples. |
| `src/vercel/_internal/unstable/sandbox/sync_runtime.py` | Adds sync fs.open, switches reads to streamed responses, and writes to streamed archive uploads. |
| `src/vercel/_internal/unstable/sandbox/sync_filesystem_handle.py` | Implements sync streaming reader/writer handle classes and publish plumbing. |
| `src/vercel/_internal/unstable/sandbox/streaming_archive.py` | Adds bounded-memory tar+gzip streaming encoder and async/sync body generators with size checks. |
| `src/vercel/_internal/unstable/sandbox/service.py` | Adds write_archive + open_read_response service methods and wires file-transfer timeouts into the API client. |
| `src/vercel/_internal/unstable/sandbox/runtime_common.py` | Introduces upload entry model + tar-path normalization and validation helpers for sizes/modes/chunks. |
| `src/vercel/_internal/unstable/sandbox/options.py` | Adds file_transfer_timeout to service options with a 5-minute default. |
| `src/vercel/_internal/unstable/sandbox/filesystem_handle_common.py` | Adds shared handle helpers (option validation, read-size validation, text codecs). |
| `src/vercel/_internal/unstable/sandbox/errors.py` | Adds transfer-focused filesystem error types (size mismatch, transfer base error). |
| `src/vercel/_internal/unstable/sandbox/async_runtime.py` | Adds async fs.open, switches reads to streamed responses, and writes to streamed archive uploads. |
| `src/vercel/_internal/unstable/sandbox/async_filesystem_handle.py` | Implements async streaming reader/writer handle classes and publish plumbing. |
| `src/vercel/_internal/unstable/sandbox/api_client.py` | Adds streamed read responses, streamed upload bodies, and request timeouts for file transfers. |
| `examples/unstable/sandbox_06_streaming_files.py` | Adds an end-to-end example demonstrating streaming upload and download. |

Comment on lines +67 to +71
def _validate_read_size(size: int) -> None:
    if not isinstance(size, int):
        raise TypeError("size must be an integer")
    if size < -1:
        raise ValueError("size must be -1 or non-negative")

Collaborator Author

Counterpoint: io.BytesIO(b"abc").read(True) == b"a". I'm going to allow bool for now unless maybe @elprans wants us to be more strict here too.
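For context on why bool passes the quoted check: bool is a subclass of int in Python, so isinstance(True, int) is true and True behaves as 1. The sketch below copies the check from the diff (renamed without the leading underscore) and adds a hypothetical stricter variant, validate_read_size_strict, which is not part of this PR.

```python
import io


def validate_read_size(size: int) -> None:
    # Same logic as the check quoted in the review comment above.
    if not isinstance(size, int):
        raise TypeError("size must be an integer")
    if size < -1:
        raise ValueError("size must be -1 or non-negative")


def validate_read_size_strict(size: int) -> None:
    # Hypothetical stricter variant: reject bool explicitly, since
    # isinstance(True, int) is True.
    if isinstance(size, bool) or not isinstance(size, int):
        raise TypeError("size must be an integer")
    if size < -1:
        raise ValueError("size must be -1 or non-negative")


# The permissive check accepts True, matching the standard library:
validate_read_size(True)
first = io.BytesIO(b"abc").read(True)  # reads one byte, because True == 1
```

The permissive version stays consistent with io.BytesIO. The strict version trades that consistency for catching a likely caller mistake.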

Comment thread src/vercel/_internal/unstable/sandbox/api_client.py
else:
    assert self._send is not None
    try:
        await self._send.send(_EOF)

I think you're missing a check on whether the write buffer was actually fully written (`_written == _size`)?
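A minimal sketch of the kind of check being asked for, assuming a writer that was opened with a declared size. The class SizedWriter and the exception SizeMismatchError are hypothetical names for this illustration, and the sink is a plain list rather than the channel used in the PR.

```python
from typing import List


class SizeMismatchError(Exception):
    """Hypothetical error for a transfer whose byte count is wrong."""


class SizedWriter:
    """Toy writer that enforces a declared size on write and on close."""

    def __init__(self, size: int, sink: List[bytes]) -> None:
        self._size = size
        self._written = 0
        self._sink = sink

    def write(self, data: bytes) -> int:
        # Fail early if the caller sends more than was declared.
        if self._written + len(data) > self._size:
            raise SizeMismatchError(
                f"declared {self._size} bytes, attempted "
                f"{self._written + len(data)}"
            )
        self._sink.append(data)
        self._written += len(data)
        return len(data)

    def close(self) -> None:
        # The check from the review: a short write must not be
        # published as if it were complete.
        if self._written != self._size:
            raise SizeMismatchError(
                f"declared {self._size} bytes, wrote {self._written}"
            )
```

Overruns can be caught at write time, but a short write is only detectable once the caller signals that it is done, so that check belongs in close().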

@scotttrinh scotttrinh Jun 22, 2026

Collaborator Author

@elprans 7444ded

Good catch, added that. While I was in there, it started to feel like the whole spooled vs direct split was getting a little messy, so I split those implementations up. Eventually, once we have a direct stream endpoint, we can gut the sized implementation down to the bare minimum, or just remove/deprecate it if we decide to do that.

Reduces the amount of branching and makes it easier to later remove or
reduce the functionality of the explicit-sized branch once we have a
direct streaming backend.

3 participants