fix: stream ZIP downloads without buffering archives by mason5052 · Pull Request #339 · vxcontrol/pentagi

mason5052 · 2026-06-05T14:08:52Z

Summary

Directory and multi-path ZIP downloads built the entire archive in a bytes.Buffer before sending it, so the API process held the whole compressed archive on the heap before writing any response byte. This streams the archive straight to the HTTP response writer instead, keeping memory proportional to a single file's copy buffer rather than the full archive size.

Problem

DownloadResource (GET /resources/download) and DownloadFlowFile (GET /flows/{flowID}/files/download) both did, for every directory / multi-path request:

var buf bytes.Buffer
if err := resources.ZipResources(&buf, zipEntries); err != nil { ... }
c.DataFromReader(http.StatusOK, int64(buf.Len()), "application/zip", &buf, ...)

The full archive was materialized in buf purely so a Content-Length could be derived from buf.Len(). Combined with the 2 GB upload/pull limits, an authenticated User-role account can store (or pull) a large incompressible directory and request it as a ZIP, forcing an allocation roughly the size of the archive. A handful of concurrent requests can exhaust process/host memory and crash or severely degrade the API. The reporter's harness showed heap growth proportional to archive size through bytes.growSlice -> bytes.(*Buffer).Write -> archive/zip -> io.Copy -> resources.ZipResources.

Single-file downloads were already fine (they stream from disk); only the directory / multi-path ZIP paths were affected.

Solution

The underlying helpers (resources.ZipResources, flowfiles.ZipDirectory, flowfiles.ZipRelativePaths) already accept an io.Writer and stream each entry with io.Copy, so no archiving logic changed. A small shared helper writes the archive directly to c.Writer:

func streamZipArchive(c *gin.Context, filename string, build func(w io.Writer) error) error {
    c.Header("Content-Type", "application/zip")
    c.Header("Content-Disposition", mime.FormatMediaType("attachment", map[string]string{"filename": filename}))
    c.Status(http.StatusOK)
    if err := build(c.Writer); err != nil {
        c.Abort()
        return err
    }
    return nil
}

All four buffering call sites (two per handler) route through it. All pre-archive validation (auth, path sanitization, resource lookup, single-file 404) still runs before any byte is written and still returns normal error responses. Errors that occur after streaming has started (for example a blob that vanished from disk) are logged and the request is aborted, since the 200 status is already committed.

Filenames, archive entry names, ordering, permissions, and filtering are unchanged. The only observable differences: ZIP responses are now chunked (no Content-Length), and process memory no longer scales with archive size.

User Impact

Large directory / multi-path ZIP downloads no longer risk OOM-crashing the API; memory stays proportional to one file's copy buffer regardless of archive size.
ZIP responses use chunked transfer encoding (no Content-Length). Browsers still download normally via Content-Disposition: attachment; clients that relied on a ZIP Content-Length for a progress indicator will no longer receive one.
No API, schema, configuration, or single-file-download behavior change.

Test Plan

go build ./... (CGO_ENABLED=0) - passes.
go vet ./pkg/server/services/... - passes.
New TestStreamZipArchive (no DB dependency) - passes locally:
- streams a valid ZIP, asserts Content-Type: application/zip, asserts no Content-Length, and parses the body back into the expected entry;
- a build that fails after a partial flush propagates the error and aborts the request.
Extended TestResourceService_DownloadResourceScenarios and TestFlowFileService_DownloadFlowFileScenarios: every ZIP case now also asserts the response carries no Content-Length (locking in streaming), while keeping the existing valid-ZIP-content assertions.
The DB-backed scenario tests use a cgo SQLite driver and could not be executed in my Windows dev shell (no C compiler available; CGO_ENABLED=0). They compile cleanly and run in CI on Linux; locally they fail only at gorm.Open("sqlite3", ...), before the handler is invoked.

Closes #338

DownloadResource and DownloadFlowFile built the entire ZIP archive in a bytes.Buffer before sending it, so heap usage scaled with archive size and a few concurrent large-directory downloads could exhaust process memory. Stream the archive straight to the response writer via a shared streamZipArchive helper; the existing ZipResources/ZipDirectory/ZipRelativePaths helpers already accept an io.Writer. Responses are now chunked (no Content-Length) and memory stays proportional to one file's copy buffer.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates ZIP download endpoints to stream archives directly to the HTTP response to avoid buffering entire ZIP files in memory.

Changes:

Replace in-memory ZIP buffering with streaming for resource and flow file downloads.
Add a shared streamZipArchive helper to centralize streaming ZIP response behavior.
Update tests to assert streamed ZIP responses do not include a Content-Length header and add direct helper coverage.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
backend/pkg/server/services/resources.go	Introduces `streamZipArchive` and switches resource downloads to stream ZIPs instead of buffering.
backend/pkg/server/services/flow_files.go	Switches flow file ZIP downloads from buffered responses to streaming using the shared helper.
backend/pkg/server/services/resources_test.go	Adds coverage for streaming behavior, including `Content-Length` expectations and helper tests.
backend/pkg/server/services/flow_files_test.go	Updates ZIP download scenario assertions for streamed ZIP behavior (no `Content-Length`).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Commit the ZIP response status and headers lazily, on the first byte written, via a small zipStreamWriter. A build failure before any output now returns the normal structured error response instead of a committed 200 with a truncated body; mid-stream failures still abort. Also check the reader Close error in the helper test and cover the pre-stream failure path.

Copilot AI review requested due to automatic review settings June 5, 2026 14:08

Copilot AI reviewed Jun 5, 2026

View reviewed changes

Comment thread backend/pkg/server/services/resources.go

Comment thread backend/pkg/server/services/resources.go

Comment thread backend/pkg/server/services/flow_files.go

Comment thread backend/pkg/server/services/flow_files.go

Comment thread backend/pkg/server/services/resources_test.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: stream ZIP downloads without buffering archives#339

fix: stream ZIP downloads without buffering archives#339
mason5052 wants to merge 2 commits into
vxcontrol:mainfrom
mason5052:codex/issue-338-stream-zip-downloads

mason5052 commented Jun 5, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mason5052 commented Jun 5, 2026

Summary

Problem

Solution

User Impact

Test Plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants