fix: stream ZIP downloads without buffering archives#339
Open
mason5052 wants to merge 2 commits into
Open
Conversation
DownloadResource and DownloadFlowFile built the entire ZIP archive in a bytes.Buffer before sending it, so heap usage scaled with archive size and a few concurrent large-directory downloads could exhaust process memory. Stream the archive straight to the response writer via a shared streamZipArchive helper; the existing ZipResources/ZipDirectory/ZipRelativePaths helpers already accept an io.Writer. Responses are now chunked (no Content-Length) and memory stays proportional to one file's copy buffer.
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates ZIP download endpoints to stream archives directly to the HTTP response to avoid buffering entire ZIP files in memory.
Changes:
- Replace in-memory ZIP buffering with streaming for resource and flow file downloads.
- Add a shared
streamZipArchivehelper to centralize streaming ZIP response behavior. - Update tests to assert streamed ZIP responses do not include a
Content-Lengthheader and add direct helper coverage.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| backend/pkg/server/services/resources.go | Introduces streamZipArchive and switches resource downloads to stream ZIPs instead of buffering. |
| backend/pkg/server/services/flow_files.go | Switches flow file ZIP downloads from buffered responses to streaming using the shared helper. |
| backend/pkg/server/services/resources_test.go | Adds coverage for streaming behavior, including Content-Length expectations and helper tests. |
| backend/pkg/server/services/flow_files_test.go | Updates ZIP download scenario assertions for streamed ZIP behavior (no Content-Length). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Commit the ZIP response status and headers lazily, on the first byte written, via a small zipStreamWriter. A build failure before any output now returns the normal structured error response instead of a committed 200 with a truncated body; mid-stream failures still abort. Also check the reader Close error in the helper test and cover the pre-stream failure path.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Directory and multi-path ZIP downloads built the entire archive in a
bytes.Bufferbefore sending it, so the API process held the whole compressed archive on the heap before writing any response byte. This streams the archive straight to the HTTP response writer instead, keeping memory proportional to a single file's copy buffer rather than the full archive size.Problem
DownloadResource(GET /resources/download) andDownloadFlowFile(GET /flows/{flowID}/files/download) both did, for every directory / multi-path request:The full archive was materialized in
bufpurely so aContent-Lengthcould be derived frombuf.Len(). Combined with the 2 GB upload/pull limits, an authenticatedUser-role account can store (or pull) a large incompressible directory and request it as a ZIP, forcing an allocation roughly the size of the archive. A handful of concurrent requests can exhaust process/host memory and crash or severely degrade the API. The reporter's harness showed heap growth proportional to archive size throughbytes.growSlice -> bytes.(*Buffer).Write -> archive/zip -> io.Copy -> resources.ZipResources.Single-file downloads were already fine (they stream from disk); only the directory / multi-path ZIP paths were affected.
Solution
The underlying helpers (
resources.ZipResources,flowfiles.ZipDirectory,flowfiles.ZipRelativePaths) already accept anio.Writerand stream each entry withio.Copy, so no archiving logic changed. A small shared helper writes the archive directly toc.Writer:All four buffering call sites (two per handler) route through it. All pre-archive validation (auth, path sanitization, resource lookup, single-file 404) still runs before any byte is written and still returns normal error responses. Errors that occur after streaming has started (for example a blob that vanished from disk) are logged and the request is aborted, since the 200 status is already committed.
Filenames, archive entry names, ordering, permissions, and filtering are unchanged. The only observable differences: ZIP responses are now chunked (no
Content-Length), and process memory no longer scales with archive size.User Impact
Content-Length). Browsers still download normally viaContent-Disposition: attachment; clients that relied on a ZIPContent-Lengthfor a progress indicator will no longer receive one.Test Plan
go build ./...(CGO_ENABLED=0) - passes.go vet ./pkg/server/services/...- passes.TestStreamZipArchive(no DB dependency) - passes locally:Content-Type: application/zip, asserts noContent-Length, and parses the body back into the expected entry;buildthat fails after a partial flush propagates the error and aborts the request.TestResourceService_DownloadResourceScenariosandTestFlowFileService_DownloadFlowFileScenarios: every ZIP case now also asserts the response carries noContent-Length(locking in streaming), while keeping the existing valid-ZIP-content assertions.CGO_ENABLED=0). They compile cleanly and run in CI on Linux; locally they fail only atgorm.Open("sqlite3", ...), before the handler is invoked.Closes #338