Stream backup archive directly to S3 to reduce ephemeral storage#4263

Draft
Copilot wants to merge 2 commits into master from copilot/reduce-ephemeral-storage-backup-job

Conversation

Contributor

Copilot AI commented Apr 10, 2026

The daily backup job was consuming ~14 GB of node ephemeral storage because the tar.gz archive was written in full to local disk before being uploaded to S3, making peak usage roughly raw_files + archive_size.

Changes

  • aws_backup.py: Add push_stream(dest) — launches aws s3 cp - s3://... as a subprocess and returns the Popen handle for callers to stream data via stdin.
  • combine_backup.py: Replace local tarfile.open(file, "x:gz") + aws.push(file) with tarfile.open(fileobj=upload_proc.stdin, mode="w:gz"), piping the archive directly to S3 as it is built. The archive is never materialized on disk.
```python
# Before: write the full archive to disk, then upload
with tarfile.open(backup_file, "x:gz") as tar:
    tar.add(...)
aws.push(backup_file, aws_file)

# After: stream the archive directly to S3
upload_proc = aws.push_stream(aws_file)
with tarfile.open(fileobj=upload_proc.stdin, mode="w:gz") as tar:
    tar.add(...)
```

Peak ephemeral storage drops from roughly max(db_dump, backend_files) + archive_size to just max(db_dump, backend_files).
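
A runnable sketch of how the streaming pieces might fit together. The helper names `s3_cp_args` and `stream_tar_gz` and the sample paths are hypothetical, not from the PR; the demo substitutes `cat` writing to a local file for the real `aws s3 cp - s3://...` so it runs without AWS credentials, but the tarfile-to-pipe wiring is the same.

```python
import os
import pathlib
import subprocess
import tarfile
import tempfile

def s3_cp_args(dest: str) -> list[str]:
    # Argv that push_stream would launch for a real upload:
    # `aws s3 cp -` reads the object body from stdin.
    return ["aws", "s3", "cp", "-", f"s3://{dest}"]

def stream_tar_gz(src: str, proc: subprocess.Popen) -> None:
    # Pipe a gzipped tar of `src` into the subprocess's stdin, then close
    # stdin (EOF tells the uploader the archive is complete) and check
    # the uploader's exit code so a failed upload is not silently dropped.
    try:
        with tarfile.open(fileobj=proc.stdin, mode="w:gz") as tar:
            tar.add(src, arcname=os.path.basename(src))
    finally:
        proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError(f"upload exited with {proc.returncode}")

# Demo: use `cat > archive.tgz` as a stand-in uploader, then verify the
# streamed archive round-trips.
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "data")
    src.mkdir()
    (src / "hello.txt").write_text("hello")
    out = pathlib.Path(tmp, "archive.tgz")
    with open(out, "wb") as sink:
        proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=sink)
        stream_tar_gz(str(src), proc)
    with tarfile.open(out, "r:gz") as tar:
        print(sorted(tar.getnames()))  # → ['data', 'data/hello.txt']
```

Closing `proc.stdin` and calling `proc.wait()` matter in production: without them the CLI never sees EOF, and a non-zero upload exit would go unnoticed.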


Copilot AI changed the title from "[WIP] Reduce ephemeral storage used by backup job" to "Stream backup archive directly to S3 to reduce ephemeral storage" on Apr 10, 2026
Copilot AI requested a review from imnasnainaec April 10, 2026 21:41


Development

Successfully merging this pull request may close these issues.

Reduce ephemeral storage used by backup job
