Stream backup archive directly to S3 to reduce ephemeral storage #4263
Draft
Agent-Logs-Url: https://github.com/sillsdev/TheCombine/sessions/aa2742e4-8a85-48c9-82a2-3d3d7ee75063 Co-authored-by: imnasnainaec <6411521+imnasnainaec@users.noreply.github.com>
Copilot (AI) changed the title from "[WIP] Reduce ephemeral storage used by backup job" to "Stream backup archive directly to S3 to reduce ephemeral storage" on Apr 10, 2026
The daily backup job was consuming ~14 GB of node ephemeral storage because the tar.gz archive was written to local disk in full before being uploaded to S3, making peak usage roughly raw_files + archive_size.

Changes

- aws_backup.py: Add push_stream(dest), which launches aws s3 cp - s3://... as a subprocess and returns the Popen handle so callers can stream data via stdin.
- combine_backup.py: Replace the local tarfile.open(file, "x:gz") + aws.push(file) with tarfile.open(fileobj=upload_proc.stdin, mode="w:gz"), piping the archive directly to S3 as it is built. The archive is never materialized on disk.

Peak ephemeral storage drops from roughly max(db_dump, backend_files) + archive_size to just max(db_dump, backend_files).
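The sketch below illustrates the streaming approach described above: a push_stream helper that starts `aws s3 cp - <dest>` with a piped stdin, and a caller that writes the gzipped tar stream straight into that pipe. The bucket URL, the backup directory path, and the error handling are assumptions for illustration; the actual names and layout live in aws_backup.py and combine_backup.py.

```python
# Minimal sketch of streaming a tar.gz backup to S3 without touching local disk.
# Assumes the AWS CLI is installed and credentials are configured; the bucket,
# key, and backup directory below are placeholders, not the project's real values.
import subprocess
import tarfile
from pathlib import Path


def push_stream(dest: str) -> subprocess.Popen:
    """Start `aws s3 cp - <dest>` and return the Popen handle.

    With `-` as the source, the AWS CLI reads the object body from stdin,
    so the caller can stream data into proc.stdin as it is produced.
    """
    return subprocess.Popen(["aws", "s3", "cp", "-", dest], stdin=subprocess.PIPE)


# Caller side (as in combine_backup.py): build the archive directly into the
# uploader's stdin instead of materializing a tar.gz file on disk first.
backup_dir = Path("/data/backup")  # hypothetical staging directory of dumped files
upload_proc = push_stream("s3://example-bucket/backups/combine-backup.tar.gz")
with tarfile.open(fileobj=upload_proc.stdin, mode="w:gz") as tar:
    tar.add(backup_dir, arcname="backup")
upload_proc.stdin.close()  # signal EOF so the CLI can finish the upload
if upload_proc.wait() != 0:
    raise RuntimeError("aws s3 cp exited with a non-zero status")
```

Because the archive only ever exists as a stream through the pipe, peak ephemeral storage is bounded by the raw dumped files alone, which is the reduction from max(db_dump, backend_files) + archive_size to max(db_dump, backend_files) claimed above.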