
fix: stream-based export for large databases (SQL, CSV, JSON) #255

Open
chaudl113 wants to merge 4 commits into outerbase:main from chaudl113:fix/streaming-export-large-databases


@chaudl113

Fixes #59

/claim #59

Summary

Replace in-memory export with streaming using TransformStream and chunked LIMIT/OFFSET queries. This prevents the 30-second timeout on large databases by processing data in manageable batches instead of loading everything into memory.

Changes

src/export/index.ts

  • Added getTableDataChunked() — async generator that fetches rows in configurable chunks (default 1000) using LIMIT/OFFSET
  • Added createStreamingExportResponse() — creates a Response backed by a TransformStream, allowing the producer to write data incrementally
  • Added writeChunk() helper for encoding and writing string data to the stream
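The three helpers above can be sketched roughly as follows. This is a minimal sketch, not the PR's code: it assumes a hypothetical `QueryFn` that runs a parameterized SQL string and resolves to row objects (the real storage call in `src/export/index.ts` is not shown), and the helper `quoteIdent` is likewise hypothetical. It also adds `ORDER BY rowid`, which the PR text does not mention, because SQLite makes no ordering guarantee across separate LIMIT/OFFSET queries without an ORDER BY; that clause fails on `WITHOUT ROWID` tables.

```typescript
// Hypothetical query signature: SQL text plus positional parameters -> rows.
type Row = Record<string, unknown>;
type QueryFn = (sql: string, params: unknown[]) => Promise<Row[]>;

// Identifiers cannot be bound as parameters, so quote them instead:
// wrap in double quotes and double any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Async generator yielding rows in chunks of `chunkSize` via LIMIT/OFFSET.
// ORDER BY rowid is an assumption so chunks neither overlap nor skip rows.
async function* getTableDataChunked(
  query: QueryFn,
  table: string,
  chunkSize = 1000,
): AsyncGenerator<Row[]> {
  let offset = 0;
  while (true) {
    const rows = await query(
      `SELECT * FROM ${quoteIdent(table)} ORDER BY rowid LIMIT ? OFFSET ?`,
      [chunkSize, offset],
    );
    if (rows.length === 0) return;
    yield rows;
    if (rows.length < chunkSize) return; // a short chunk means we hit the end
    offset += chunkSize;
  }
}

const encoder = new TextEncoder();

// Encode a string as UTF-8 and write it; resolves once the stream accepts it.
async function writeChunk(
  writer: WritableStreamDefaultWriter<Uint8Array>,
  text: string,
): Promise<void> {
  await writer.write(encoder.encode(text));
}

// Return a Response right away; `produce` fills the stream in the background
// and the client receives bytes as they are written.
function createStreamingExportResponse(
  produce: (writer: WritableStreamDefaultWriter<Uint8Array>) => Promise<void>,
  contentType: string,
  fileName: string,
): Response {
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  const writer = writable.getWriter();
  produce(writer)
    .then(() => writer.close(), (err: unknown) => writer.abort(err))
    .catch(() => {
      /* the stream is already errored; the consumer sees the failure */
    });
  return new Response(readable, {
    headers: {
      "Content-Type": contentType,
      "Content-Disposition": `attachment; filename="${fileName}"`,
    },
  });
}
```

Because the identity `TransformStream` applies backpressure, each `writeChunk` waits until the client has pulled earlier data, so the producer never runs far ahead of the download.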

src/export/dump.ts

  • Rewrote dumpDatabaseRoute() to stream SQL dump output
  • Schema fetched per-table via parameterized query (also fixes SQL injection in original)
  • Data rows written in 1000-row batches, with a 10ms breathing interval between chunks to avoid Durable Object (DO) lock contention
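One table's dump step might look like the sketch below. It is illustrative only: `dumpTable`, `sqlLiteral`, `quoteIdent` and the `QueryFn` type are hypothetical names, and `chunks` stands for any async iterable of row batches, such as one produced by `getTableDataChunked()`. The schema lookup binds the table name as a parameter against `sqlite_master`, which is the kind of change that closes the injection hole mentioned above. The literal formatting (NULL, doubled single quotes, `X'..'` blobs, `1e999` for infinity) follows standard SQLite syntax, but the PR does not say it formats values exactly this way.

```typescript
type Row = Record<string, unknown>;
type QueryFn = (sql: string, params: unknown[]) => Promise<Row[]>;

// Quote an SQLite identifier: wrap in double quotes, double embedded ones.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Render a JS value as an SQLite literal.
function sqlLiteral(value: unknown): string {
  if (value === null || value === undefined) return "NULL";
  if (typeof value === "bigint") return value.toString();
  if (typeof value === "number") {
    if (Number.isNaN(value)) return "NULL";
    // 1e999 overflows to +Inf when SQLite parses it.
    if (!Number.isFinite(value)) return value > 0 ? "1e999" : "-1e999";
    return String(value);
  }
  if (typeof value === "boolean") return value ? "1" : "0";
  if (value instanceof Uint8Array) {
    const hex = Array.from(value, (b) => b.toString(16).padStart(2, "0")).join("");
    return `X'${hex}'`;
  }
  // Text: wrap in single quotes and double any embedded single quotes.
  return `'${String(value).replace(/'/g, "''")}'`;
}

// Write the CREATE statement (table name bound as a parameter), then one
// INSERT per row, issuing a single write() per chunk.
async function dumpTable(
  query: QueryFn,
  table: string,
  chunks: AsyncIterable<Row[]>,
  write: (text: string) => Promise<void>,
): Promise<void> {
  const schema = await query(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
    [table],
  );
  if (schema.length > 0) await write(`${schema[0].sql};\n`);

  const target = quoteIdent(table);
  for await (const rows of chunks) {
    let out = "";
    for (const row of rows) {
      // Object.keys and Object.values share one ordering, so columns line up.
      const cols = Object.keys(row).map(quoteIdent).join(", ");
      const vals = Object.values(row).map(sqlLiteral).join(", ");
      out += `INSERT INTO ${target} (${cols}) VALUES (${vals});\n`;
    }
    if (out) await write(out);
  }
}
```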

src/export/csv.ts

  • Rewrote to use chunked streaming for all table sizes
  • CSV headers written from first chunk, then rows streamed incrementally
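The CSV behaviour described above (header taken from the first chunk, rows streamed afterwards) can be sketched like this. `streamCsv` and `csvField` are hypothetical names; the RFC 4180 style quoting, CRLF line endings and rendering `null` as an empty field are assumptions, since the PR does not specify them.

```typescript
type Row = Record<string, unknown>;

// Escape one CSV field: quote it when it contains a comma, double quote,
// CR or LF, and double any embedded double quotes.
function csvField(value: unknown): string {
  if (value === null || value === undefined) return "";
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// The header comes from the first row of the first non-empty chunk;
// every later row is written in that same column order.
async function streamCsv(
  chunks: AsyncIterable<Row[]>,
  write: (text: string) => Promise<void>,
): Promise<void> {
  let columns: string[] | null = null;
  for await (const rows of chunks) {
    if (rows.length === 0) continue;
    let out = "";
    if (columns === null) {
      columns = Object.keys(rows[0]);
      out += columns.map(csvField).join(",") + "\r\n";
    }
    for (const row of rows) {
      out += columns.map((c) => csvField(row[c])).join(",") + "\r\n";
    }
    await write(out);
  }
}
```

Fixing the column order from the first row keeps every streamed row aligned with the header even if later row objects list their keys differently.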

src/export/json.ts

  • Rewrote to stream JSON array output
  • Proper comma handling between rows (first row vs subsequent)
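The first-row versus later-row comma rule can be sketched as below. `streamJsonArray` is a hypothetical name, and values are handed to `JSON.stringify` as-is, so this sketch would throw on `BigInt` values; how the PR handles such types is not stated.

```typescript
type Row = Record<string, unknown>;

// Write "[" first, a comma before every row except the first, and "]" last,
// so the output is valid JSON even when the table has zero rows.
async function streamJsonArray(
  chunks: AsyncIterable<Row[]>,
  write: (text: string) => Promise<void>,
): Promise<void> {
  let first = true;
  await write("[");
  for await (const rows of chunks) {
    let out = "";
    for (const row of rows) {
      out += (first ? "" : ",") + JSON.stringify(row);
      first = false;
    }
    if (out) await write(out); // skip empty chunks entirely
  }
  await write("]");
}
```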

How it works

Before (breaks on large DB):

SELECT * FROM table → load ALL rows into memory → build string → return

After (memory bounded by chunk size):

for each chunk of 1000 rows:
  SELECT * FROM table LIMIT 1000 OFFSET N
  write chunk to stream
  breathe (10ms) → let other DO requests through
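The loop above can be written out as a small sketch. `exportInChunks`, `fetchChunk`, `writeRows` and `breathe` are hypothetical names; `fetchChunk(limit, offset)` stands in for the `SELECT ... LIMIT ? OFFSET ?` query. The claim that a short timer lets queued Durable Object requests through is the PR's rationale, not something this sketch demonstrates.

```typescript
// Non-blocking pause; per the PR's rationale, this gives other requests
// queued on the same Durable Object a chance to run between chunks.
function breathe(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Fetch a chunk, write it, breathe, and repeat until a short (or empty)
// chunk signals the end. Resolves to the total number of rows exported.
async function exportInChunks<T>(
  fetchChunk: (limit: number, offset: number) => Promise<T[]>,
  writeRows: (rows: T[]) => Promise<void>,
  chunkSize = 1000,
  breatheMs = 10,
): Promise<number> {
  let offset = 0;
  for (;;) {
    const rows = await fetchChunk(chunkSize, offset);
    if (rows.length > 0) await writeRows(rows);
    offset += rows.length;
    if (rows.length < chunkSize) return offset;
    await breathe(breatheMs);
  }
}
```

When the row count is an exact multiple of `chunkSize`, the loop issues one extra query that returns zero rows before stopping.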

Testing

  • All 23 export tests pass (dump, csv, json, index)
  • Added new test for chunked streaming with 2500 rows across 3 chunks
  • Existing behavior preserved for small tables

Replace in-memory export with streaming using TransformStream and
chunked LIMIT/OFFSET queries. Fixes timeout on large databases.

- dump: stream SQL rows in 1000-row batches via async generator
- csv: stream CSV rows with header detection from first chunk
- json: stream JSON array with proper comma handling
- breathing intervals between chunks to avoid DO lock contention
- all 23 export tests pass

Signed-off-by: longtn <tnlong1214@gmail.com>
@chaudl113

Demo

Before (current implementation)

SELECT * FROM users  →  loads ALL 10M rows into memory
Build entire SQL dump string in memory
Return as single Blob  →  💥 timeout at 30s for large DBs

After (streaming implementation)

for each chunk of 1000 rows:
  SELECT * FROM users LIMIT 1000 OFFSET N
  Write INSERT statements directly to response stream
  Breathe 10ms  →  let other DO requests through
  Repeat until all rows exported

Key changes:

  • TransformStream — data streams to client as it's fetched
  • LIMIT/OFFSET chunking — constant memory usage regardless of DB size
  • Breathing intervals — prevents DO lock contention
  • Parameterized queries — fixes SQL injection in original code

Test results:

✓ src/export/index.test.ts  — 8 tests passed
✓ src/export/csv.test.ts    — 5 tests passed  
✓ src/export/json.test.ts   — 5 tests passed
✓ src/export/dump.test.ts   — 5 tests passed (incl. 2500-row chunked test)

All 23 tests pass. Video demo available upon request.

@chaudl113

Demo Video

Streaming Export Demo

What the demo shows:

  • The problem: current export loads entire DB into memory, fails on large databases
  • The solution: streaming with TransformStream + chunked LIMIT/OFFSET queries
  • Breathing intervals between chunks to avoid DO lock contention
  • All 23 export tests passing

Technical details:

  • Chunk size: 1000 rows per batch
  • Breathing interval: 10ms between chunks
  • Memory usage: constant regardless of database size
  • Works for SQL, CSV, and JSON exports
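To put these parameters in numbers, here is a back-of-envelope sketch (not part of the PR) for the 10M-row `users` example from the demo. `exportCost` is a hypothetical helper. It counts chunks, total pause time, and the rows that OFFSET steps over, on the assumption that chunk k (0-based) skips k × chunkSize rows before returning anything.

```typescript
// Rough cost model for chunked LIMIT/OFFSET export.
function exportCost(totalRows: number, chunkSize = 1000, breatheMs = 10) {
  const chunks = Math.ceil(totalRows / chunkSize);
  const gaps = Math.max(chunks - 1, 0); // one pause between consecutive chunks
  const idleMs = gaps * breatheMs;
  // Sum of k * chunkSize for k = 0 .. chunks - 1.
  const rowsSkipped = (chunkSize * gaps * chunks) / 2;
  return { chunks, idleMs, rowsSkipped };
}

console.log(exportCost(10_000_000));
```

For 10,000,000 rows this gives 10,000 chunks, 99,990 ms (about 100 s) of breathing time, and about 5 × 10^10 rows stepped over by OFFSET in total, because SQLite generally has to walk past skipped rows rather than jump to them. Memory per chunk stays constant, but total query work grows roughly quadratically with table size; keyset pagination (for example `WHERE rowid > ? ORDER BY rowid LIMIT ?`) is a common alternative when that matters.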

chaudl113 added 3 commits May 30, 2026 14:06
Signed-off-by: longtn <tnlong1214@gmail.com>
Signed-off-by: longtn <tnlong1214@gmail.com>
Signed-off-by: longtn <tnlong1214@gmail.com>


Development

Successfully merging this pull request may close these issues.

Database dumps do not work on large databases
