Skip to content

fix: implement overflow pages in sqlite_writer to prevent SIGBUS on large records (#139)#175

Open
dLo999 wants to merge 1 commit intoDeusData:mainfrom
dLo999:fix/sqlite-writer-overflow-pages
Open

fix: implement overflow pages in sqlite_writer to prevent SIGBUS on large records (#139)#175
dLo999 wants to merge 1 commit intoDeusData:mainfrom
dLo999:fix/sqlite-writer-overflow-pages

Conversation

@dLo999
Copy link
Copy Markdown

@dLo999 dLo999 commented Mar 29, 2026

Closes #139 (reopened v0.5.7 crash)

Summary

Implements SQLite overflow page support for table leaf cells in the custom B-tree writer (sqlite_writer.c). This fixes the SIGBUS crash when indexing large repositories that produce records exceeding 65KB (e.g., nodes with very long properties JSON or qualified_name).

Root cause

pb_add_table_cell_with_flush called pb_add_cell unconditionally after flushing the current page. If a single record's payload exceeded the usable page space (~65,501 bytes), content_offset -= cell_len underflowed to negative, causing memcpy to write before the page buffer — stack corruption leading to SIGBUS.

The reporter's crash profile matches: Java monorepo with 13,660 files producing 226K nodes + 477K edges, SIGBUS immediately after pass.timing pass=configlink during the gbuf.dump / SQLite write step.

Fix

Implemented SQLite's overflow page mechanism for table leaf cells:

  1. write_overflow_pages() — writes a linked-list of overflow pages for payload bytes that exceed local storage. Each page has a 4-byte next-page pointer followed by payload data.

  2. build_table_cell_overflow() — builds a table leaf cell in the overflow format: varint(total_payload_len) + varint(rowid) + payload[0..local_len) + uint32(overflow_page_num). The total payload length varint still encodes the full original size so SQLite knows how much to read.

  3. pb_add_table_cell_with_flush() — now detects payload_len > TABLE_OVERFLOW_MAX_LOCAL (65,501), computes local_len using SQLite's exact integer formula, writes overflow pages, then builds the reduced cell. Normal path (no overflow) is unchanged.

Key constants (PAGE_SIZE=65536, reserved=0)

Constant Value Formula
TABLE_OVERFLOW_MAX_LOCAL 65,501 usable_size - 35
TABLE_OVERFLOW_MIN_LOCAL 8,199 (usable_size - 12) * 32 / 255 - 23
Overflow page data capacity 65,532 usable_size - 4 (4-byte next pointer)

Test results

Build: Compiles cleanly on macOS (Apple Clang, arm64)

Test suite: 2742 passed, 0 failed (was 2741 — new test added)

New test: sw_oversized_node

  • Creates a node with 70,000-character properties JSON (exceeds 65,501-byte max_local threshold)
  • Writes DB via cbm_write_db
  • Validates with PRAGMA integrity_check (verifies the overflow page chain is correctly formed)
  • Reads the node back via SELECT and verifies correct data

This unit test directly exercises the exact code path that causes the SIGBUS — a payload > 65,501 bytes hitting pb_add_table_cell_with_flush → overflow pages → pb_add_cell with the reduced local cell.

End-to-end indexing:

Repo Nodes Edges Result
codebase-memory-mcp itself (768 files) 23,736 49,432 PASS — no crash
Synthetic Java repo (1.4MB file, 80-param methods) 8 3 PASS — no crash (tree-sitter truncated long signatures during extraction)

What we could not test:

  • The exact reporter's scenario (13,660-file Java monorepo, 226K nodes). We don't have access to a comparable repo. However, the unit test covers the exact failure mode: payload > 65KB → overflow pages → PRAGMA integrity_check validates the page chain. The extraction layer may truncate signatures before they reach the writer, but the writer itself is now safe for any payload size.

Generated with agent-team via /issue

Copy link
Copy Markdown
Author

@dLo999 dLo999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Implements SQLite overflow page support in the custom B-tree writer to fix SIGBUS crash when records exceed 65KB. Adds write_overflow_pages(), build_table_cell_overflow(), and modifies pb_add_table_cell_with_flush() to detect and handle oversized payloads. New test validates with PRAGMA integrity_check.

Findings

  • [nit] internal/cbm/sqlite_writer.c:743-759 — fseek/fwrite errors silently ignored via NOLINTNEXTLINE(cert-err33-c). This is the pre-existing pattern throughout the entire file (see lines 394, 396, 567, 569, 812, 814, 915, 917, 948, 950, 1835, 1837). All page writes use "best-effort write, errors detected at fclose". The new overflow code correctly follows this convention. Changing error handling only in the overflow path would be inconsistent with the rest of the writer.
  • [warning] tests/test_sqlite_writer.c — Test covers a 70KB payload (happy path). Boundary tests at 65,501 and 65,502 bytes would strengthen coverage. Worth adding but not blocking — the overflow threshold logic uses exact SQLite constants verified against PRAGMA integrity_check.
  • [nit] internal/cbm/sqlite_writer.c:719 — Comment could document the "best-effort I/O" pattern for clarity. Minor.

CI Status

No CI checks on fork branch. Local: 2742 tests pass, 0 regressions.

Verdict

APPROVE — The overflow implementation correctly follows SQLite's spec (overflow page chains, local_len computation, total_payload_len varint encoding). The "critical" fseek/fwrite finding is a pre-existing pattern used by all 14 other page writes in the file — not a new risk introduced by this PR. The new test validates the complete chain via PRAGMA integrity_check. Boundary tests would be nice-to-have but the core fix is correct and safe to merge.

@DeusData DeusData added bug Something isn't working stability/performance Server crashes, OOM, hangs, high CPU/memory labels Mar 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working stability/performance Server crashes, OOM, hangs, high CPU/memory

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stack overflow crash in autoindex_thread → cbm_pipeline_pass_configlink (v0.5.6, macOS ARM64)

2 participants