Skip to content

fix(core): handle SQLite and Windows semantic regressions#655

Merged
phernandez merged 4 commits intomainfrom
codex/fix-652-653-654
Mar 10, 2026
Merged

fix(core): handle SQLite and Windows semantic regressions#655
phernandez merged 4 commits intomainfrom
codex/fix-652-653-654

Conversation

@phernandez
Copy link
Member

Summary

  • handle missing sqlite-vec gracefully in bm project info instead of crashing on vec0 queries
  • use per-process log files on Windows to avoid log rotation contention between CLI and MCP processes
  • collapse duplicate vector chunk keys before embedding writes so bm reindex --embeddings cannot enqueue duplicate rowids

Testing

  • uv run ruff check src/basic_memory/services/project_service.py src/basic_memory/utils.py src/basic_memory/repository/search_repository_base.py tests/services/test_project_service_embedding_status.py tests/repository/test_semantic_search_base.py tests/utils/test_setup_logging.py
  • uv run pytest tests/services/test_project_service_embedding_status.py tests/repository/test_semantic_search_base.py tests/utils/test_setup_logging.py
  • uv run pytest tests/repository/test_sqlite_vector_search_repository.py

Closes #652
Closes #653
Closes #654

Handle missing sqlite-vec in project info, avoid Windows log rotation contention, and dedupe duplicate chunk keys during embedding sync.

Refs #652
Refs #653
Refs #654

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

Claude finished @phernandez's task in 2m 15s —— View job


PR Review: Fix SQLite and Windows semantic regressions

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture & security
  • Post final review summary

Summary

This PR addresses three distinct regressions: graceful degradation when sqlite-vec is absent, per-process log files on Windows, and deduplication of vector chunk keys before embedding writes. The overall approach is solid and well-motivated. A few items worth addressing:


Code Quality & Standards

utils.py# pragma: no cover vs. new tests

Issue: setup_logging is marked # pragma: no cover at the function declaration (line 253), but the PR adds two tests that explicitly call it. This means coverage for the new tests is never counted, and the 100% coverage requirement is satisfied only by suppression, not by actual testing.

# utils.py:253
) -> None:  # pragma: no cover

If the tests work (and they look correct), remove # pragma: no cover so the coverage tool actually measures the new tests' contribution. Fix this →

search_repository_base.py — last-write-wins on duplicate chunks

In _build_chunk_records, when a duplicate chunk_key is detected, the new record overwrites the existing one. This is fine when duplicates are identical (the described SQLite FTS5 case), but the comment says "deterministic" — that's only true if the chunk_text is also the same. If duplicates with different text can occur, the outcome depends on row order. Worth adding a note or a content-equality assertion in the warning log for diagnostics.

# search_repository_base.py ~line 465
records_by_key[chunk_key] = {  # last-write-wins if content differs

project_service.py — broad try-except scope

The try block (lines ~1008–1053) wraps all four SQL queries including search_vector_chunks counts that have no vec0 dependency. If the chunks table itself errors (e.g., schema migration issue), the string-match guard "no such module: vec0" will re-raise — which is good. But the scope is wider than necessary and could mask other OperationalErrors if the message ever changes.

A tighter structure would wrap only the embedding join queries in the try, leaving the chunks-table queries outside. Not a blocker, but worth considering for future resilience.

project_service.py — Windows log file accumulation

With per-process log filenames (basic-memory-{PID}.log), loguru's retention=5 parameter applies to that specific file, not to the pattern basic-memory-*.log. This means each unique PID leaves its own log file and the retention doesn't clean them up. On Windows machines with many CLI/MCP invocations, ~/.basic-memory/ will accumulate unbounded log files. Consider using a glob pattern for retention or adding a startup cleanup. Not a crash bug, but a UX regression on long-running Windows installs.


Testing & Documentation

Check Status
Unit tests for new functions ✅ Good coverage of _build_chunk_records dedup, setup_logging behavior
Duplicate chunk key collapse test test_duplicate_rows_collapse_to_unique_chunk_keys is clear and targeted
sqlite-vec graceful degradation test ✅ Mocks the OperationalError correctly; tests the exact returned EmbeddingStatus fields
Windows log file per-process test ✅ Uses monkeypatching of os.name and os.getpid cleanly
100% coverage maintained ⚠️ setup_logging is still # pragma: no cover — new tests won't count
Edge case: duplicate rows with different text ⚠️ Not tested

Basic Memory Architecture

Check Status
Preserves local-first architecture
Knowledge graph consistency
MCP tool atomicity ✅ (not changed)
No DB migrations needed ✅ (no schema changes)
Markdown file integrity

Security & Performance

Check Status
No hardcoded secrets
Error message exposes only safe details ✅ (str(exc).lower() string check is safe)
Exception is re-raised for Postgres ✅ (if is_postgres or ... : raise)
No emojis/unicode in log output
Windows path comparison compatible

Checklist Results

Item Status
Coding conventions (CLAUDE.md)
Python 3.12+ types & async
SQLAlchemy 2.0 patterns
100-char line limit
No commented-out code blocks
Decision-point comments (literate style) ✅ Excellent — all three fixes have proper Trigger/Why/Outcome comments
100% test coverage ⚠️ setup_logging # pragma: no cover should be removed
Windows compatibility
No emoji/unicode in console/log output

Overall Assessment

Approve with minor suggestions. The three fixes are well-motivated and correctly implemented. The main nit is removing # pragma: no cover from setup_logging now that it has real tests. The Windows log accumulation is worth a follow-up issue. The broad try-except scope is acceptable given the string-match guard.

@phernandez phernandez changed the title Fix SQLite and Windows semantic regressions fix(core): handle SQLite and Windows semantic regressions Mar 9, 2026
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 975504410b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# Trigger: Windows does not allow renaming an open file held by another process.
# Why: multiple basic-memory processes can share the same log directory at once.
# Outcome: use per-process log files on Windows so log rotation stays local.
log_filename = f"basic-memory-{os.getpid()}.log" if os.name == "nt" else "basic-memory.log"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid unbounded PID-specific log file growth on Windows

Using basic-memory-{pid}.log makes every process write to a brand-new filename, so Loguru's retention=5 no longer limits the overall log directory size across runs. On Windows hosts where CLI/MCP processes start frequently, ~/.basic-memory will accumulate stale per-PID logs indefinitely and can eventually consume significant disk space.

Useful? React with 👍 / 👎.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 445d009 by trimming stale per-process Windows log files during setup while keeping the current-process file path unchanged.

Bound the per-process Windows log directory so the PID-based rotation workaround does not accumulate stale files indefinitely.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

Claude finished @phernandez's task in 2m 22s —— View job


PR Review: Fix SQLite and Windows semantic regressions

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture & security
  • Post final review summary

Summary

This PR closes three regressions with well-scoped, well-commented fixes. The latest commit (445d009) correctly addresses the Windows log accumulation issue raised in the previous review. One outstanding item from the prior review remains unaddressed.


Code Quality & Standards

utils.py# pragma: no cover on setup_logging (still outstanding)

The function signature still carries # pragma: no cover at line 254, which suppresses coverage measurement for the entire function body — even though three new tests now call setup_logging directly and cover the Windows/non-Windows branching logic.

# utils.py:254
) -> None:  # pragma: no cover   ← should be removed

The untested branches (log_to_stdout, structured_context, env == "test") could each be covered with a simple monkeypatched test, or marked individually with # pragma: no cover at the line level per CLAUDE.md guidelines. As written, the three new tests provide real value but their contribution goes unmeasured. Fix this →

utils.py_cleanup_windows_log_files f-string inconsistency (nit)

# line 337 — f-string style
logger.debug(f"Failed to delete stale Windows log file: {stale_log}")

# vs. rest of the file — lazy format style (consistent with loguru best practice)
logger.warning(
    "Collapsed duplicate vector chunk keys before embedding sync: ...",
    project_id=self.project_id,
    ...
)

Minor inconsistency only; not a functional issue.

project_service.py — try-except scope (acknowledged from prior review)

The try block (lines 1008–1061) still wraps all four SQL queries including the plain search_vector_chunks count queries that have no vec0 dependency. The string-match guard "no such module: vec0" correctly re-raises unrelated errors, so this remains acceptable. Worth noting as a future cleanup candidate.


Testing & Documentation

Check Status
Unit tests for new functions/methods ✅ Good coverage of _build_chunk_records dedup, setup_logging Windows behavior, and _cleanup_windows_log_files retention
Duplicate chunk key collapse test test_duplicate_rows_collapse_to_unique_chunk_keys — clear and targeted
sqlite-vec graceful degradation test ✅ Mocks OperationalError correctly; verifies the exact returned EmbeddingStatus fields
Windows log per-process file test test_setup_logging_uses_per_process_log_file_on_windows — clean monkeypatching of os.name and os.getpid
Windows log retention test test_setup_logging_trims_stale_windows_pid_logs — verifies the 4-file retention ceiling correctly (6 stale → 4 kept, 2 removed)
Windows log accumulation (prior review) ✅ Fixed in 445d009 via _cleanup_windows_log_files
100% coverage maintained ⚠️ setup_logging # pragma: no cover — new tests won't count toward coverage
Edge case: duplicate rows with different text ⚠️ Not tested (last-write-wins behavior is undocumented in the warning log)

Basic Memory Architecture

Check Status
Preserves local-first architecture
Knowledge graph consistency
MCP tool atomicity ✅ (not changed)
No DB migrations needed ✅ (no schema changes)
Markdown file integrity
Decision-point comments (literate style) ✅ All three fixes have proper Trigger/Why/Outcome comments

Security & Performance

Check Status
No hardcoded secrets
Error message exposes only safe details ✅ (str(exc).lower() string check is safe)
Exception re-raised for Postgres ✅ (if is_postgres or ... : raise)
No emoji/unicode in log output
Windows path comparison compatible
Stale PID log OSError silenced safely ✅ Another process may hold the file; logger.debug is appropriate

Checklist Results

Item Status
Coding conventions (CLAUDE.md)
Python 3.12+ types & async
SQLAlchemy 2.0 patterns
100-char line length
No commented-out code blocks
Decision-point comments (literate style)
100% test coverage ⚠️ setup_logging # pragma: no cover should be removed now that tests exist
Windows compatibility ✅ (improved by this PR)
No emoji/unicode in console/log output

Overall Assessment

Approve with minor suggestion. The three regression fixes are correct and well-tested. The Windows log accumulation concern raised in the prior review is now resolved. The one remaining item is removing # pragma: no cover from setup_logging now that three real tests exercise it — this would require either adding tests for the remaining untested branches (log_to_stdout, structured_context, env == "test") or moving the pragma to those specific lines.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

Claude finished @phernandez's task in 2m 40s —— View job


PR Review: Fix SQLite and Windows semantic regressions

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture & security
  • Post final review summary

Summary

Three well-scoped fixes — graceful sqlite-vec degradation, per-process Windows log files, and duplicate chunk key deduplication — are all correctly implemented. Both prior reviews flagged one outstanding item: removing # pragma: no cover from setup_logging now that tests exist. I've prepared a fix (see below). Push permissions are restricted in this CI environment, so this will need to be applied manually.


Prepared Fix: Remove # pragma: no cover from setup_logging

src/basic_memory/utils.py line 254 — change:

# before
) -> None:  # pragma: no cover

# after  
) -> None:

tests/utils/test_setup_logging.py — add four new tests covering the remaining branches (env == "test" early return, log_to_stdout, structured_context, and the httpx/watchfiles suppression lines):

def test_setup_logging_test_env_uses_stderr_only(monkeypatch) -> None:
    """In test mode, setup_logging adds a single stderr sink and returns early."""
    added_sinks: list[str] = []

    monkeypatch.setenv("BASIC_MEMORY_ENV", "test")
    monkeypatch.setattr(utils.logger, "remove", lambda *args, **kwargs: None)
    monkeypatch.setattr(
        utils.logger,
        "add",
        lambda sink, **kwargs: added_sinks.append(sink.__name__ if hasattr(sink, "__name__") else str(sink)),
    )

    utils.setup_logging(log_to_file=True, log_to_stdout=True)

    assert len(added_sinks) == 1


def test_setup_logging_log_to_stdout(monkeypatch) -> None:
    """log_to_stdout=True should add a stderr sink on non-Windows outside test mode."""
    added_sinks: list[str] = []

    monkeypatch.setenv("BASIC_MEMORY_ENV", "dev")
    monkeypatch.setattr(utils.os, "name", "posix")
    monkeypatch.setattr(utils.logger, "remove", lambda *args, **kwargs: None)
    monkeypatch.setattr(
        utils.logger,
        "add",
        lambda sink, **kwargs: added_sinks.append(sink.__name__ if hasattr(sink, "__name__") else str(sink)),
    )

    utils.setup_logging(log_to_stdout=True)

    assert len(added_sinks) == 1


def test_setup_logging_structured_context(monkeypatch) -> None:
    """structured_context=True should call logger.configure with cloud env vars."""
    configured_extras: list[dict] = []

    monkeypatch.setenv("BASIC_MEMORY_ENV", "dev")
    monkeypatch.setattr(utils.logger, "remove", lambda *args, **kwargs: None)
    monkeypatch.setattr(utils.logger, "add", lambda *args, **kwargs: None)
    monkeypatch.setattr(
        utils.logger,
        "configure",
        lambda **kwargs: configured_extras.append(kwargs.get("extra", {})),
    )

    utils.setup_logging(structured_context=True)

    assert len(configured_extras) == 1
    assert "tenant_id" in configured_extras[0]
    assert "fly_region" in configured_extras[0]


def test_setup_logging_suppresses_noisy_loggers(monkeypatch) -> None:
    """setup_logging should raise the log level for httpx and watchfiles."""
    import logging
    monkeypatch.setenv("BASIC_MEMORY_ENV", "dev")
    monkeypatch.setattr(utils.logger, "remove", lambda *args, **kwargs: None)
    monkeypatch.setattr(utils.logger, "add", lambda *args, **kwargs: None)

    utils.setup_logging()

    assert logging.getLogger("httpx").level == logging.WARNING
    assert logging.getLogger("watchfiles.main").level == logging.WARNING

Checklist Results

Item Status
Coding conventions (CLAUDE.md)
Python 3.12+ types & async
SQLAlchemy 2.0 patterns
100-char line length
No commented-out code blocks
Decision-point comments (literate style) ✅ Excellent Trigger/Why/Outcome comments throughout
100% test coverage ⚠️ Requires removing # pragma: no cover + 4 new tests (fix prepared above)
Windows compatibility ✅ (improved by this PR)
_cleanup_windows_log_files f-string nit ⚠️ Minor — logger.debug(f"Failed to delete...") should use lazy format: logger.debug("Failed to delete stale Windows log file: {path}", path=stale_log)
sqlite-vec graceful degradation
Duplicate chunk key dedup
No hardcoded secrets
Windows path comparisons compatible

Overall Assessment

Approve with minor follow-up. All three regression fixes are correct and well-tested. The only items remaining from prior reviews are:

  1. # pragma: no cover on setup_logging (fix prepared above — apply manually)
  2. _cleanup_windows_log_files f-string nit — change logger.debug(f"Failed to delete stale Windows log file: {stale_log}") to logger.debug("Failed to delete stale Windows log file: {path}", path=stale_log) to match loguru's lazy-format style used elsewhere in the file

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

Claude finished @phernandez's task in 1m 26s —— View job


PR Review: Fix SQLite and Windows semantic regressions

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review architecture & security
  • Post final review summary

Summary

All three regression fixes are correctly implemented and all outstanding items from prior reviews have been addressed in the latest commit (9ef040e test: cover logging setup branches). This PR is ready to merge.


Resolved Items (vs. prior reviews)

Item Status
# pragma: no cover removed from setup_logging ✅ Removed in 9ef040e
Tests added for env == "test" early-return branch test_setup_logging_test_env_uses_stderr_only
Tests added for log_to_stdout=True branch test_setup_logging_log_to_stdout
Tests added for structured_context=True branch test_setup_logging_structured_context
Tests added for third-party logger suppression test_setup_logging_suppresses_noisy_loggers
_cleanup_windows_log_files f-string → loguru lazy format logger.debug("... {path}", path=stale_log) at line 337

Final Checklist

Item Status
Coding conventions (CLAUDE.md)
Python 3.12+ types & async
SQLAlchemy 2.0 patterns
100-char line length
No commented-out code blocks
Decision-point comments (literate style) ✅ Excellent Trigger/Why/Outcome comments throughout
100% test coverage # pragma: no cover removed; 7 tests cover all setup_logging branches
Windows compatibility ✅ Per-process log files + bounded cleanup
sqlite-vec graceful degradation OperationalError caught with "no such module: vec0" guard
Duplicate chunk key deduplication ✅ Last-write-wins dedup before embedding writes
No hardcoded secrets
Windows path comparisons compatible
No emoji/unicode in console/log output

Overall Assessment

Approve. All three regression fixes are correct and well-tested. All items raised in the two prior reviews have been resolved. The setup_logging function now has real coverage via 7 targeted tests, the f-string nit has been fixed with loguru lazy format, and the Windows log accumulation issue is addressed by _cleanup_windows_log_files.

@phernandez phernandez merged commit 30a8935 into main Mar 10, 2026
25 checks passed
@phernandez phernandez deleted the codex/fix-652-653-654 branch March 10, 2026 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant