fix(core): handle SQLite and Windows semantic regressions by phernandez · Pull Request #655 · basicmachines-co/basic-memory

phernandez · 2026-03-09T19:32:42Z

Summary

handle missing sqlite-vec gracefully in bm project info instead of crashing on vec0 queries
use per-process log files on Windows to avoid log rotation contention between CLI and MCP processes
collapse duplicate vector chunk keys before embedding writes so bm reindex --embeddings cannot enqueue duplicate rowids

Testing

uv run ruff check src/basic_memory/services/project_service.py src/basic_memory/utils.py src/basic_memory/repository/search_repository_base.py tests/services/test_project_service_embedding_status.py tests/repository/test_semantic_search_base.py tests/utils/test_setup_logging.py
uv run pytest tests/services/test_project_service_embedding_status.py tests/repository/test_semantic_search_base.py tests/utils/test_setup_logging.py
uv run pytest tests/repository/test_sqlite_vector_search_repository.py

Closes #652
Closes #653
Closes #654

Handle missing sqlite-vec in project info, avoid Windows log rotation contention, and dedupe duplicate chunk keys during embedding sync. Refs #652 Refs #653 Refs #654 Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-03-09T19:32:56Z

Claude finished @phernandez's task in 2m 15s —— View job

PR Review: Fix SQLite and Windows semantic regressions

Gather context and read changed files
Review code quality & standards
Review testing & documentation
Review architecture & security
Post final review summary

Summary

This PR addresses three distinct regressions: graceful degradation when sqlite-vec is absent, per-process log files on Windows, and deduplication of vector chunk keys before embedding writes. The overall approach is solid and well-motivated. A few items worth addressing:

Code Quality & Standards

`utils.py` — `# pragma: no cover` vs. new tests

Issue: setup_logging is marked # pragma: no cover at the function declaration (line 253), but the PR adds two tests that explicitly call it. This means coverage for the new tests is never counted, and the 100% coverage requirement is satisfied only by suppression, not by actual testing.

# utils.py:253
) -> None:  # pragma: no cover

If the tests work (and they look correct), remove # pragma: no cover so the coverage tool actually measures the new tests' contribution. Fix this →

`search_repository_base.py` — last-write-wins on duplicate chunks

In _build_chunk_records, when a duplicate chunk_key is detected, the new record overwrites the existing one. This is fine when duplicates are identical (the described SQLite FTS5 case), but the comment says "deterministic" — that's only true if the chunk_text is also the same. If duplicates with different text can occur, the outcome depends on row order. Worth adding a note or a content-equality assertion in the warning log for diagnostics.

# search_repository_base.py ~line 465
records_by_key[chunk_key] = {  # last-write-wins if content differs

`project_service.py` — broad try-except scope

The try block (lines ~1008–1053) wraps all four SQL queries including search_vector_chunks counts that have no vec0 dependency. If the chunks table itself errors (e.g., schema migration issue), the string-match guard "no such module: vec0" will re-raise — which is good. But the scope is wider than necessary and could mask other OperationalErrors if the message ever changes.

A tighter structure would wrap only the embedding join queries in the try, leaving the chunks-table queries outside. Not a blocker, but worth considering for future resilience.

`project_service.py` — Windows log file accumulation

With per-process log filenames (basic-memory-{PID}.log), loguru's retention=5 parameter applies to that specific file, not to the pattern basic-memory-*.log. This means each unique PID leaves its own log file and the retention doesn't clean them up. On Windows machines with many CLI/MCP invocations, ~/.basic-memory/ will accumulate unbounded log files. Consider using a glob pattern for retention or adding a startup cleanup. Not a crash bug, but a UX regression on long-running Windows installs.

Testing & Documentation

Check	Status
Unit tests for new functions	✅ Good coverage of `_build_chunk_records` dedup, `setup_logging` behavior
Duplicate chunk key collapse test	✅ `test_duplicate_rows_collapse_to_unique_chunk_keys` is clear and targeted
`sqlite-vec` graceful degradation test	✅ Mocks the OperationalError correctly; tests the exact returned `EmbeddingStatus` fields
Windows log file per-process test	✅ Uses monkeypatching of `os.name` and `os.getpid` cleanly
100% coverage maintained	⚠️ `setup_logging` is still `# pragma: no cover` — new tests won't count
Edge case: duplicate rows with different text	⚠️ Not tested

Basic Memory Architecture

Check	Status
Preserves local-first architecture	✅
Knowledge graph consistency	✅
MCP tool atomicity	✅ (not changed)
No DB migrations needed	✅ (no schema changes)
Markdown file integrity	✅

Security & Performance

Check	Status
No hardcoded secrets	✅
Error message exposes only safe details	✅ (`str(exc).lower()` string check is safe)
Exception is re-raised for Postgres	✅ (`if is_postgres or ... : raise`)
No emojis/unicode in log output	✅
Windows path comparison compatible	✅

Checklist Results

Item	Status
Coding conventions (CLAUDE.md)	✅
Python 3.12+ types & async	✅
SQLAlchemy 2.0 patterns	✅
100-char line limit	✅
No commented-out code blocks	✅
Decision-point comments (literate style)	✅ Excellent — all three fixes have proper Trigger/Why/Outcome comments
100% test coverage	⚠️ `setup_logging` `# pragma: no cover` should be removed
Windows compatibility	✅
No emoji/unicode in console/log output	✅

Overall Assessment

Approve with minor suggestions. The three fixes are well-motivated and correctly implemented. The main nit is removing # pragma: no cover from setup_logging now that it has real tests. The Windows log accumulation is worth a follow-up issue. The broad try-except scope is acceptable given the string-match guard.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 975504410b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-09T19:36:33Z

src/basic_memory/utils.py

+        # Trigger: Windows does not allow renaming an open file held by another process.
+        # Why: multiple basic-memory processes can share the same log directory at once.
+        # Outcome: use per-process log files on Windows so log rotation stays local.
+        log_filename = f"basic-memory-{os.getpid()}.log" if os.name == "nt" else "basic-memory.log"


Avoid unbounded PID-specific log file growth on Windows

Using basic-memory-{pid}.log makes every process write to a brand-new filename, so Loguru's retention=5 no longer limits the overall log directory size across runs. On Windows hosts where CLI/MCP processes start frequently, ~/.basic-memory will accumulate stale per-PID logs indefinitely and can eventually consume significant disk space.

Useful? React with 👍 / 👎.

Addressed in 445d009 by trimming stale per-process Windows log files during setup while keeping the current-process file path unchanged.

Bound the per-process Windows log directory so the PID-based rotation workaround does not accumulate stale files indefinitely. Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-03-09T19:38:47Z

Claude finished @phernandez's task in 2m 22s —— View job

PR Review: Fix SQLite and Windows semantic regressions

Gather context and read changed files
Review code quality & standards
Review testing & documentation
Review architecture & security
Post final review summary

Summary

This PR closes three regressions with well-scoped, well-commented fixes. The latest commit (445d009) correctly addresses the Windows log accumulation issue raised in the previous review. One outstanding item from the prior review remains unaddressed.

Code Quality & Standards

`utils.py` — `# pragma: no cover` on `setup_logging` (still outstanding)

The function signature still carries # pragma: no cover at line 254, which suppresses coverage measurement for the entire function body — even though three new tests now call setup_logging directly and cover the Windows/non-Windows branching logic.

# utils.py:254
) -> None:  # pragma: no cover   ← should be removed

The untested branches (log_to_stdout, structured_context, env == "test") could each be covered with a simple monkeypatched test, or marked individually with # pragma: no cover at the line level per CLAUDE.md guidelines. As written, the three new tests provide real value but their contribution goes unmeasured. Fix this →

`utils.py` — `_cleanup_windows_log_files` f-string inconsistency (nit)

# line 337 — f-string style
logger.debug(f"Failed to delete stale Windows log file: {stale_log}")

# vs. rest of the file — lazy format style (consistent with loguru best practice)
logger.warning(
    "Collapsed duplicate vector chunk keys before embedding sync: ...",
    project_id=self.project_id,
    ...
)

Minor inconsistency only; not a functional issue.

`project_service.py` — try-except scope (acknowledged from prior review)

The try block (lines 1008–1061) still wraps all four SQL queries including the plain search_vector_chunks count queries that have no vec0 dependency. The string-match guard "no such module: vec0" correctly re-raises unrelated errors, so this remains acceptable. Worth noting as a future cleanup candidate.

Testing & Documentation

Check	Status
Unit tests for new functions/methods	✅ Good coverage of `_build_chunk_records` dedup, `setup_logging` Windows behavior, and `_cleanup_windows_log_files` retention
Duplicate chunk key collapse test	✅ `test_duplicate_rows_collapse_to_unique_chunk_keys` — clear and targeted
`sqlite-vec` graceful degradation test	✅ Mocks `OperationalError` correctly; verifies the exact returned `EmbeddingStatus` fields
Windows log per-process file test	✅ `test_setup_logging_uses_per_process_log_file_on_windows` — clean monkeypatching of `os.name` and `os.getpid`
Windows log retention test	✅ `test_setup_logging_trims_stale_windows_pid_logs` — verifies the 4-file retention ceiling correctly (6 stale → 4 kept, 2 removed)
Windows log accumulation (prior review)	✅ Fixed in `445d009` via `_cleanup_windows_log_files`
100% coverage maintained	⚠️ `setup_logging` `# pragma: no cover` — new tests won't count toward coverage
Edge case: duplicate rows with different text	⚠️ Not tested (last-write-wins behavior is undocumented in the warning log)

Basic Memory Architecture

Check	Status
Preserves local-first architecture	✅
Knowledge graph consistency	✅
MCP tool atomicity	✅ (not changed)
No DB migrations needed	✅ (no schema changes)
Markdown file integrity	✅
Decision-point comments (literate style)	✅ All three fixes have proper Trigger/Why/Outcome comments

Security & Performance

Check	Status
No hardcoded secrets	✅
Error message exposes only safe details	✅ (`str(exc).lower()` string check is safe)
Exception re-raised for Postgres	✅ (`if is_postgres or ... : raise`)
No emoji/unicode in log output	✅
Windows path comparison compatible	✅
Stale PID log OSError silenced safely	✅ Another process may hold the file; `logger.debug` is appropriate

Checklist Results

Item	Status
Coding conventions (CLAUDE.md)	✅
Python 3.12+ types & async	✅
SQLAlchemy 2.0 patterns	✅
100-char line length	✅
No commented-out code blocks	✅
Decision-point comments (literate style)	✅
100% test coverage	⚠️ `setup_logging` `# pragma: no cover` should be removed now that tests exist
Windows compatibility	✅ (improved by this PR)
No emoji/unicode in console/log output	✅

Overall Assessment

Approve with minor suggestion. The three regression fixes are correct and well-tested. The Windows log accumulation concern raised in the prior review is now resolved. The one remaining item is removing # pragma: no cover from setup_logging now that three real tests exercise it — this would require either adding tests for the remaining untested branches (log_to_stdout, structured_context, env == "test") or moving the pragma to those specific lines.

Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-03-09T20:13:06Z

Claude finished @phernandez's task in 2m 40s —— View job

PR Review: Fix SQLite and Windows semantic regressions

Gather context and read changed files
Review code quality & standards
Review testing & documentation
Review architecture & security
Post final review summary

Summary

Three well-scoped fixes — graceful sqlite-vec degradation, per-process Windows log files, and duplicate chunk key deduplication — are all correctly implemented. Both prior reviews flagged one outstanding item: removing # pragma: no cover from setup_logging now that tests exist. I've prepared a fix (see below). Push permissions are restricted in this CI environment, so this will need to be applied manually.

Prepared Fix: Remove `# pragma: no cover` from `setup_logging`

src/basic_memory/utils.py line 254 — change:

# before
) -> None:  # pragma: no cover

# after  
) -> None:

tests/utils/test_setup_logging.py — add four new tests covering the remaining branches (env == "test" early return, log_to_stdout, structured_context, and the httpx/watchfiles suppression lines):

def test_setup_logging_test_env_uses_stderr_only(monkeypatch) -> None:
    """In test mode, setup_logging adds a single stderr sink and returns early."""
    added_sinks: list[str] = []

    monkeypatch.setenv("BASIC_MEMORY_ENV", "test")
    monkeypatch.setattr(utils.logger, "remove", lambda *args, **kwargs: None)
    monkeypatch.setattr(
        utils.logger,
        "add",
        lambda sink, **kwargs: added_sinks.append(sink.__name__ if hasattr(sink, "__name__") else str(sink)),
    )

    utils.setup_logging(log_to_file=True, log_to_stdout=True)

    assert len(added_sinks) == 1


def test_setup_logging_log_to_stdout(monkeypatch) -> None:
    """log_to_stdout=True should add a stderr sink on non-Windows outside test mode."""
    added_sinks: list[str] = []

    monkeypatch.setenv("BASIC_MEMORY_ENV", "dev")
    monkeypatch.setattr(utils.os, "name", "posix")
    monkeypatch.setattr(utils.logger, "remove", lambda *args, **kwargs: None)
    monkeypatch.setattr(
        utils.logger,
        "add",
        lambda sink, **kwargs: added_sinks.append(sink.__name__ if hasattr(sink, "__name__") else str(sink)),
    )

    utils.setup_logging(log_to_stdout=True)

    assert len(added_sinks) == 1


def test_setup_logging_structured_context(monkeypatch) -> None:
    """structured_context=True should call logger.configure with cloud env vars."""
    configured_extras: list[dict] = []

    monkeypatch.setenv("BASIC_MEMORY_ENV", "dev")
    monkeypatch.setattr(utils.logger, "remove", lambda *args, **kwargs: None)
    monkeypatch.setattr(utils.logger, "add", lambda *args, **kwargs: None)
    monkeypatch.setattr(
        utils.logger,
        "configure",
        lambda **kwargs: configured_extras.append(kwargs.get("extra", {})),
    )

    utils.setup_logging(structured_context=True)

    assert len(configured_extras) == 1
    assert "tenant_id" in configured_extras[0]
    assert "fly_region" in configured_extras[0]


def test_setup_logging_suppresses_noisy_loggers(monkeypatch) -> None:
    """setup_logging should raise the log level for httpx and watchfiles."""
    import logging
    monkeypatch.setenv("BASIC_MEMORY_ENV", "dev")
    monkeypatch.setattr(utils.logger, "remove", lambda *args, **kwargs: None)
    monkeypatch.setattr(utils.logger, "add", lambda *args, **kwargs: None)

    utils.setup_logging()

    assert logging.getLogger("httpx").level == logging.WARNING
    assert logging.getLogger("watchfiles.main").level == logging.WARNING

Checklist Results

Item	Status
Coding conventions (CLAUDE.md)	✅
Python 3.12+ types & async	✅
SQLAlchemy 2.0 patterns	✅
100-char line length	✅
No commented-out code blocks	✅
Decision-point comments (literate style)	✅ Excellent Trigger/Why/Outcome comments throughout
100% test coverage	⚠️ Requires removing `# pragma: no cover` + 4 new tests (fix prepared above)
Windows compatibility	✅ (improved by this PR)
`_cleanup_windows_log_files` f-string nit	⚠️ Minor — `logger.debug(f"Failed to delete...")` should use lazy format: `logger.debug("Failed to delete stale Windows log file: {path}", path=stale_log)`
sqlite-vec graceful degradation	✅
Duplicate chunk key dedup	✅
No hardcoded secrets	✅
Windows path comparisons compatible	✅

Overall Assessment

Approve with minor follow-up. All three regression fixes are correct and well-tested. The only items remaining from prior reviews are:

# pragma: no cover on setup_logging (fix prepared above — apply manually)
_cleanup_windows_log_files f-string nit — change logger.debug(f"Failed to delete stale Windows log file: {stale_log}") to logger.debug("Failed to delete stale Windows log file: {path}", path=stale_log) to match loguru's lazy-format style used elsewhere in the file

Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-03-09T21:10:43Z

Claude finished @phernandez's task in 1m 26s —— View job

PR Review: Fix SQLite and Windows semantic regressions

Gather context and read changed files
Review code quality & standards
Review testing & documentation
Review architecture & security
Post final review summary

Summary

All three regression fixes are correctly implemented and all outstanding items from prior reviews have been addressed in the latest commit (9ef040e test: cover logging setup branches). This PR is ready to merge.

Resolved Items (vs. prior reviews)

Item	Status
`# pragma: no cover` removed from `setup_logging`	✅ Removed in `9ef040e`
Tests added for `env == "test"` early-return branch	✅ `test_setup_logging_test_env_uses_stderr_only`
Tests added for `log_to_stdout=True` branch	✅ `test_setup_logging_log_to_stdout`
Tests added for `structured_context=True` branch	✅ `test_setup_logging_structured_context`
Tests added for third-party logger suppression	✅ `test_setup_logging_suppresses_noisy_loggers`
`_cleanup_windows_log_files` f-string → loguru lazy format	✅ `logger.debug("... {path}", path=stale_log)` at line 337

Final Checklist

Item	Status
Coding conventions (CLAUDE.md)	✅
Python 3.12+ types & async	✅
SQLAlchemy 2.0 patterns	✅
100-char line length	✅
No commented-out code blocks	✅
Decision-point comments (literate style)	✅ Excellent Trigger/Why/Outcome comments throughout
100% test coverage	✅ `# pragma: no cover` removed; 7 tests cover all `setup_logging` branches
Windows compatibility	✅ Per-process log files + bounded cleanup
`sqlite-vec` graceful degradation	✅ `OperationalError` caught with `"no such module: vec0"` guard
Duplicate chunk key deduplication	✅ Last-write-wins dedup before embedding writes
No hardcoded secrets	✅
Windows path comparisons compatible	✅
No emoji/unicode in console/log output	✅

Overall Assessment

Approve. All three regression fixes are correct and well-tested. All items raised in the two prior reviews have been resolved. The setup_logging function now has real coverage via 7 targeted tests, the f-string nit has been fixed with loguru lazy format, and the Windows log accumulation issue is addressed by _cleanup_windows_log_files.

Fix SQLite and Windows semantic regressions

9755044

Handle missing sqlite-vec in project info, avoid Windows log rotation contention, and dedupe duplicate chunk keys during embedding sync. Refs #652 Refs #653 Refs #654 Signed-off-by: phernandez <paul@basicmachines.co>

phernandez changed the title ~~Fix SQLite and Windows semantic regressions~~ fix(core): handle SQLite and Windows semantic regressions Mar 9, 2026

chatgpt-codex-connector bot reviewed Mar 9, 2026

View reviewed changes

fix(core): trim stale Windows pid log files

445d009

Bound the per-process Windows log directory so the PID-based rotation workaround does not accumulate stale files indefinitely. Signed-off-by: phernandez <paul@basicmachines.co>

test: scope sqlite-vec regression to sqlite

02e21f8

Signed-off-by: phernandez <paul@basicmachines.co>

test: cover logging setup branches

9ef040e

Signed-off-by: phernandez <paul@basicmachines.co>

phernandez merged commit 30a8935 into main Mar 10, 2026
25 checks passed

phernandez deleted the codex/fix-652-653-654 branch March 10, 2026 03:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): handle SQLite and Windows semantic regressions#655

fix(core): handle SQLite and Windows semantic regressions#655
phernandez merged 4 commits intomainfrom
codex/fix-652-653-654

phernandez commented Mar 9, 2026

Uh oh!

github-actions bot commented Mar 9, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 9, 2026

Uh oh!

phernandez Mar 9, 2026

Uh oh!

github-actions bot commented Mar 9, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 9, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 9, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

phernandez commented Mar 9, 2026

Summary

Testing

Uh oh!

github-actions bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Fix SQLite and Windows semantic regressions

Summary

Code Quality & Standards

utils.py — # pragma: no cover vs. new tests

search_repository_base.py — last-write-wins on duplicate chunks

project_service.py — broad try-except scope

project_service.py — Windows log file accumulation

Testing & Documentation

Basic Memory Architecture

Security & Performance

Checklist Results

Overall Assessment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

phernandez Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Fix SQLite and Windows semantic regressions

Summary

Code Quality & Standards

utils.py — # pragma: no cover on setup_logging (still outstanding)

utils.py — _cleanup_windows_log_files f-string inconsistency (nit)

project_service.py — try-except scope (acknowledged from prior review)

Testing & Documentation

Basic Memory Architecture

Security & Performance

Checklist Results

Overall Assessment

Uh oh!

github-actions bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Fix SQLite and Windows semantic regressions

Summary

Prepared Fix: Remove # pragma: no cover from setup_logging

Checklist Results

Overall Assessment

Uh oh!

github-actions bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Fix SQLite and Windows semantic regressions

Summary

Resolved Items (vs. prior reviews)

Final Checklist

Overall Assessment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Mar 9, 2026 •

edited

Loading

`utils.py` — `# pragma: no cover` vs. new tests

`search_repository_base.py` — last-write-wins on duplicate chunks

`project_service.py` — broad try-except scope

`project_service.py` — Windows log file accumulation

github-actions bot commented Mar 9, 2026 •

edited

Loading

`utils.py` — `# pragma: no cover` on `setup_logging` (still outstanding)

`utils.py` — `_cleanup_windows_log_files` f-string inconsistency (nit)

`project_service.py` — try-except scope (acknowledged from prior review)

github-actions bot commented Mar 9, 2026 •

edited

Loading

Prepared Fix: Remove `# pragma: no cover` from `setup_logging`

github-actions bot commented Mar 9, 2026 •

edited

Loading