memU embeddings: route through MEMU_EMBEDDING_BASE_URL and backfill NULL rows#70

Open
alex-fedotyev wants to merge 2 commits into ClickHouse:main from alex-fedotyev:alex/memu-self-hosted-embeddings

Conversation

@alex-fedotyev

Summary

memU stopped writing embeddings on 2026-05-06. Every memorize call from then on persisted embedding_json=NULL, vector search at recall time was disabled, and memory_recall returned "No relevant memories found" for any query that should have hit a memory written in the last few days.

The 2026-05-05 sidecar work (Ollama at http://embeddings:11434/v1) actually shipped on the host docker-compose.yml: the env var MEMU_EMBEDDING_BASE_URL was set on the agent container, the Ollama service was running and healthy, and the model returned valid 768-dim vectors. But the in-process nerve.memory.memu_bridge code never read the env var. It gated the embedding LLM profile, the "rag" retrieve method, and the categorize_items step exclusively on self.config.openai_api_key. With openai_api_key cleared (the sidecar was supposed to replace it), the bridge took the no-embed branch and persisted every new item with embedding_json=NULL. RCA in notes/lessons/2026-05-09-memu-embeddings-not-wired.md.

6,009 of 6,888 memu_memory_items rows on the running container have NULL embeddings; the 879 with embeddings all date to 2026-05-05, when the OpenAI key was still active. Live memory_recall("AI Summary HDX-3992 redactSecrets", 5) returned no results despite a known matching memory.

What this PR does

Routing fix (commit fbb74d7): resolve the embedding endpoint as env -> YAML -> OpenAI in one place, set a single embeddings_configured flag, and key every downstream behavior off that flag (a sketch follows the list below).

  • nerve/config.py: MemoryConfig gains embedding_base_url, embedding_api_key, and llm_concurrency fields. Env vars MEMU_EMBEDDING_BASE_URL, MEMU_EMBEDDING_API_KEY, MEMU_EMBED_MODEL override the YAML defaults at runtime so the docker compose path doesn't have to rewrite config.yaml on every container restart.
  • nerve/memory/memu_bridge.py: _initialize reads the env vars first, registers the embedding profile against the resolved endpoint, and falls back to OpenAI only when neither env nor YAML is set. _has_embeddings, _categorize_no_embed, the retrieve config method ("rag" vs "llm"), and memory_extract_llm_profile all key off the same unified flag.
  • A bounded asyncio.Semaphore wraps memU's chat calls so the per-memory-type fan-out (a 4-way gather in memU's extract_items pipeline) can't blow the Anthropic rate limit on lower API tiers. Configurable via memory.llm_concurrency, default 1. Re-instrumentation reuses the same Semaphore so callers already queued don't lose their slot. SDK max_retries bumped from 0 back to 4 (with concurrency bounded, retries actually drain the queue instead of stacking).
  • nerve/bootstrap.py::_build_docker_compose now writes the embeddings service block, the MEMU_EMBEDDING_* env vars on the nerve service, depends_on: embeddings: condition: service_healthy, the ~/.nerve/claude mount for persisted Claude Code state, the path-aligned ${HOME}/nerve-workspace and ${HOME}/projects mounts, and the /var/run/docker.sock mount. The entrypoint creates /root/* symlinks pointing at HOST_HOME so hardcoded /root/nerve-workspace and /root/projects paths still resolve. Brings nerve init regeneration in line with what the live host docker-compose.yml already has.
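
A minimal sketch of that resolution order, assuming illustrative names for everything except the env vars, config fields, and flag named in this PR:

```python
import os

# Sketch only; the real logic lives in memu_bridge._initialize().
def resolve_embedding_endpoint(config):
    # Env wins over YAML, so docker compose overrides config.yaml at runtime.
    base_url = os.environ.get("MEMU_EMBEDDING_BASE_URL") or config.embedding_base_url
    api_key = os.environ.get("MEMU_EMBEDDING_API_KEY") or config.embedding_api_key

    if base_url:
        # Self-hosted endpoint (e.g. the Ollama sidecar). The OpenAI SDK
        # requires a non-empty key string; Ollama and TEI ignore its value.
        return base_url, api_key or "placeholder", True
    if config.openai_api_key:
        # OpenAI only when neither env nor YAML set a base URL.
        return None, config.openai_api_key, True
    # Neither provider configured: embeddings_configured=False selects the
    # no-embed branch ("llm" retrieve method, _categorize_no_embed).
    return None, None, False
```

The third element is the single embeddings_configured flag; _has_embeddings, the retrieve method, and the categorize path all read it instead of re-checking openai_api_key.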

Backfill (commit db3f588): scripts/backfill_memu_embeddings.py walks memu_memory_items (text source: summary) and memu_resources (text source: caption) for rows with NULL or empty embedding_json, batches 32 at a time, posts to {MEMU_EMBEDDING_BASE_URL}/embeddings with the OpenAI-compatible payload Ollama accepts, and writes the resulting vectors back. Idempotent; supports --dry-run, --limit, --table. Single transaction per batch so an interrupt loses at most one batch.
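
For orientation, a hedged sketch of the per-batch embed call: the request/response shape is the OpenAI embeddings contract that Ollama's /v1/embeddings endpoint accepts, while the helper name and use of urllib are illustrative, not the script's actual internals:

```python
import json, os, urllib.request

def embed_batch(texts: list[str]) -> list[list[float]]:
    """POST one batch of texts; returns one 768-dim vector per input."""
    base = os.environ["MEMU_EMBEDDING_BASE_URL"]  # e.g. http://embeddings:11434/v1
    req = urllib.request.Request(
        f"{base}/embeddings",
        data=json.dumps({"model": "nomic-embed-text", "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)["data"]
    # Per the OpenAI contract, data[i]["embedding"] corresponds to input i.
    return [row["embedding"] for row in data]
```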

Test plan

  • Unit: pytest tests/test_memu_bridge.py tests/test_bootstrap.py -> 115 passed, 0 failed. Full suite: 447 passed, 2 skipped, plus one pre-existing failure (test_cli_upgrade.py::test_docker_mode_bails_out) unrelated to these changes.
  • Backfill dry-run on the running container's memu.sqlite: reports 6,009 memu_memory_items + 329 memu_resources pending (excluding 103 resources with NULL captions). The counts match a direct SQL check run beforehand.
  • Backfill --limit 100 --table memu_memory_items: wrote 100 768-dim vectors in 1.5s. Sample row spot-check: vector length 768, leading dims [0.0096, 0.037, -0.132]. Re-running --dry-run reports 5,909 pending, confirming idempotency (5,909 = 6,009 - 100).
  • Endpoint sanity: curl POST http://embeddings:11434/v1/embeddings with batch input returns 768-dim vectors per input, response shape matches OpenAI's contract.
  • Live memorize round-trip on the running container: deferred. The fix lives in this branch and the running daemon is on claude/engine-sdk-resume-guard with the buggy bridge. Restarting the agent kills the active chat session per TASK.md, so I'll let the user pick the moment to recreate the container with the new code. Once restarted, a new memorize call should write a non-NULL embedding_json of length 768 and memory_recall against that memory should return it.
  • Full backfill (~6,338 rows): worth running once after the user restarts the container so historical recall works. The script's progress logging makes it easy to resume.

Notes for the reviewer

  • The original 2026-05-05 stash also bundled a nerve-services = nerve.services:main console-script entry in pyproject.toml and several comments referencing a docker-mcp sidecar. nerve/services.py lives only on the abandoned alex/docker-mcp-spike branch, so installing with that entry would break pip install -e. Dropped here, with the dangling sidecar comments rewritten to describe the actual current setup (direct host socket mount).
  • The engine-SDK-resume-guard half of the original stash already shipped on f39e62b and is excluded.
  • Tier estimate: ~250 prod lines added. Single concern (memU embedding routing). The bootstrap.py delta is the largest part and is mostly compose YAML in an f-string plus an entrypoint shell snippet, not new branching logic.
  • After merge, I'll update notes/outcomes/2026-05-workflow-automation-loop-closure.md AC14 to PASS once the user restarts the container and confirms a recall round-trip works.

Closes the recall regression tracked under the task "memU: replace OpenAI embeddings with a self-hosted free embedding sidecar".

…_URL

Symptom: memory_recall returns "No relevant memories found" for any
query that should match a memory written after 2026-05-06. memU's
own logs reported route_category and recall succeeding, but vector
search at recall time was effectively disabled.

Root cause: memu_bridge gated the embedding LLM profile, the
"rag" retrieve method, and the categorize_items step exclusively on
self.config.openai_api_key. The 2026-05-05 sidecar work landed an
Ollama service in docker-compose.yml and set MEMU_EMBEDDING_BASE_URL
on the agent container, but the in-process bridge code never read
the env var. With openai_api_key cleared (the sidecar was supposed
to replace it) the bridge took the no-embed branch:

- _categorize_no_embed replaced categorize_items, persisting every
  new item with embedding_json=NULL.
- retrieve_config["method"] resolved to "llm" instead of "rag", so
  recall fanned out chat queries against the categories instead of
  doing a cosine search over item vectors.
- _has_embeddings returned False, suppressing vector lookups
  upstream of recall too.

6,009 of 6,888 memu_memory_items rows on the running container had
embedding_json=NULL; all 879 with embeddings dated to 2026-05-05,
when the OpenAI key was still active. RCA in
notes/lessons/2026-05-09-memu-embeddings-not-wired.md.

Fix: resolve the embedding endpoint as env -> YAML -> OpenAI in one
place, set a single embeddings_configured flag, and key every
downstream behavior off that flag. Concretely:

- nerve/config.py: MemoryConfig gains embedding_base_url,
  embedding_api_key, and llm_concurrency fields with env-var-aware
  defaults. llm_concurrency clamps to >= 1 since 0 deadlocks the
  semaphore wrapper.
- nerve/memory/memu_bridge.py:
  - _initialize() reads MEMU_EMBEDDING_BASE_URL,
    MEMU_EMBEDDING_API_KEY, and MEMU_EMBED_MODEL with YAML config
    fallback. When base_url is set, registers the embedding profile
    against that endpoint with api_key="placeholder" if not provided
    (the OpenAI SDK requires a non-empty string; Ollama and TEI
    ignore it).
  - embeddings_configured = (env or YAML base URL set) OR (openai
    key set). _categorize_no_embed only takes the no-embed path
    when neither provider is configured. retrieve method is "rag"
    when configured, "llm" otherwise. memory_extract_llm_profile
    follows the same flag.
  - _has_embeddings checks all three sources (env, YAML, OpenAI).
  - Bounded asyncio.Semaphore wraps memU's chat calls so the
    per-memory-type fan-out (4-way gather in memU's
    extract_items pipeline) doesn't blow the Anthropic rate limit
    on lower API tiers. Configurable via memory.llm_concurrency,
    default 1. Re-instrumentation reuses the same Semaphore so
    callers already queued don't lose their slot (see the sketch
    after this list).
  - SDK retries enabled at max_retries=4 (was 0). With concurrency
    bounded, retries actually drain the queue instead of stacking.
- nerve/bootstrap.py: _build_docker_compose now writes the
  embeddings service block (Ollama + nomic-embed-text), the
  MEMU_EMBEDDING_* env vars on the nerve service,
  depends_on: embeddings: condition: service_healthy, the
  ~/.nerve/claude bind mount for persisted Claude Code state,
  the path-aligned ${HOME}/nerve-workspace and ${HOME}/projects
  mounts, and /var/run/docker.sock for direct daemon access.
  The entrypoint creates /root/* symlinks pointing at HOST_HOME so
  hardcoded /root/nerve-workspace and /root/projects paths still
  resolve. Brings nerve init regeneration in line with the live
  host docker-compose.yml.
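
A sketch of the semaphore wrapper, with illustrative names (the
real instrumentation lives in memu_bridge):

    import asyncio

    class BoundedChat:
        def __init__(self, chat_fn, concurrency: int):
            # Clamp to >= 1: Semaphore(0) would deadlock every caller.
            self._sem = asyncio.Semaphore(max(1, concurrency))
            self._chat_fn = chat_fn

        async def __call__(self, *args, **kwargs):
            # memU's extract_items gathers 4 chat calls at once; this
            # bounds them to `concurrency` in-flight requests.
            async with self._sem:
                return await self._chat_fn(*args, **kwargs)

Re-instrumentation must hand the existing Semaphore to the new
wrapper; constructing a fresh one would strand callers already
awaiting the old instance.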

Tests: 18 new tests in test_memu_bridge.py covering the new
MemoryConfig fields, llm_concurrency clamping, and the semaphore
wrapper (serialization at concurrency=1, peak concurrency respected
at concurrency=3, instance reuse across resets). 4 updates in
test_bootstrap.py for the host-aligned mount assertions and the
NERVE_DOCKER unset that lets the test pass when run inside the
agent container.

Notes: the original 2026-05-05 stash also added a
"nerve-services = nerve.services:main" console-script entry to
pyproject.toml and assorted comments about a docker-mcp sidecar.
nerve/services lives only on the abandoned alex/docker-mcp-spike
branch, so installing with that entry breaks pip install -e. The
entry and the sidecar comments are dropped here. The
engine-SDK-resume-guard half of the original stash already shipped
on f39e62b and is excluded.
memU rows written between 2026-05-06 and the embeddings-fix landing
have embedding_json=NULL because the bridge wasn't reading the
MEMU_EMBEDDING_BASE_URL env var. The fix only applies to new writes;
existing rows need a one-time backfill to make recall work against
historical memories.

Walks memu_memory_items (text source: summary) and memu_resources
(text source: caption) for rows with NULL or empty embedding_json,
batches them 32 at a time, posts to {MEMU_EMBEDDING_BASE_URL}/embeddings
with the OpenAI-compatible payload Ollama and the OpenAI API both
accept, and writes the resulting vectors back. Skips rows with NULL
or empty text since there's nothing to embed and the endpoint
rejects empty input.

Idempotent: the WHERE clause filters on the NULL state, so re-runs
only touch rows that still need work. Single transaction per batch,
so an interrupt loses at most one batch.
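
A sketch of the per-batch loop under those invariants. The column
names summary/caption/embedding_json come from the schema described
above; rowid, the function name, and the embed_batch(texts) helper
(a POST to {MEMU_EMBEDDING_BASE_URL}/embeddings) are illustrative:

    import json, sqlite3

    def backfill_table(db: sqlite3.Connection, table: str,
                       text_col: str, batch_size: int = 32):
        while True:
            # Idempotent: only rows still missing a vector match.
            rows = db.execute(
                f"SELECT rowid, {text_col} FROM {table} "
                f"WHERE (embedding_json IS NULL OR embedding_json = '') "
                f"AND {text_col} IS NOT NULL AND {text_col} != '' "
                "LIMIT ?",
                (batch_size,),
            ).fetchall()
            if not rows:
                break
            vectors = embed_batch([text for _, text in rows])
            # One transaction per batch: an interrupt loses at most
            # this batch, and a re-run picks up where it stopped.
            with db:
                db.executemany(
                    f"UPDATE {table} SET embedding_json = ? WHERE rowid = ?",
                    [(json.dumps(v), rid)
                     for (rid, _), v in zip(rows, vectors)],
                )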

Flags:
- --dry-run: count pending rows without embedding or writing.
- --limit N: stop after N rows per table for incremental runs.
- --table: backfill only memu_memory_items or memu_resources.
- --batch-size, --db, --verbose for ops control.
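
A typical cautious sequence with those flags: a --dry-run to count,
then --limit 100 --table memu_memory_items to spot-check a slice,
then an unrestricted run for the full backfill.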

Validated against the running container's memu.sqlite:
- --dry-run: 6,009 memu_memory_items + 329 memu_resources pending.
- --limit 100 --table memu_memory_items: wrote 100 768-dim vectors
  in 1.5s. Re-running --dry-run reports 5,909, confirming idempotency.

memu_memory_categories doesn't need backfill: those embeddings were
populated on 2026-05-05 and never wiped.