memU embeddings: route through MEMU_EMBEDDING_BASE_URL and backfill NULL rows#70

Open
alex-fedotyev wants to merge 2 commits into ClickHouse:main from alex-fedotyev:alex/memu-self-hosted-embeddings

Conversation

@alex-fedotyev

Summary

memU stopped writing embeddings on 2026-05-06. Every memorize call from then on persisted embedding_json=NULL, vector search at recall time was disabled, and memory_recall returned "No relevant memories found" for any query that should have hit a memory written in the last few days.

The 2026-05-05 sidecar work (Ollama at http://embeddings:11434/v1) actually shipped on the host docker-compose.yml: the env var MEMU_EMBEDDING_BASE_URL was set on the agent container, the Ollama service was running and healthy, and the model returned valid 768-dim vectors. But the in-process nerve.memory.memu_bridge code never read the env var. It gated the embedding LLM profile, the "rag" retrieve method, and the categorize_items step exclusively on self.config.openai_api_key. With openai_api_key cleared (the sidecar was supposed to replace it), the bridge took the no-embed branch and persisted every new item with embedding_json=NULL. RCA in notes/lessons/2026-05-09-memu-embeddings-not-wired.md.

6,009 of 6,888 memu_memory_items rows on the running container have NULL embeddings; the 879 with embeddings all date to 2026-05-05, when the OpenAI key was still active. Live memory_recall("AI Summary HDX-3992 redactSecrets", 5) returned no results despite a known matching memory.

What this PR does

Routing fix (commit fbb74d7): resolve the embedding endpoint as env -> YAML -> OpenAI in one place, set a single embeddings_configured flag, and key every downstream behavior off that flag (a sketch follows the list below).

  • nerve/config.py: MemoryConfig gains embedding_base_url, embedding_api_key, and llm_concurrency fields. Env vars MEMU_EMBEDDING_BASE_URL, MEMU_EMBEDDING_API_KEY, MEMU_EMBED_MODEL override the YAML defaults at runtime so the docker compose path doesn't have to rewrite config.yaml on every container restart.
  • nerve/memory/memu_bridge.py: _initialize reads the env vars first, registers the embedding profile against the resolved endpoint, and falls back to OpenAI only when neither env nor YAML is set. _has_embeddings, _categorize_no_embed, the retrieve config method ("rag" vs "llm"), and memory_extract_llm_profile all key off the same unified flag.
  • A bounded asyncio.Semaphore wraps memU's chat calls so the per-memory-type fan-out (a 4-way gather in memU's extract_items pipeline) can't blow the Anthropic rate limit on lower API tiers. Configurable via memory.llm_concurrency, default 1. Re-instrumentation reuses the same Semaphore so callers already queued don't lose their slot. SDK max_retries bumped from 0 back to 4 (with concurrency bounded, retries actually drain the queue instead of stacking).
  • nerve/bootstrap.py::_build_docker_compose now writes the embeddings service block, the MEMU_EMBEDDING_* env vars on the nerve service, depends_on: embeddings: condition: service_healthy, the ~/.nerve/claude mount for persisted Claude Code state, the path-aligned ${HOME}/nerve-workspace and ${HOME}/projects mounts, and the /var/run/docker.sock mount. The entrypoint creates /root/* symlinks pointing at HOST_HOME so hardcoded /root/nerve-workspace and /root/projects paths still resolve. Brings nerve init regeneration in line with what the live host docker-compose.yml already has.
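
A minimal sketch of that resolution order, assuming illustrative names for everything except the env vars, config fields, and flag named in this PR:

```python
import os

# Sketch only; the real logic lives in memu_bridge._initialize().
def resolve_embedding_endpoint(config):
    # Env wins over YAML, so docker compose overrides config.yaml at runtime.
    base_url = os.environ.get("MEMU_EMBEDDING_BASE_URL") or config.embedding_base_url
    api_key = os.environ.get("MEMU_EMBEDDING_API_KEY") or config.embedding_api_key

    if base_url:
        # Self-hosted endpoint (e.g. the Ollama sidecar). The OpenAI SDK
        # requires a non-empty key string; Ollama and TEI ignore its value.
        return base_url, api_key or "placeholder", True
    if config.openai_api_key:
        # OpenAI only when neither env nor YAML set a base URL.
        return None, config.openai_api_key, True
    # Neither provider configured: embeddings_configured=False selects the
    # no-embed branch ("llm" retrieve method, _categorize_no_embed).
    return None, None, False
```

The third element is the single embeddings_configured flag; _has_embeddings, the retrieve method, and the categorize path all read it instead of re-checking openai_api_key.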

Backfill (commit db3f588): scripts/backfill_memu_embeddings.py walks memu_memory_items (text source: summary) and memu_resources (text source: caption) for rows with NULL or empty embedding_json, batches 32 at a time, posts to {MEMU_EMBEDDING_BASE_URL}/embeddings with the OpenAI-compatible payload Ollama accepts, and writes the resulting vectors back. Idempotent; supports --dry-run, --limit, --table. Single transaction per batch so an interrupt loses at most one batch.
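
For orientation, a hedged sketch of the per-batch embed call: the request/response shape is the OpenAI embeddings contract that Ollama's /v1/embeddings endpoint accepts, while the helper name and use of urllib are illustrative, not the script's actual internals:

```python
import json, os, urllib.request

def embed_batch(texts: list[str]) -> list[list[float]]:
    """POST one batch of texts; returns one 768-dim vector per input."""
    base = os.environ["MEMU_EMBEDDING_BASE_URL"]  # e.g. http://embeddings:11434/v1
    req = urllib.request.Request(
        f"{base}/embeddings",
        data=json.dumps({"model": "nomic-embed-text", "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)["data"]
    # Per the OpenAI contract, data[i]["embedding"] corresponds to input i.
    return [row["embedding"] for row in data]
```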

Test plan

  • Unit: pytest tests/test_memu_bridge.py tests/test_bootstrap.py -> 115 passed, 0 failed. Full suite: 447 passed, 2 skipped, plus one pre-existing failure (test_cli_upgrade.py::test_docker_mode_bails_out) unrelated to these changes.
  • Backfill dry-run on the running container's memu.sqlite: reports 6,009 memu_memory_items + 329 memu_resources pending (excluding 103 resources with NULL captions). The counts match a direct SQL check run beforehand.
  • Backfill --limit 100 --table memu_memory_items: wrote 100 768-dim vectors in 1.5s. Sample row spot-check: vector length 768, leading dims [0.0096, 0.037, -0.132]. Re-running --dry-run reports 5,909 pending, confirming idempotency (5,909 = 6,009 - 100).
  • Endpoint sanity: curl POST http://embeddings:11434/v1/embeddings with batch input returns 768-dim vectors per input, response shape matches OpenAI's contract.
  • Live memorize round-trip on the running container: deferred. The fix lives in this branch and the running daemon is on claude/engine-sdk-resume-guard with the buggy bridge. Restarting the agent kills the active chat session per TASK.md, so I'll let the user pick the moment to recreate the container with the new code. Once restarted, a new memorize call should write a non-NULL embedding_json of length 768 and memory_recall against that memory should return it.
  • Full backfill (~6,338 rows): worth running once after the user restarts the container so historical recall works. The script's progress logging makes it easy to resume.

Notes for the reviewer

  • The original 2026-05-05 stash also bundled a nerve-services = nerve.services:main console-script entry in pyproject.toml and several comments referencing a docker-mcp sidecar. nerve/services.py lives only on the abandoned alex/docker-mcp-spike branch, so installing with that entry would break pip install -e. Dropped here, with the dangling sidecar comments rewritten to describe the actual current setup (direct host socket mount).
  • The engine-SDK-resume-guard half of the original stash already shipped on f39e62b and is excluded.
  • Tier estimate: ~250 prod lines added. Single concern (memU embedding routing). The bootstrap.py delta is the largest part and is mostly compose YAML in an f-string plus an entrypoint shell snippet, not new branching logic.
  • After merge, I'll update notes/outcomes/2026-05-workflow-automation-loop-closure.md AC14 to PASS once the user restarts the container and confirms a recall round-trip works.

Closes the recall regression tracked under the task "memU: replace OpenAI embeddings with a self-hosted free embedding sidecar".

…_URL

Symptom: memory_recall returns "No relevant memories found" for any
query that should match a memory written after 2026-05-06. memU's
own logs reported route_category and recall succeeding, but vector
search at recall time was effectively disabled.

Root cause: memu_bridge gated the embedding LLM profile, the
"rag" retrieve method, and the categorize_items step exclusively on
self.config.openai_api_key. The 2026-05-05 sidecar work landed an
Ollama service in docker-compose.yml and set MEMU_EMBEDDING_BASE_URL
on the agent container, but the in-process bridge code never read
the env var. With openai_api_key cleared (the sidecar was supposed
to replace it) the bridge took the no-embed branch:

- _categorize_no_embed replaced categorize_items, persisting every
  new item with embedding_json=NULL.
- retrieve_config["method"] resolved to "llm" instead of "rag", so
  recall fanned out chat queries against the categories instead of
  doing a cosine search over item vectors.
- _has_embeddings returned False, suppressing vector lookups
  upstream of recall too.

6,009 of 6,888 memu_memory_items rows on the running container had
embedding_json=NULL; all 879 with embeddings dated to 2026-05-05,
when the OpenAI key was still active. RCA in
notes/lessons/2026-05-09-memu-embeddings-not-wired.md.

Fix: resolve the embedding endpoint as env -> YAML -> OpenAI in one
place, set a single embeddings_configured flag, and key every
downstream behavior off that flag. Concretely:

- nerve/config.py: MemoryConfig gains embedding_base_url,
  embedding_api_key, and llm_concurrency fields with env-var-aware
  defaults. llm_concurrency clamps to >= 1 since 0 deadlocks the
  semaphore wrapper.
- nerve/memory/memu_bridge.py:
  - _initialize() reads MEMU_EMBEDDING_BASE_URL,
    MEMU_EMBEDDING_API_KEY, and MEMU_EMBED_MODEL with YAML config
    fallback. When base_url is set, registers the embedding profile
    against that endpoint with api_key="placeholder" if not provided
    (the OpenAI SDK requires a non-empty string; Ollama and TEI
    ignore it).
  - embeddings_configured = (env or YAML base URL set) OR (openai
    key set). _categorize_no_embed only takes the no-embed path
    when neither provider is configured. retrieve method is "rag"
    when configured, "llm" otherwise. memory_extract_llm_profile
    follows the same flag.
  - _has_embeddings checks all three sources (env, YAML, OpenAI).
  - Bounded asyncio.Semaphore wraps memU's chat calls so the
    per-memory-type fan-out (4-way gather in memU's
    extract_items pipeline) doesn't blow the Anthropic rate limit
    on lower API tiers. Configurable via memory.llm_concurrency,
    default 1. Re-instrumentation reuses the same Semaphore so
    callers already queued don't lose their slot (see the sketch
    after this list).
  - SDK retries enabled at max_retries=4 (was 0). With concurrency
    bounded, retries actually drain the queue instead of stacking.
- nerve/bootstrap.py: _build_docker_compose now writes the
  embeddings service block (Ollama + nomic-embed-text), the
  MEMU_EMBEDDING_* env vars on the nerve service,
  depends_on: embeddings: condition: service_healthy, the
  ~/.nerve/claude bind mount for persisted Claude Code state,
  the path-aligned ${HOME}/nerve-workspace and ${HOME}/projects
  mounts, and /var/run/docker.sock for direct daemon access.
  The entrypoint creates /root/* symlinks pointing at HOST_HOME so
  hardcoded /root/nerve-workspace and /root/projects paths still
  resolve. Brings nerve init regeneration in line with the live
  host docker-compose.yml.
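
A sketch of the semaphore wrapper, with illustrative names (the
real instrumentation lives in memu_bridge):

    import asyncio

    class BoundedChat:
        def __init__(self, chat_fn, concurrency: int):
            # Clamp to >= 1: Semaphore(0) would deadlock every caller.
            self._sem = asyncio.Semaphore(max(1, concurrency))
            self._chat_fn = chat_fn

        async def __call__(self, *args, **kwargs):
            # memU's extract_items gathers 4 chat calls at once; this
            # bounds them to `concurrency` in-flight requests.
            async with self._sem:
                return await self._chat_fn(*args, **kwargs)

Re-instrumentation must hand the existing Semaphore to the new
wrapper; constructing a fresh one would strand callers already
awaiting the old instance.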

Tests: 18 new tests in test_memu_bridge.py covering the new
MemoryConfig fields, llm_concurrency clamping, and the semaphore
wrapper (serialization at concurrency=1, peak concurrency respected
at concurrency=3, instance reuse across resets). 4 updates in
test_bootstrap.py for the host-aligned mount assertions and the
NERVE_DOCKER unset that lets the test pass when run inside the
agent container.

Notes: the original 2026-05-05 stash also added a
"nerve-services = nerve.services:main" console-script entry to
pyproject.toml and assorted comments about a docker-mcp sidecar.
nerve/services lives only on the abandoned alex/docker-mcp-spike
branch, so installing with that entry breaks pip install -e. The
entry and the sidecar comments are dropped here. The
engine-SDK-resume-guard half of the original stash already shipped
on f39e62b and is excluded.
memU rows written between 2026-05-06 and the embeddings-fix landing
have embedding_json=NULL because the bridge wasn't reading the
MEMU_EMBEDDING_BASE_URL env var. The fix only applies to new writes;
existing rows need a one-time backfill to make recall work against
historical memories.

Walks memu_memory_items (text source: summary) and memu_resources
(text source: caption) for rows with NULL or empty embedding_json,
batches them 32 at a time, posts to {MEMU_EMBEDDING_BASE_URL}/embeddings
with the OpenAI-compatible payload Ollama and the OpenAI API both
accept, and writes the resulting vectors back. Skips rows with NULL
or empty text since there's nothing to embed and the endpoint
rejects empty input.

Idempotent: the WHERE clause filters on the NULL state, so re-runs
only touch rows that still need work. Single transaction per batch,
so an interrupt loses at most one batch.
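
A sketch of the per-batch loop under those invariants. The column
names summary/caption/embedding_json come from the schema described
above; rowid, the function name, and the embed_batch(texts) helper
(a POST to {MEMU_EMBEDDING_BASE_URL}/embeddings) are illustrative:

    import json, sqlite3

    def backfill_table(db: sqlite3.Connection, table: str,
                       text_col: str, batch_size: int = 32):
        while True:
            # Idempotent: only rows still missing a vector match.
            rows = db.execute(
                f"SELECT rowid, {text_col} FROM {table} "
                f"WHERE (embedding_json IS NULL OR embedding_json = '') "
                f"AND {text_col} IS NOT NULL AND {text_col} != '' "
                "LIMIT ?",
                (batch_size,),
            ).fetchall()
            if not rows:
                break
            vectors = embed_batch([text for _, text in rows])
            # One transaction per batch: an interrupt loses at most
            # this batch, and a re-run picks up where it stopped.
            with db:
                db.executemany(
                    f"UPDATE {table} SET embedding_json = ? WHERE rowid = ?",
                    [(json.dumps(v), rid)
                     for (rid, _), v in zip(rows, vectors)],
                )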

Flags:
- --dry-run: count pending rows without embedding or writing.
- --limit N: stop after N rows per table for incremental runs.
- --table: backfill only memu_memory_items or memu_resources.
- --batch-size, --db, --verbose for ops control.
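
A typical cautious sequence with those flags: a --dry-run to count,
then --limit 100 --table memu_memory_items to spot-check a slice,
then an unrestricted run for the full backfill.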

Validated against the running container's memu.sqlite:
- --dry-run: 6,009 memu_memory_items + 329 memu_resources pending.
- --limit 100 --table memu_memory_items: wrote 100 768-dim vectors
  in 1.5s. Re-running --dry-run reports 5,909, confirming idempotency.

memu_memory_categories doesn't need backfill: those embeddings were
populated on 2026-05-05 and never wiped.