memU embeddings: route through MEMU_EMBEDDING_BASE_URL and backfill NULL rows #70
Open
alex-fedotyev wants to merge 2 commits into
Conversation
…_URL
Symptom: memory_recall returns "No relevant memories found" for any
query that should match a memory written after 2026-05-06. memU's
own logs reported route_category and recall succeeding, but vector
search at recall time was effectively disabled.
Root cause: memu_bridge gated the embedding LLM profile, the
"rag" retrieve method, and the categorize_items step exclusively on
self.config.openai_api_key. The 2026-05-05 sidecar work landed an
Ollama service in docker-compose.yml and set MEMU_EMBEDDING_BASE_URL
on the agent container, but the in-process bridge code never read
the env var. With openai_api_key cleared (the sidecar was supposed
to replace it) the bridge took the no-embed branch:
- _categorize_no_embed replaced categorize_items, persisting every
new item with embedding_json=NULL.
- retrieve_config["method"] resolved to "llm" instead of "rag", so
recall fanned out chat queries against the categories instead of
doing a cosine search over item vectors.
- _has_embeddings returned False, suppressing vector lookups
upstream of recall too.
5,100 of 6,888 memu_memory_items rows on the running container had
embedding_json=NULL; all 879 with embeddings dated to 2026-05-05
when the OpenAI key was still active. RCA in
notes/lessons/2026-05-09-memu-embeddings-not-wired.md.
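For reference, the gating looked roughly like this. A minimal sketch with simplified, illustrative names; the real bridge wires these checks through memU's config objects:

```python
class _BridgeGatingBefore:
    """Pre-fix gating, sketched (not the real class)."""

    def __init__(self, config):
        self.config = config

    def _has_embeddings(self) -> bool:
        # MEMU_EMBEDDING_BASE_URL was never consulted here, so clearing
        # the OpenAI key silently disabled embeddings even with the
        # Ollama sidecar up and healthy.
        return bool(self.config.openai_api_key)

    def _retrieve_method(self) -> str:
        # "rag" = cosine search over item vectors; "llm" = fan chat
        # queries out per category, which is what recall degraded to.
        return "rag" if self.config.openai_api_key else "llm"
```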
Fix: resolve the embedding endpoint as env -> YAML -> OpenAI in one
place, set a single embeddings_configured flag, and key every
downstream behavior off that flag (sketched after this list). Concretely:
- nerve/config.py: MemoryConfig gains embedding_base_url,
embedding_api_key, and llm_concurrency fields with env-var-aware
defaults. llm_concurrency clamps to >= 1 since 0 deadlocks the
semaphore wrapper.
- nerve/memory/memu_bridge.py:
- _initialize() reads MEMU_EMBEDDING_BASE_URL,
MEMU_EMBEDDING_API_KEY, and MEMU_EMBED_MODEL with YAML config
fallback. When base_url is set, registers the embedding profile
against that endpoint with api_key="placeholder" if not provided
(the OpenAI SDK requires a non-empty string; Ollama and TEI
ignore it).
- embeddings_configured = (env or YAML base URL set) OR (openai
key set). _categorize_no_embed only takes the no-embed path
when neither provider is configured. retrieve method is "rag"
when configured, "llm" otherwise. memory_extract_llm_profile
follows the same flag.
- _has_embeddings checks all three sources (env, YAML, OpenAI).
- Bounded asyncio.Semaphore wraps memU's chat calls so the
per-memory-type fan-out (4-way gather in memU's
extract_items pipeline) doesn't blow the Anthropic rate limit
on lower API tiers. Configurable via memory.llm_concurrency,
default 1. Re-instrumentation reuses the same Semaphore so
callers already queued don't lose their slot.
- SDK retries enabled at max_retries=4 (was 0). With concurrency
bounded, retries actually drain the queue instead of stacking.
- nerve/bootstrap.py: _build_docker_compose now writes the
embeddings service block (Ollama + nomic-embed-text), the
MEMU_EMBEDDING_* env vars on the nerve service,
depends_on: embeddings: condition: service_healthy, the
~/.nerve/claude bind mount for persisted Claude Code state,
the path-aligned ${HOME}/nerve-workspace and ${HOME}/projects
mounts, and /var/run/docker.sock for direct daemon access.
The entrypoint creates /root/* symlinks pointing at HOST_HOME so
hardcoded /root/nerve-workspace and /root/projects paths still
resolve. Brings nerve init regeneration in line with the live
host docker-compose.yml.
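A minimal sketch of the resolution order, written as a free function with illustrative names (the real logic runs inside `_initialize()`):

```python
import os

def resolve_embedding_endpoint(config):
    """Env -> YAML -> OpenAI, decided in one place.

    Returns (base_url, api_key, model, embeddings_configured).
    Illustrative helper, not the real code; the default model name
    is an assumption.
    """
    base_url = os.environ.get("MEMU_EMBEDDING_BASE_URL") or config.embedding_base_url
    api_key = os.environ.get("MEMU_EMBEDDING_API_KEY") or config.embedding_api_key
    model = os.environ.get("MEMU_EMBED_MODEL") or "nomic-embed-text"
    if base_url:
        # The OpenAI SDK rejects an empty api_key string; Ollama and TEI
        # ignore whatever is sent, so a placeholder is safe.
        return base_url, api_key or "placeholder", model, True
    if config.openai_api_key:
        # Neither env nor YAML set: fall back to the stock OpenAI endpoint.
        return None, config.openai_api_key, model, True
    # No provider at all: _categorize_no_embed path, retrieve method "llm".
    return None, None, model, False
```

Everything downstream keys off the final flag: `_categorize_no_embed` only fires when it is False, and the retrieve method resolves to "rag" whenever it is True.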
Tests: 18 new tests in test_memu_bridge.py covering the new
MemoryConfig fields, llm_concurrency clamping, and the semaphore
wrapper (serialization at concurrency=1, peak respect at
concurrency=3, instance reuse across resets). 4 updates in
test_bootstrap.py for the host-aligned mount assertions and the
NERVE_DOCKER unset that lets the test pass when run inside the
agent container.
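The semaphore wrapper under test, sketched (class name and wiring are illustrative; the real wrapper instruments memU's chat entry points):

```python
import asyncio

class ChatLimiter:
    """Bounded-concurrency wrapper around memU's chat calls (sketch)."""

    def __init__(self, limit: int):
        # Clamp to >= 1: a Semaphore(0) would deadlock every caller.
        self._sem = asyncio.Semaphore(max(1, limit))

    def wrap(self, chat_fn):
        async def limited(*args, **kwargs):
            async with self._sem:
                return await chat_fn(*args, **kwargs)
        return limited

# At llm_concurrency=1 the 4-way gather in memU's extract_items pipeline
# serializes. Re-instrumentation must reuse self._sem rather than build a
# new Semaphore, so callers already parked in `async with` keep their slot.
```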
Notes: the original 2026-05-05 stash also added a
"nerve-services = nerve.services:main" console-script entry to
pyproject.toml and assorted comments about a docker-mcp sidecar.
nerve/services lives only on the abandoned alex/docker-mcp-spike
branch, so installing with that entry breaks pip install -e. The
entry and the sidecar comments are dropped here. The
engine-SDK-resume-guard half of the original stash already shipped
on f39e62b and is excluded.
memU rows written between 2026-05-06 and the embeddings-fix landing
have embedding_json=NULL because the bridge wasn't reading the
MEMU_EMBEDDING_BASE_URL env var. The fix only applies to new writes;
existing rows need a one-time backfill to make recall work against
historical memories.
Walks memu_memory_items (text source: summary) and memu_resources
(text source: caption) for rows with NULL or empty embedding_json,
batches them 32 at a time, posts to {MEMU_EMBEDDING_BASE_URL}/embeddings
with the OpenAI-compatible payload Ollama and the OpenAI API both
accept, and writes the resulting vectors back. Skips rows with NULL
or empty text since there's nothing to embed and the endpoint
rejects empty input.
Idempotent: the WHERE clause filters on the NULL state, so re-runs
only touch rows that still need work. Single transaction per batch,
so an interrupt loses at most one batch.
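The core loop, sketched under stated assumptions: rowid is used as the key (the real script's key column may differ), the endpoint and model come from the environment, and the flag plumbing listed next is omitted.

```python
import json
import os
import sqlite3
import urllib.request

BASE_URL = os.environ["MEMU_EMBEDDING_BASE_URL"].rstrip("/")

def embed_batch(texts, model="nomic-embed-text"):
    # OpenAI-compatible /embeddings payload; Ollama and the OpenAI API
    # both accept {"model": ..., "input": [...]}.
    req = urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return [d["embedding"] for d in json.load(resp)["data"]]

def backfill(db, table="memu_memory_items", text_col="summary", batch=32):
    con = sqlite3.connect(db)
    while True:
        # The WHERE clause filters on the NULL/empty state, so re-runs are
        # idempotent, and rows with no text are skipped up front.
        rows = con.execute(
            f"SELECT rowid, {text_col} FROM {table} "
            f"WHERE (embedding_json IS NULL OR embedding_json = '') "
            f"AND {text_col} IS NOT NULL AND {text_col} != '' LIMIT ?",
            (batch,),
        ).fetchall()
        if not rows:
            break
        vectors = embed_batch([text for _, text in rows])
        with con:  # one transaction per batch: an interrupt loses at most one
            con.executemany(
                f"UPDATE {table} SET embedding_json = ? WHERE rowid = ?",
                [(json.dumps(v), rid) for v, (rid, _) in zip(vectors, rows)],
            )
```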
Flags:
- --dry-run: count pending rows without embedding or writing.
- --limit N: stop after N rows per table for incremental runs.
- --table: backfill only memu_memory_items or memu_resources.
- --batch-size, --db, --verbose for ops control.
Validated against the running container's memu.sqlite:
- --dry-run: 6,009 memu_memory_items + 329 memu_resources pending.
- --limit 100 --table memu_memory_items: wrote 100 768-dim vectors
in 1.5s. Re-running --dry-run reports 5,909, confirming idempotency.
memu_memory_categories doesn't need backfill: those embeddings were
populated on 2026-05-05 and never wiped.
Summary
memU stopped writing embeddings on 2026-05-06. Every memorize call from then on persisted `embedding_json=NULL`, vector search at recall time was disabled, and `memory_recall` returned "No relevant memories found" for any query that should have hit a memory written in the last few days.

The 2026-05-05 sidecar work (Ollama at `http://embeddings:11434/v1`) actually shipped on the host docker-compose.yml: the env var `MEMU_EMBEDDING_BASE_URL` was set on the agent container, the Ollama service was running and healthy, and the model returned valid 768-dim vectors. But the in-process `nerve.memory.memu_bridge` code never read the env var. It gated the embedding LLM profile, the "rag" retrieve method, and the categorize_items step exclusively on `self.config.openai_api_key`. With OpenAI cleared (the sidecar was supposed to replace it) the bridge took the no-embed branch and persisted every new item with `embedding_json=NULL`. RCA in `notes/lessons/2026-05-09-memu-embeddings-not-wired.md`.

5,100 of 6,888 `memu_memory_items` rows on the running container have NULL embeddings; the 879 with embeddings all date to 2026-05-05 when the OpenAI key was still active. Live `memory_recall("AI Summary HDX-3992 redactSecrets", 5)` returned no results despite a known matching memory.

What this PR does
Routing fix (commit fbb74d7): resolve the embedding endpoint as env -> YAML -> OpenAI in one place, set a single `embeddings_configured` flag, and key every downstream behavior off that flag.

- `nerve/config.py`: `MemoryConfig` gains `embedding_base_url`, `embedding_api_key`, and `llm_concurrency` fields. Env vars `MEMU_EMBEDDING_BASE_URL`, `MEMU_EMBEDDING_API_KEY`, and `MEMU_EMBED_MODEL` override the YAML defaults at runtime so the docker compose path doesn't have to rewrite config.yaml on every container restart.
- `nerve/memory/memu_bridge.py`: `_initialize` reads the env vars first, registers the embedding profile against the resolved endpoint, and falls back to OpenAI only when neither env nor YAML is set. `_has_embeddings`, `_categorize_no_embed`, the retrieve config method (`"rag"` vs `"llm"`), and `memory_extract_llm_profile` all key off the same unified flag. An `asyncio.Semaphore` wraps memU's chat calls so the per-memory-type fan-out (a 4-way `gather` in memU's extract_items pipeline) can't blow the Anthropic rate limit on lower API tiers. Configurable via `memory.llm_concurrency`, default 1. Re-instrumentation reuses the same Semaphore so callers already queued don't lose their slot. SDK `max_retries` bumped from 0 back to 4 (with concurrency bounded, retries actually drain the queue instead of stacking).
- `nerve/bootstrap.py::_build_docker_compose` now writes the `embeddings` service block, the `MEMU_EMBEDDING_*` env vars on the nerve service, `depends_on: embeddings: condition: service_healthy`, the `~/.nerve/claude` mount for persisted Claude Code state, the path-aligned `${HOME}/nerve-workspace` and `${HOME}/projects` mounts, and the `/var/run/docker.sock` mount. The entrypoint creates `/root/*` symlinks pointing at `HOST_HOME` so hardcoded `/root/nerve-workspace` and `/root/projects` paths still resolve. Brings `nerve init` regeneration in line with what the live host docker-compose.yml already has.

Backfill (commit db3f588): `scripts/backfill_memu_embeddings.py` walks `memu_memory_items` (text source: `summary`) and `memu_resources` (text source: `caption`) for rows with NULL or empty `embedding_json`, batches 32 at a time, posts to `{MEMU_EMBEDDING_BASE_URL}/embeddings` with the OpenAI-compatible payload Ollama accepts, and writes the resulting vectors back. Idempotent; supports `--dry-run`, `--limit`, `--table`. Single transaction per batch so an interrupt loses at most one batch.

Test plan
- `pytest tests/test_memu_bridge.py tests/test_bootstrap.py` -> 115 passed, 0 failed. Full suite: 447 passed, 2 skipped (the only failure was a pre-existing `test_cli_upgrade.py::test_docker_mode_bails_out`, unrelated to these changes).
- Dry run against the running container's `memu.sqlite`: reports 6,009 `memu_memory_items` + 329 `memu_resources` pending (excluding 103 resources with NULL captions). Counts match the SQL prior to my run.
- `--limit 100 --table memu_memory_items`: wrote 100 768-dim vectors in 1.5s. Sample row spot-check: vector length 768, leading dims `[0.0096, 0.037, -0.132]`. Re-running `--dry-run` reports 5,909 pending, confirming idempotency (5,909 = 6,009 - 100).
- `curl POST http://embeddings:11434/v1/embeddings` with batch input returns 768-dim vectors per input; response shape matches OpenAI's contract.
- `memorize` round-trip on the running container: deferred. The fix lives in this branch and the running daemon is on `claude/engine-sdk-resume-guard` with the buggy bridge. Restarting the agent kills the active chat session per TASK.md, so I'll let the user pick the moment to recreate the container with the new code. Once restarted, a new memorize call should write a non-NULL `embedding_json` of length 768 and `memory_recall` against that memory should return it.
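Until that restart, the pending count is easy to re-check by hand. A sketch of the kind of query `--dry-run` runs; the database path is an assumption:

```python
import sqlite3

con = sqlite3.connect("memu.sqlite")  # path inside the running container
(pending,) = con.execute(
    "SELECT COUNT(*) FROM memu_memory_items "
    "WHERE embedding_json IS NULL OR embedding_json = ''"
).fetchone()
print(f"{pending} memu_memory_items rows still missing embeddings")
# After the container restart, a fresh memorize call should leave this
# count flat while writing a non-NULL, 768-element embedding_json.
```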
Notes for the reviewer

- The original 2026-05-05 stash also added a `nerve-services = nerve.services:main` console-script entry in `pyproject.toml` and several comments referencing a docker-mcp sidecar. `nerve/services.py` lives only on the abandoned `alex/docker-mcp-spike` branch, so installing with that entry would break `pip install -e`. Dropped here, with the dangling sidecar comments rewritten to describe the actual current setup (direct host socket mount).
- The engine-SDK-resume-guard half of the original stash already shipped on `f39e62b` and is excluded.
- `notes/outcomes/2026-05-workflow-automation-loop-closure.md`: AC14 flips to PASS once the user restarts the container and confirms a recall round-trip works.
- Closes the recall regression tracked under task "memU: replace OpenAI embeddings with a self-hosted free embedding sidecar".