Benchmark: Adopt Backboard's LoCoMo methodology for reproducible comparison #8

@bm-clawd

Description

Context

Backboard published a fully reproducible LoCoMo benchmark with:

  • Per-conversation isolated evaluation
  • Multi-session conversation ingestion with timestamps
  • GPT-4.1 judge with fixed prompts and seed
  • Published logs, prompts, and verdicts for every question
  • One-click replication script

We should adopt the same methodology so our results are directly comparable.

What to Adapt

From their approach:

  1. Per-conversation isolation — create separate BM projects per conversation (they create separate assistants)
  2. Turn-by-turn ingestion — ingest conversation turns sequentially, preserving session boundaries and timestamps
  3. Separate question thread — ask questions only after all sessions are ingested, using only BM search for context
  4. Fixed judge config — same GPT-4.1 judge prompts, same seed, deterministic evaluation
  5. Full transparency — publish all prompts, retrieved context, generated answers, and judge verdicts
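The five steps above could be sketched as a small harness. This is a minimal sketch, not the real BM API: `ingest`, `search`, and `answer` are hypothetical callables standing in for BM ingestion, BM search, and the answering LLM.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    speaker: str
    text: str
    timestamp: str  # ISO-8601, preserved from the source conversation

@dataclass
class Session:
    session_id: str
    turns: list[Turn] = field(default_factory=list)

def evaluate_conversation(conversation_id, sessions, questions,
                          ingest, search, answer):
    """Evaluate one conversation in isolation.

    `ingest`, `search`, and `answer` are placeholders injected by the
    caller -- they are NOT the real BM API, just stand-ins for it.
    """
    # 1. Per-conversation isolation: one fresh project per conversation
    #    (Backboard creates separate assistants; we create separate projects).
    project = f"locomo-{conversation_id}"
    # 2. Turn-by-turn ingestion, preserving session boundaries and timestamps.
    for session in sessions:
        for turn in session.turns:
            ingest(project, session.session_id, turn)
    # 3. Questions are asked only after all sessions are ingested,
    #    using only retrieved context (never the raw transcript).
    results = []
    for question in questions:
        context = search(project, question)
        results.append({
            "question": question,
            "context": context,          # 5. logged for full transparency
            "answer": answer(question, context),
        })
    return results
```

The fixed-judge step (4) then scores `results` separately, so every prompt, retrieved context, answer, and verdict can be published per question.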

What we do differently (advantages):

  1. Include adversarial category — they skip it, we test it
  2. Report retrieval metrics alongside accuracy — shows *where* improvements come from (retrieval vs. LLM reasoning)
  3. Local-first execution — no cloud API dependency, fully reproducible offline (except judge step)
  4. Multiple retrieval strategies — test FTS, vector, hybrid, with/without time-decay
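For item 2, the retrieval metrics only need standard definitions. A minimal sketch, assuming retrieved results and gold annotations are document IDs:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant document (0.0 if none found)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0
```

Computing these per strategy (FTS, vector, hybrid, with/without time-decay) makes it visible whether an accuracy gain comes from better retrieval or from the LLM compensating for weak context.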

Repo

Results should go in the benchmark repo (`openclaw-basic-memory`, or a standalone repo per #10).

Milestone

v0.19.0
