Open
Description
Context
MemMachine's benchmark blog discovered that LoCoMo category assignments in the paper differ from the source code:
'This finding suggests that some public LoCoMo results might be presenting misclassified data, making a direct and fair comparison challenging.'
They use the source code assignments as ground truth, not the paper's descriptions.
Action
- Compare our category assignments against the LoCoMo source code (github.com/snap-research/LoCoMo)
- Document any discrepancies with the paper
- Ensure our per-category results use the correct assignments
- If our categories were wrong, re-run and report corrected numbers
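The comparison in the first two action items could be sketched roughly as follows. This is a minimal illustration, not the actual harness: it assumes both our assignments and the ones derived from the LoCoMo source code can be loaded as `{question_id: category}` mappings, and the function names `diff_category_assignments` and `summarize_mismatches` are hypothetical.

```python
from collections import Counter


def diff_category_assignments(ours, reference):
    """Compare two {question_id: category} mappings.

    `ours` is assumed to hold the categories our benchmark uses, and
    `reference` the categories taken from the LoCoMo source code.
    Returns (mismatches, missing, extra):
      mismatches: {question_id: (our_category, reference_category)}
      missing:    sorted ids present in reference but not in ours
      extra:      sorted ids present in ours but not in reference
    """
    shared = ours.keys() & reference.keys()
    mismatches = {
        qid: (ours[qid], reference[qid])
        for qid in shared
        if ours[qid] != reference[qid]
    }
    missing = sorted(reference.keys() - ours.keys())
    extra = sorted(ours.keys() - reference.keys())
    return mismatches, missing, extra


def summarize_mismatches(mismatches):
    """Count how often each (ours, reference) category pair disagrees."""
    return Counter(mismatches.values())


if __name__ == "__main__":
    # Toy data only; real ids and categories would come from the dataset.
    ours = {"q1": 1, "q2": 2, "q3": 3}
    reference = {"q1": 1, "q2": 3, "q4": 4}
    mismatches, missing, extra = diff_category_assignments(ours, reference)
    print(mismatches, missing, extra)
    print(summarize_mismatches(mismatches))
```

If the mismatch table is non-empty, the pair counts indicate whether the disagreement is a systematic relabeling (one category consistently swapped for another) or scattered errors, which decides how much of the benchmark needs to be re-run.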
This matters for credibility: if we publish per-category numbers based on wrong category assignments, competitors will call it out.
Related
- Benchmark: Add LLM-as-Judge evaluation (GPT-4.1) for LoCoMo #9
- Benchmark: Adopt Backboard's LoCoMo methodology for reproducible comparison #8
- MemMachine blog: memmachine.ai/blog/2025/12/memmachine-v0.2-delivers-top-scores-and-efficiency-on-locomo-benchmark/
Milestone
v0.19.0