Local AI memory that actually remembers, with receipts
A pluggable, benchmarked memory layer for LLMs that stores conversation history verbatim and retrieves it without API calls.

What it does
MemPalace is a local-first memory system for AI conversations. It stores your chat history as verbatim text — no summarization, no paraphrasing — and retrieves relevant context via semantic search. The index is organized into wings (people/projects), rooms (topics), and drawers (original content), so you can scope searches instead of drowning in a flat corpus. It ships as a Python CLI with ChromaDB as the default vector backend, and the storage layer is swappable via a clean base interface.
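To make the wing/room/drawer scoping and the swappable storage layer concrete, here is a minimal sketch. Every name in it (`Drawer`, `MemoryBackend`, `InMemoryBackend`) is hypothetical and not taken from MemPalace's code, and the toy backend ranks by shared-word count as a stand-in for embedding similarity, so it only illustrates the shape of the idea.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Drawer:
    """One verbatim chunk of conversation, filed under a wing and a room."""
    wing: str  # person or project
    room: str  # topic
    text: str  # original content, stored unmodified


class MemoryBackend(ABC):
    """Hypothetical base interface; any store implementing it can be swapped in."""

    @abstractmethod
    def add(self, drawer: Drawer) -> None: ...

    @abstractmethod
    def search(self, query: str, k: int = 5, wing: str | None = None,
               room: str | None = None) -> list[Drawer]: ...


class InMemoryBackend(MemoryBackend):
    """Toy backend: ranks by shared-word count, not by embedding similarity."""

    def __init__(self) -> None:
        self._drawers: list[Drawer] = []

    def add(self, drawer: Drawer) -> None:
        self._drawers.append(drawer)

    def search(self, query, k=5, wing=None, room=None):
        terms = set(query.lower().split())
        # Scope first, so ranking only sees the requested wing/room.
        scoped = [d for d in self._drawers
                  if (wing is None or d.wing == wing)
                  and (room is None or d.room == room)]
        scored = [(len(terms & set(d.text.lower().split())), d) for d in scoped]
        scored = [(s, d) for s, d in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[:k]]


store: MemoryBackend = InMemoryBackend()
store.add(Drawer("project-atlas", "deploys", "We moved the deploy window to Friday."))
store.add(Drawer("project-atlas", "hiring", "The deploy engineer role is still open."))
store.add(Drawer("alice", "deploys", "Alice prefers a Tuesday deploy window."))

# Only drawers in the "project-atlas" wing are considered.
hits = store.search("deploy window", wing="project-atlas")
```

Because callers depend only on the abstract interface, a vector-database implementation could replace the toy class without changes to the calling code, which is the property the review attributes to the storage layer.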
The interesting bit
The project posts honest, reproducible benchmarks — a rarity in this space. It hits 96.6% R@5 on LongMemEval with raw semantic search, no LLM, no API keys, no cloud. A hybrid pipeline with keyword and temporal boosting pushes that to 98.4% on held-out data. The authors deliberately refuse to headline a fake “100%” or compare against projects that measure different things. That restraint is the signal.
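For readers unfamiliar with the metric, the sketch below shows one common way to compute recall@5. It assumes each question counts as a hit when at least one gold item appears in the top five results; the benchmark's exact scoring rule may differ, and the session IDs are made up for illustration.

```python
def recall_at_k(ranked_ids, gold_ids, k=5):
    """Return 1.0 if any gold item appears in the top-k results, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(gold_ids) else 0.0


def mean_recall_at_k(runs, k=5):
    """Average the per-question hit indicator over (ranked_ids, gold_ids) pairs."""
    return sum(recall_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs)


# Illustrative data only: four questions with ranked retrieval results.
runs = [
    (["s3", "s9", "s1", "s4", "s7"], {"s1"}),        # hit at rank 3
    (["s2", "s5", "s8", "s6", "s0", "s4"], {"s4"}),  # gold item at rank 6: miss
    (["s7", "s1", "s2", "s3", "s5"], {"s7", "s9"}),  # hit at rank 1
    (["s6", "s8", "s0", "s1", "s2"], {"s2"}),        # hit at rank 5
]
score = mean_recall_at_k(runs, k=5)  # 3 of 4 questions hit
```

Under this per-question definition, 96.6% on a 500-question set corresponds to 483 questions with a relevant item in the top five.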
Key highlights
- 96.6% recall@5 on LongMemEval (500 questions) with zero LLM calls or API keys
- Pluggable retrieval backend — swap ChromaDB without touching the rest of the system
- Structured indexing (wings/rooms/drawers) instead of flat vector dumps
- 29 MCP tools for reads, writes, knowledge-graph ops, and agent diaries
- Temporal entity-relationship graph with validity windows, backed by SQLite
- Auto-save hooks for Claude Code, so history is captured before sessions hit their 30-day expiration
- ~300 MB disk for the default multilingual embedding model (100+ languages)
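The temporal graph with validity windows can be pictured with a small SQLite sketch. The table layout, column names, and sample facts below are assumptions made for illustration, not MemPalace's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL,
        valid_from TEXT NOT NULL,  -- ISO 8601 date, inclusive
        valid_to   TEXT            -- ISO 8601 date, exclusive; NULL = still valid
    )
""")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", "works_on", "project-atlas", "2024-01-01", "2024-06-01"),
        ("alice", "works_on", "project-borealis", "2024-06-01", None),
        ("bob", "works_on", "project-atlas", "2024-03-15", None),
    ],
)


def facts_as_of(conn, subject, date):
    """Return (predicate, object) pairs whose validity window contains `date`."""
    rows = conn.execute(
        """
        SELECT predicate, object FROM facts
        WHERE subject = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR ? < valid_to)
        ORDER BY valid_from
        """,
        (subject, date, date),
    )
    return rows.fetchall()


spring = facts_as_of(conn, "alice", "2024-04-10")
autumn = facts_as_of(conn, "alice", "2024-09-01")
```

Storing dates as ISO 8601 strings keeps them sortable with plain text comparison, and closing a window by setting `valid_to` (rather than deleting the row) preserves what was true at an earlier point in time.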
Caveats
- The README warns of active impostor sites distributing malware; verify you’re on the real repo/PyPI/docs domain
- Claude Code hooks require manual wiring; until you set them up, sessions still expire after 30 days
- Some benchmark numbers (e.g., MemBench 80.3% R@5) show the system isn’t uniformly dominant across all datasets
Verdict
Worth a look if you’re building agentic workflows or are just tired of LLMs forgetting everything at the start of each session. Skip it if you need cloud-native multi-user sync out of the box: this is a single-machine tool where everything is opt-in.