A vector database that stores 97% less by not storing vectors
LEANN runs RAG on your laptop by computing embeddings on-demand instead of hoarding them.

What it does
LEANN is a vector database for local, privacy-first RAG. It indexes documents, emails, browser history, chat logs, and even live data via MCP servers—then lets you search and chat with them using local or remote LLMs. The pitch: 60 million text chunks in 6 GB instead of 201 GB, all on your laptop, zero cloud required.
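The headline percentage follows directly from the two sizes quoted above. A quick check, assuming decimal gigabytes (1 GB = 10^9 bytes), which the source does not specify:

```python
# Figures quoted in the pitch above.
CHUNKS = 60_000_000
FULL_INDEX_GB = 201   # conventional index that stores every embedding
LEANN_INDEX_GB = 6    # LEANN index

# Assumption: 1 GB = 1e9 bytes (the source does not say GB vs GiB).
reduction = 1 - LEANN_INDEX_GB / FULL_INDEX_GB
bytes_per_chunk_full = FULL_INDEX_GB * 1e9 / CHUNKS
bytes_per_chunk_leann = LEANN_INDEX_GB * 1e9 / CHUNKS

print(f"storage reduction: {reduction:.1%}")
print(f"per chunk: {bytes_per_chunk_full:.0f} B vs {bytes_per_chunk_leann:.0f} B")
```

Under that assumption the index budget drops from a few kilobytes per chunk to about a hundred bytes per chunk.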
The interesting bit
Instead of storing every embedding, LEANN keeps a pruned graph and recomputes vectors on the fly, only for the nodes a query actually visits. It calls this “graph-based selective recomputation with high-degree preserving pruning.” The trade-off is CPU work at query time in exchange for radical storage savings: less hoarding, more thinking.
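To make the idea concrete, here is a minimal sketch of a graph search that stores only text and edges and embeds a chunk the moment the search reaches it. It is not LEANN's API or implementation: `TEXTS`, `GRAPH`, `embed`, `cosine`, and `search` are hypothetical names, the bag-of-words `embed` is a toy stand-in for a real embedding model, and the pruning step is not shown.

```python
import heapq
import math

# Toy corpus: only raw text and graph edges are stored, never vectors.
TEXTS = {
    0: "cat sits on the mat",
    1: "dog sleeps on the mat",
    2: "stock markets fell today",
    3: "cat chases the dog",
    4: "markets rally on earnings",
}
GRAPH = {0: [1, 3], 1: [0, 3], 2: [4, 3], 3: [0, 1, 2], 4: [2]}

def embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, entry=3, ef=2):
    """Best-first graph walk that embeds a chunk only when it is reached."""
    q = embed(query)
    sims = {}  # similarity cache for this query only; no vector is persisted

    def sim(node):
        if node not in sims:
            sims[node] = cosine(q, embed(TEXTS[node]))  # recompute on demand
        return sims[node]

    candidates = [(-sim(entry), entry)]  # max-heap via negated similarity
    results = [(sim(entry), entry)]      # min-heap of the best `ef` so far
    while candidates:
        neg, node = heapq.heappop(candidates)
        if len(results) >= ef and -neg < results[0][0]:
            break  # best remaining candidate cannot improve the results
        for nb in GRAPH[node]:
            if nb in sims:
                continue  # already reached (and scored) during this query
            s = sim(nb)
            if len(results) < ef or s > results[0][0]:
                heapq.heappush(candidates, (-s, nb))
                heapq.heappush(results, (s, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    top = [node for _, node in sorted(results, reverse=True)]
    return top, len(sims)

top, embedded = search("cat on the mat")
print(top, f"embedded {embedded} of {len(TEXTS)} chunks")
```

The search embeds only the chunks it touches and throws the vectors away afterwards, so the cost moves from disk to query-time compute. On a toy graph this small the saving is modest; the claim in the source is that it becomes decisive at tens of millions of chunks.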
Key highlights
- Claims 97% storage reduction vs traditional vector DBs with “no accuracy loss” (per README; paper linked at arXiv:2506.08276)
- Native MCP integration for live data: Slack, Twitter, and anything else speaking Model Context Protocol
- Drop-in semantic search MCP for Claude Code, upgrading it from grep to actual retrieval
- Pre-built connectors for Apple Mail, WeChat, iMessage, ChatGPT/Claude history, Google Search History
- Supports HNSW and DiskANN backends; builds from source require platform-specific C++ toolchains
Caveats
- Build-from-source path is involved: macOS needs libomp/boost/protobuf, Linux needs MKL or OpenBLAS, Windows needs Visual Studio + vcpkg
- Ubuntu 20.04 users may need to manually pin Protobuf/Abseil versions (Issue #30)
- GPU acceleration is on the roadmap, not shipped; README solicits votes for it
Verdict
Worth a look if you want personal RAG without renting GPUs or leaking data to OpenAI. Skip it if you need production-scale concurrent serving or already have cheap embedding storage.