StarTrail-org/LEANN

A vector database that stores 97% less by not storing vectors

LEANN runs RAG on your laptop by computing embeddings on-demand instead of hoarding them.

11.9k stars · Python · RAG · Search · LLMOps · Eval
Velocity (7d): +33 ★ / day · Trend: steady

What it does

LEANN is a vector database for local, privacy-first RAG. It indexes documents, emails, browser history, chat logs, and even live data via MCP servers, then lets you search and chat with them using local or remote LLMs. The pitch: 60 million text chunks in 6 GB instead of 201 GB, all on your laptop, no cloud required.
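The headline numbers are easy to sanity-check. The sketch below only redoes the arithmetic from the pitch (60 million chunks, 6 GB vs 201 GB); treating a gigabyte as 10^9 bytes is an assumption, and the per-chunk figures are derived averages, not numbers published by the project.

```python
# Sanity check of the quoted figures; GB = 10**9 bytes is an assumption.
GB = 10**9

num_chunks = 60_000_000
baseline_bytes = 201 * GB   # quoted size when every embedding is stored
leann_bytes = 6 * GB        # quoted LEANN index size

reduction = 1 - leann_bytes / baseline_bytes
print(f"storage reduction: {reduction:.1%}")                           # 97.0%
print(f"baseline bytes per chunk: {baseline_bytes / num_chunks:.0f}")  # 3350
print(f"LEANN bytes per chunk: {leann_bytes / num_chunks:.0f}")        # 100
```

So the 97% figure follows directly from the two sizes, and the index averages roughly 100 bytes per chunk, which is far too small to hold a dense embedding per chunk.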

The interesting bit

Instead of storing every embedding, LEANN keeps a pruned graph and recomputes vectors on the fly, only for the nodes a search actually visits. It calls this “graph-based selective recomputation with high-degree preserving pruning.” The trade-off is extra embedding computation at query time in exchange for radical storage savings: less hoarding, more thinking.
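A toy sketch of the general idea, not LEANN's implementation: build a neighbor graph, throw the vectors away, and at query time embed only the chunks a best-first search touches. Everything here is hypothetical, including `embed` (a hashed-trigram stand-in for a real embedding model), `build_graph`, and `search`. The sketch also skips the high-degree preserving pruning step entirely.

```python
import heapq
import math
import zlib

DIM = 64  # toy embedding width (assumption)

def embed(text):
    """Stand-in for a real embedding model: hashed character trigrams, L2-normalized."""
    vec = [0.0] * DIM
    t = text.lower()
    for i in range(len(t) - 2):
        vec[zlib.crc32(t[i:i + 3].encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_graph(chunks, k=2):
    """Embed once to find neighbors, then keep only the edges (vectors are dropped)."""
    vecs = [embed(c) for c in chunks]
    graph = {i: set() for i in range(len(chunks))}
    for i in range(len(chunks)):
        ranked = sorted((j for j in range(len(chunks)) if j != i),
                        key=lambda j: dot(vecs[i], vecs[j]), reverse=True)
        for j in ranked[:k]:
            graph[i].add(j)
            graph[j].add(i)  # store edges in both directions
    return {i: sorted(nbrs) for i, nbrs in graph.items()}

def search(query, chunks, graph, entry=0, top_k=2, beam=4):
    """Best-first graph search that embeds a chunk only when it is first visited."""
    q = embed(query)
    cache = {}  # node id -> similarity, filled lazily

    def score(i):
        if i not in cache:
            cache[i] = dot(embed(chunks[i]), q)  # on-demand recomputation
        return cache[i]

    visited = {entry}
    frontier = [(-score(entry), entry)]  # max-heap via negated score
    best = []                            # min-heap holding the `beam` best nodes
    while frontier:
        neg, node = heapq.heappop(frontier)
        if len(best) >= beam and -neg < best[0][0]:
            break  # the best remaining candidate cannot improve the result set
        heapq.heappush(best, (-neg, node))
        if len(best) > beam:
            heapq.heappop(best)
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (-score(nb), nb))
    ranked = sorted(best, reverse=True)[:top_k]
    return [node for _, node in ranked], len(cache)

chunks = [
    "LEANN stores a pruned graph instead of embeddings",
    "Embeddings are recomputed at query time",
    "Apple Mail and iMessage connectors are included",
    "HNSW and DiskANN backends are supported",
    "GPU acceleration is on the roadmap",
]
graph = build_graph(chunks)
hits, recomputed = search("which backends are supported", chunks, graph)
print(hits, f"({recomputed} of {len(chunks)} chunks embedded at query time)")
```

On five chunks the search touches most of the graph, so the saving is invisible here. The point of the design is that on a large index the number of visited nodes grows much more slowly than the index, so only a small fraction of chunks is ever re-embedded per query.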

Key highlights

  • Claims 97% storage reduction vs traditional vector DBs with “no accuracy loss” (per README; paper linked at arXiv:2506.08276)
  • Native MCP integration for live data: Slack, Twitter, and anything else speaking Model Context Protocol
  • Drop-in semantic search MCP server for Claude Code, upgrading it from grep to actual retrieval
  • Pre-built connectors for Apple Mail, WeChat, iMessage, ChatGPT/Claude history, Google Search History
  • Supports HNSW and DiskANN backends; builds from source require platform-specific C++ toolchains

Caveats

  • Build-from-source path is involved: macOS needs libomp/boost/protobuf, Linux needs MKL or OpenBLAS, Windows needs Visual Studio + vcpkg
  • Ubuntu 20.04 users may need to manually pin Protobuf/Abseil versions (Issue #30)
  • GPU acceleration is on the roadmap, not shipped; README solicits votes for it

Verdict

Worth a look if you want personal RAG without renting GPUs or leaking data to OpenAI. Skip it if you need production-scale concurrent serving or already have cheap embedding storage.
