← all repositories
vearch/vearch

JD's vector database, now open-source and Helm-ready

A distributed vector search engine built for production RAG pipelines, with hybrid search and raft-based replication.

2.3k stars Go RAG · Search
vearch
Velocity · 7d
+0.9
★ / day
Trend
steady
star history

What it does

Vearch stores embedding vectors and runs similarity search at scale, backed by Faiss for the actual indexing. It adds scalar filtering, so you can combine vector similarity with metadata constraints like “only items in stock.” The system splits work across three roles: a Master for schema and coordination, Routers for REST API and request merging, and PartitionServers that replicate data via raft.

The interesting bit

The project originated at JD.com’s e-commerce platform — the academic citation is a 2019 paper on real-time visual search for billions of images. That retail-scale pedigree shows in the architecture: it’s designed as a complete distributed system rather than a single-node library with networking duct-taped on.

Key highlights

  • Hybrid search: vector similarity plus scalar filtering in one query
  • Deploy via Helm chart, Docker Compose (standalone or cluster), or source compile
  • SDKs for Python, Go, Java, and Rust
  • Ready-made integrations for LangChain, LlamaIndex, Langchaingo, and LangChain4j
  • Core engine (Gamma) built on Faiss for vector indexing

Caveats

  • The README claims “millions of objects in milliseconds” but offers no benchmarks, hardware specs, or comparison numbers to ground that
  • The “billions of images” visual search demo requires an unspecified “image retrieval plugin” — not included in the repo

Verdict

Worth evaluating if you need a self-hosted vector database with hybrid search and explicit multi-node replication. Skip it if you’re already committed to a managed service or if your workload fits comfortably in a single-node solution like pgvector.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.