spotify/voyager

Spotify's in-house vector search, minus the infrastructure headache

A battle-tested HNSW library that ships as a pip install or Maven dependency, not a Kubernetes manifest.

1.6k stars · C++ · RAG · Search · +1.4 ★/day over the last 7 days (trend: steady)

What it does

Voyager is an approximate nearest-neighbor search library for in-memory vectors. It wraps the HNSW algorithm (via hnswlib) in Python and Java bindings that share index formats, so you can build an index in one language and query it in the other. Spotify runs it in production for hundreds of millions of daily queries.
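To make the build-save-reload flow concrete, here is a minimal Python sketch modeled on the usage shown in the project's README. It assumes the `voyager` package is installed; the function name `build_save_reload` and the file path are illustrative, not part of the library. The saved file is the artifact the Java bindings would load, though the Java side is not shown here.

```python
def build_save_reload(path):
    """Build a tiny index, save it, load it back, and query the reloaded copy."""
    # Imported inside the function so a missing package fails at call time.
    from voyager import Index, Space

    # Five-dimensional Euclidean index; IDs are assigned in insertion order.
    index = Index(Space.Euclidean, num_dimensions=5)
    index.add_item([1, 2, 3, 4, 5])
    index.add_item([6, 7, 8, 9, 10])

    # Persist the index to disk, then reload it as a separate object.
    index.save(path)
    reloaded = Index.load(path)

    # Ask the reloaded index for the two nearest neighbors of a query vector.
    neighbors, distances = reloaded.query([1, 2, 3, 4, 5], k=2)
    return [int(n) for n in neighbors], [float(d) for d in distances]
```

Calling `build_save_reload("example.voy")` should return the two item IDs with the exact match first. Everything runs inside the calling process, with no server involved.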

The interesting bit

The pitch is deployability: no server to stand up, no vectors to host in a separate database. The README explicitly likens it to Sparkey (Spotify’s embeddable key-value store) and to Annoy, while claiming much higher recall than Annoy. That framing matters: this is glue code, but it’s glue code that survived Spotify’s scale.
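Since the recall claim is the main technical differentiator, it helps to pin down what recall means for approximate search: the fraction of the true nearest neighbors that the approximate result actually returns. The sketch below is plain Python and does not use Voyager. The helper names `exact_top_k` and `recall_at_k` are hypothetical, and the "approximate" result is hand-written for illustration.

```python
def exact_top_k(vectors, query, k):
    """Brute-force ground truth: indices of the k vectors closest to query."""
    def sq_dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, query))
    return sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i]))[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors present in the approximate result."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
truth = exact_top_k(vectors, [0.1, 0.0], k=2)   # [0, 1]
score = recall_at_k([0, 3], truth)              # 0.5: one of two true neighbors found
```

To check a recall claim on your own data, you would substitute the IDs returned by an ANN library for the hand-written `[0, 3]`.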

Key highlights

  • Python 3.9–3.13 and Java 8+ support, with x86_64 and arm64 builds for macOS and Linux
  • Windows supported on x86_64 only; arm64 builds are absent
  • Indexes are compatible across Python and Java
  • Apache 2.0 licensed, with published docs for both languages
  • Version 2.1.0 current as of the README

Caveats

  • No Windows arm64 builds, which matters only if you deploy on Windows on ARM
  • The README’s “Numerous features added for convenience and speed” is vague; it doesn’t enumerate them
  • In-memory only; not a solution for datasets that exceed RAM
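To judge whether the in-memory limit applies to you, a back-of-envelope estimate is usually enough. The sketch below uses a common rule of thumb for HNSW indexes with 32-bit float vectors: 4 bytes per dimension for the vector, plus roughly 8 bytes per graph link slot (`M` per element). This is an approximation and an assumption, not a measurement of Voyager itself. The function name `estimate_index_bytes` and the example values are illustrative.

```python
def estimate_index_bytes(num_vectors, num_dimensions, m):
    """Rough HNSW footprint: float32 vector data plus about 8 bytes per link slot."""
    vector_bytes = num_dimensions * 4   # 32-bit floats
    link_bytes = m * 2 * 4              # assumed rule of thumb, not measured
    return num_vectors * (vector_bytes + link_bytes)


# Illustrative workload: 10 million 128-dimensional vectors with M = 16.
total = estimate_index_bytes(10_000_000, 128, 16)
print(f"{total / 1e9:.1f} GB")  # 6.4 GB
```

If the estimate approaches the RAM available to your process, that is the point at which this caveat starts to matter.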

Verdict

Worth a look if you need vector search inside an existing Python or Java process without adding infrastructure. Skip it if you need distributed search, persistent storage, or Windows on ARM.
