Spotify's in-house vector search, minus the infrastructure headache
A battle-tested HNSW library that ships as a pip install or Maven dependency, not a Kubernetes manifest.

What it does
Voyager is an approximate nearest-neighbor search library for in-memory vectors. It wraps the HNSW algorithm (via hnswlib) in Python and Java bindings that share index formats, so you can build an index in one language and query it in the other. Spotify runs it in production for hundreds of millions of daily queries.
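To make the build-then-query flow concrete, here is a minimal sketch modelled on the usage shown in the project's README. The toy vectors, the `build_query_reload` helper, and the `example.voy` filename are illustrative assumptions, and the import is guarded only so the snippet stays harmless where `voyager` is not installed.

```python
import os
import tempfile

try:
    from voyager import Index, Space  # pip install voyager
except ImportError:  # library not installed; the sketch returns None
    Index = Space = None


def build_query_reload():
    """Build a tiny index, query it, then save it and query the reloaded copy."""
    if Index is None:
        return None

    # An empty in-memory index over 5-dimensional vectors.
    index = Index(Space.Euclidean, num_dimensions=5)
    index.add_item([1, 2, 3, 4, 5])   # assigned ID 0
    index.add_item([6, 7, 8, 9, 10])  # assigned ID 1

    # Ask for the two nearest neighbors of a query vector.
    neighbors, _distances = index.query([1, 2, 3, 4, 5], k=2)

    # Save to disk and load it back; the same file is what a Java
    # process would read when sharing an index across languages.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.voy")  # hypothetical filename
        index.save(path)
        reloaded = Index.load(path)
        reloaded_neighbors, _ = reloaded.query([1, 2, 3, 4, 5], k=2)

    return [int(i) for i in neighbors], [int(i) for i in reloaded_neighbors]


print(build_query_reload())
```

Everything runs inside the calling process, which is the deployability argument in miniature: there is no client, connection string, or server lifecycle to manage.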
The interesting bit
The pitch is deployability: no server to stand up, no vectors to host in a separate database. The README explicitly compares it to Sparkey (Spotify’s embeddable key-value store) and to Annoy, claiming much higher recall than the latter. That framing matters: this is glue code, but it’s glue code that survived Spotify’s scale.
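Since the comparison hinges on recall, it helps to pin down what that number measures. The sketch below is a generic illustration, not Voyager's own benchmark code: it computes exact neighbors by brute force and scores a hypothetical approximate result against them. The function names and the toy data are assumptions made for the example.

```python
import math


def exact_neighbors(vectors, query, k):
    """Brute-force k nearest neighbors by Euclidean distance (the ground truth)."""
    ranked = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
query = (0.1, 0.1)

truth = exact_neighbors(vectors, query, k=2)
approx = [0, 2]  # hypothetical output of an approximate index

print(truth)                       # [0, 1]
print(recall_at_k(approx, truth))  # 0.5
```

A recall claim only means something alongside the query speed and index size at which it was measured, so treat any single figure as one point on a trade-off curve.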
Key highlights
- Python 3.9–3.13 and Java 8+ support, with x86_64 and arm64 builds for macOS and Linux
- Windows supported on x86_64 only; arm64 builds are absent
- Indexes are compatible across Python and Java
- Apache 2.0 licensed, with published docs for both languages
- Current version is 2.1.0, per the README
Caveats
- No Windows arm64 builds, which only matters if you deploy to Windows on ARM
- “Numerous features added for convenience and speed” is vague; the README doesn’t enumerate them
- In-memory only; not a solution for datasets that exceed RAM
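A quick way to judge the in-memory caveat is to estimate the footprint of the raw vectors before building anything. The sketch below assumes 32-bit floats (4 bytes per value) and a hypothetical dataset of 10 million 256-dimensional vectors; it deliberately ignores the HNSW graph links and other index overhead, so treat the result as a lower bound.

```python
def raw_vector_bytes(num_vectors, num_dimensions, bytes_per_value=4):
    """Bytes needed for the vector data alone (graph overhead not included)."""
    return num_vectors * num_dimensions * bytes_per_value


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / 2**30


# Hypothetical dataset: 10 million vectors, 256 dimensions, float32 storage.
estimate = raw_vector_bytes(10_000_000, 256)
print(f"{to_gib(estimate):.2f} GiB")  # 9.54 GiB
```

If the estimate is already close to the RAM available on a single host, an embedded in-memory index is probably the wrong tool.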
Verdict
Worth a look if you need vector search inside an existing Python or Java process without adding infrastructure. Skip it if you need distributed search, persistent storage, or Windows on ARM.