plasticityai/magnitude

SQLite-backed embeddings that load in 0.7 seconds and sip 18KB RAM

A Python library that turns multi-gigabyte word vectors into lazy-loaded, memory-mapped SQLite databases with out-of-vocabulary smarts.

1.7k stars · Python · RAG · Search · ML Frameworks
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

Magnitude is a Python package and file format (.magnitude) for storing and querying vector embeddings. It converts models from word2vec, GloVe, fastText, and ELMo into SQLite databases with indexes and memory mapping, then wraps them in a Pythonic API. The goal is to be a lighter, faster alternative to Gensim for production use.
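
To make the storage idea concrete, here is a minimal sketch of keeping vectors in an indexed SQLite table and fetching them by key. It uses only the Python standard library; the table name, column names, and helper functions are assumptions made for illustration and are not Magnitude's actual schema or API.

```python
import sqlite3
import struct


def pack(vec):
    """Serialize a sequence of floats into a compact float32 blob."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob):
    """Deserialize a float32 blob back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def build_store(rows):
    """Create a toy in-memory store; PRIMARY KEY gives an index on `key`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vectors (key TEXT PRIMARY KEY, vec BLOB NOT NULL)")
    conn.executemany(
        "INSERT INTO vectors (key, vec) VALUES (?, ?)",
        [(key, pack(vec)) for key, vec in rows.items()],
    )
    conn.commit()
    return conn


def query(conn, key):
    """Fetch one vector by key, or None if the key is not stored."""
    row = conn.execute("SELECT vec FROM vectors WHERE key = ?", (key,)).fetchone()
    return unpack(row[0]) if row else None


# Toy data with values that float32 represents exactly.
store = build_store({"cat": [0.5, 0.25, -1.0], "dog": [0.5, 0.5, -0.75]})
print(query(store, "cat"))   # [0.5, 0.25, -1.0]
print(query(store, "bird"))  # None
```

Because each lookup is a single indexed read, only the rows that are actually requested need to be touched, which is the property that lets a file-backed store avoid loading the whole model.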

The interesting bit

The real trick is treating a 4GB embedding file like a memory-mapped database rather than loading it into RAM. Magnitude uses SQLite with spatial indexing, SIMD instructions, and LRU caching to serve vectors from disk with near-RAM speed. It also handles out-of-vocabulary keys by falling back to character n-gram similarity, which means misspellings and rare words don’t just return zero vectors.
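
The out-of-vocabulary fallback can be illustrated with a simplified sketch: hash each character n-gram of a word to a deterministic pseudo-random vector and sum them, so words that share most of their n-grams end up pointing in similar directions. This is an assumption-laden toy, not Magnitude's actual algorithm; the function names, the n-gram range, and the choice of SHA-256 are all illustrative.

```python
import hashlib
import math

DIM = 32  # toy dimensionality, matched to the 32 bytes of a SHA-256 digest


def char_ngrams(word, n_min=3, n_max=5):
    """Return all character n-grams of the word, padded with boundary markers."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_vector(ngram):
    """Map an n-gram to a deterministic pseudo-random vector in [-1, 1]^DIM."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [(byte - 127.5) / 127.5 for byte in digest[:DIM]]


def oov_vector(word):
    """Sum the n-gram vectors of a word and normalize to unit length."""
    total = [0.0] * DIM
    for gram in char_ngrams(word):
        for i, value in enumerate(ngram_vector(gram)):
            total[i] += value
    norm = math.sqrt(sum(x * x for x in total)) or 1.0
    return [x / norm for x in total]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


correct = oov_vector("embedding")
typo = oov_vector("embeddding")   # misspelling that shares most n-grams
unrelated = oov_vector("zyxwvut")

print(cosine(correct, typo) > cosine(correct, unrelated))
```

The misspelling shares most of its n-grams with the correct word, so its vector lands close to it, while an unrelated string shares none and lands in an essentially random direction.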

Key highlights

  • Lazy-loads models: initial load time is 0.72s for a 4.21GB file, with only 18KB RAM used at startup
  • Warm single-key queries run in 0.04ms; even streaming over HTTP hits 0.4ms after first access
  • Converts word2vec, GloVe, fastText, and ELMo models into the .magnitude format with a single utility
  • Supports concatenating multiple embedding models and adding POS tag features
  • Published at EMNLP 2018, so the approach has been peer-reviewed
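
The lazy-loading and warm-query numbers above come from two ideas that can be sketched together: do no work until the first query, then keep recently used keys in an LRU cache. The class below is a hypothetical illustration built on the standard library (`LazyVectors`, `db_reads`, and `is_open` are invented names), not the library's implementation; an in-memory database stands in for the on-disk file.

```python
import sqlite3
import struct
from functools import lru_cache


class LazyVectors:
    """Toy store that opens its database on first use and caches hot keys."""

    def __init__(self, rows, cache_size=1024):
        self._rows = rows        # stands in for a path to a file on disk
        self._conn = None        # nothing is opened at construction time
        self.db_reads = 0        # counts lookups that reached the database
        self._cached_lookup = lru_cache(maxsize=cache_size)(self._lookup)

    @property
    def is_open(self):
        return self._conn is not None

    def _connect(self):
        """Open (here: build) the database only when it is first needed."""
        if self._conn is None:
            self._conn = sqlite3.connect(":memory:")
            self._conn.execute(
                "CREATE TABLE vectors (key TEXT PRIMARY KEY, vec BLOB NOT NULL)"
            )
            self._conn.executemany(
                "INSERT INTO vectors (key, vec) VALUES (?, ?)",
                [(k, struct.pack(f"<{len(v)}f", *v)) for k, v in self._rows.items()],
            )
        return self._conn

    def _lookup(self, key):
        """Cold path: read one row from the database."""
        self.db_reads += 1
        row = self._connect().execute(
            "SELECT vec FROM vectors WHERE key = ?", (key,)
        ).fetchone()
        return struct.unpack(f"<{len(row[0]) // 4}f", row[0]) if row else None

    def query(self, key):
        """Warm path: repeated keys are served from the LRU cache."""
        return self._cached_lookup(key)


vectors = LazyVectors({"cat": (0.5, 0.25), "dog": (0.5, 0.5)})
print(vectors.is_open)            # False: construction did no I/O
first = vectors.query("cat")      # cold: opens the store and reads one row
second = vectors.query("cat")     # warm: answered from the cache
print(first, vectors.db_reads)    # (0.5, 0.25) 1
```

Returning immutable tuples keeps the cached values safe from accidental mutation by callers.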

Caveats

  • First most_similar search without a disk cache can take 247 seconds (subsequent queries drop to ~0.24s)
  • Google Colab installation requires a shell script workaround due to dependency conflicts
  • The “Medium” and “Heavy” benchmark columns in the README are blank (marked with ━), so performance claims for those variants are unclear

Verdict

Worth a look if you’re running embedding queries in production and want to stop paying for RAM you don’t need. Less compelling if you’re doing heavy similarity search without the patience to warm the cache first, or if you’re already happy with Gensim’s in-memory performance.
