plasticityai/magnitude

SQLite-backed embeddings that load in 0.7 seconds and sip 18KB RAM

A Python library that turns multi-gigabyte word vectors into lazy-loaded, memory-mapped SQLite databases with out-of-vocabulary smarts.

★1.7k stars Python RAG · Search ML Frameworks

What it does

Magnitude is a Python package and file format (.magnitude) for storing and querying vector embeddings. It converts models from word2vec, GloVe, fastText, and ELMo into SQLite databases with indexes and memory mapping, then wraps them in a Pythonic API. The goal is to be a lighter, faster alternative to Gensim for production use.
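As a rough illustration of that API, here is a minimal sketch that assumes the `pymagnitude` package is installed and that you already have a converted `.magnitude` file. The helper name `describe_embeddings` and its default key are hypothetical, chosen only for this example. The import sits inside the function so the snippet still loads on machines without the library.

```python
def describe_embeddings(path, key="cat", topn=5):
    """Open a .magnitude file and run a few basic queries against it.

    `path` must point to an existing .magnitude file (an assumption of this
    sketch); `key` and `topn` are illustrative defaults.
    """
    # Imported lazily so this module still loads where pymagnitude is absent.
    from pymagnitude import Magnitude

    vectors = Magnitude(path)  # opens the file without reading every vector
    return {
        "vocabulary_size": len(vectors),      # number of keys in the model
        "dimensions": vectors.dim,            # length of each vector
        "in_vocabulary": key in vectors,      # membership test on the key
        "vector": vectors.query(key),         # single-key lookup
        "neighbours": vectors.most_similar(key, topn=topn),
    }
```

Calling `describe_embeddings("/path/to/vectors.magnitude")` (a placeholder path) would return the summary dictionary. Note that the `most_similar` call is the expensive one on a cold cache, as discussed under Caveats.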

The interesting bit

The real trick is treating a 4GB embedding file like a memory-mapped database rather than loading it into RAM. Magnitude stores vectors in SQLite with indexes for key lookups, and combines memory mapping, SIMD instructions, spatial indexing, and LRU caching to serve vectors from disk at speeds close to in-memory lookups. It also handles out-of-vocabulary keys by falling back to character n-gram similarity, so misspellings and rare words still get a meaningful vector instead of a zero vector.
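To make the out-of-vocabulary idea concrete, here is a simplified sketch of a character n-gram fallback. It is not Magnitude's actual algorithm: the function names, the n-gram range, and the hashing scheme are all assumptions made for illustration. Each n-gram is hashed to a deterministic pseudo-random vector, and a word's vector is the normalized sum, so words that share many n-grams end up close together.

```python
import hashlib
import math
import random


def char_ngrams(word, n_min=3, n_max=5):
    """Return all character n-grams of the word, padded with boundary markers."""
    padded = f"<{word.lower()}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_vector(ngram, dim):
    """Derive a deterministic pseudo-random vector from an n-gram's hash."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def oov_vector(word, dim=64):
    """Sum the n-gram vectors of a word and scale the result to unit length."""
    total = [0.0] * dim
    for gram in char_ngrams(word):
        for i, value in enumerate(ngram_vector(gram, dim)):
            total[i] += value
    norm = math.sqrt(sum(x * x for x in total)) or 1.0  # avoid dividing by zero
    return [x / norm for x in total]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))
```

With this scheme a misspelling such as "embeddding" scores much closer to "embedding" than an unrelated word does, because the two spellings share most of their n-grams. A production system would also blend in the vectors of similar in-vocabulary words, which this sketch leaves out.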

Key highlights

Lazy-loads models: initial load time is 0.72s for a 4.21GB file, with only 18KB RAM used at startup
Warm single-key queries run in 0.04ms; even streaming over HTTP hits 0.4ms after first access
Converts word2vec, GloVe, fastText, and ELMo models into the .magnitude format with a single utility
Supports concatenating multiple embedding models and adding POS tag features
Published at EMNLP 2018, so the approach has been peer-reviewed
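The lazy-loading and warm-query behaviour in the highlights can be sketched with the standard library alone. This is a toy model under stated assumptions, not Magnitude's implementation: `TinyVectorStore`, its table layout, and the cache size are hypothetical, and it omits memory mapping and similarity search entirely. It only shows why startup is cheap (nothing is read until a key is queried) and why repeated lookups are fast (an LRU cache sits in front of the database).

```python
import sqlite3
from array import array
from functools import lru_cache


class TinyVectorStore:
    """Toy SQLite-backed store: vectors stay in the database until queried."""

    def __init__(self, path=":memory:", cache_size=1024):
        self.conn = sqlite3.connect(path)
        # The PRIMARY KEY gives SQLite an index for fast key lookups.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS vectors "
            "(key TEXT PRIMARY KEY, vec BLOB NOT NULL)"
        )
        # Per-instance LRU cache so repeated (warm) lookups skip the database.
        self.query = lru_cache(maxsize=cache_size)(self._query_uncached)

    def add(self, key, values):
        """Store a vector as a packed float32 blob."""
        blob = array("f", values).tobytes()
        self.conn.execute(
            "INSERT OR REPLACE INTO vectors VALUES (?, ?)", (key, blob)
        )
        self.conn.commit()
        self.query.cache_clear()  # drop stale entries after a write

    def _query_uncached(self, key):
        """Fetch one vector from SQLite; raises KeyError for unknown keys."""
        row = self.conn.execute(
            "SELECT vec FROM vectors WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        vec = array("f")
        vec.frombytes(row[0])
        return tuple(vec)
```

After `store.add("cat", [1.0, 2.0, 3.0])`, the first `store.query("cat")` reads from SQLite and the second is served from the cache, which `store.query.cache_info()` makes visible as one miss followed by one hit.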

Caveats

First most_similar search without a disk cache can take 247 seconds (subsequent queries drop to ~0.24s)
Google Colab installation requires a shell script workaround due to dependency conflicts
The “Medium” and “Heavy” benchmark columns in the README are blank (marked with ━), so performance claims for those variants are unclear

Verdict

Worth a look if you’re running embedding queries in production and want to stop paying for RAM you don’t need. Less compelling if you’re doing heavy similarity search without the patience to warm the cache first, or if you’re already happy with Gensim’s in-memory performance.

Frequently asked

What is plasticityai/magnitude? A Python library that turns multi-gigabyte word vectors into lazy-loaded, memory-mapped SQLite databases with out-of-vocabulary smarts.
Is magnitude open source? Yes, plasticityai/magnitude is open source, released under the MIT license.
What language is magnitude written in? plasticityai/magnitude is primarily written in Python.
How popular is magnitude? plasticityai/magnitude has 1.7k stars on GitHub.
Where can I find magnitude? plasticityai/magnitude is on GitHub at https://github.com/plasticityai/magnitude.