Is gensim open source?

Yes — piskvorky/gensim is open source, released under the LGPL-2.1 license.

What language is gensim written in?

piskvorky/gensim is primarily written in Python.

How popular is gensim?

piskvorky/gensim has 16.5k stars on GitHub.

Where can I find gensim?

piskvorky/gensim is on GitHub at https://github.com/piskvorky/gensim.

piskvorky/gensim

The NLP workhorse that streams corpora bigger than RAM

Gensim lets you model topics, index documents, and train word2vec on text collections larger than your RAM, all through a Python API.

★16.5k stars Python ML Frameworks Language Models RAG · Search

What it does

Gensim is a Python library for topic modeling, document indexing, and similarity retrieval over large text corpora. It implements classic unsupervised NLP algorithms (Latent Semantic Analysis, Latent Dirichlet Allocation, Random Projections, Hierarchical Dirichlet Process, and word2vec) and lets you feed data through a streaming API instead of loading everything into memory. It is aimed at NLP and information-retrieval practitioners, though anyone with more text than RAM can use it.

The interesting bit

The library presents a Python API but hands the actual number-crunching to low-level BLAS libraries through NumPy, so you get optimized Fortran/C performance, with multithreading when your BLAS build supports it, without writing any C yourself. Memory efficiency isn’t an afterthought; it is a central design goal, achieved through heavy use of Python generators and iterators for out-of-core processing.
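The generator-and-iterator pattern can be sketched without gensim at all. The example below is a minimal illustration, not gensim's own code: `LineCorpus`, `count_tokens`, and `demo` are hypothetical names. It shows why a streamed corpus is usually written as an object with `__iter__` and not as a bare generator: algorithms that make several passes over the data need a stream that can be restarted.

```python
import os
import tempfile


class LineCorpus:
    """Re-iterable stream: one tokenized document per line of a text file."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # The file is reopened on every pass, so only one line is in memory.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                yield line.split()


def count_tokens(corpus):
    """Consume the stream once and count tokens without storing documents."""
    return sum(len(doc) for doc in corpus)


def demo():
    # Write a tiny two-document corpus to a temporary file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("alpha beta\ngamma delta epsilon\n")
        path = tmp.name
    try:
        corpus = LineCorpus(path)
        first_pass = count_tokens(corpus)   # iterable object: pass 1
        second_pass = count_tokens(corpus)  # iterable object: pass 2

        one_shot = iter(corpus)             # a plain generator
        gen_first = count_tokens(one_shot)  # works once
        gen_second = count_tokens(one_shot) # already exhausted
        return first_pass, second_pass, gen_first, gen_second
    finally:
        os.remove(path)


print(demo())  # (5, 5, 5, 0)
```

The class-based corpus returns the same count on both passes, while the bare generator is empty the second time around.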

Key highlights

Handles out-of-core corpora that exceed available RAM through streaming generators
Multicore implementations of LSA, LDA, HDP, and word2vec
Distributed computing support for LSA and LDA across computer clusters
Extensive documentation and Jupyter Notebook tutorials
Stable maintenance mode: battle-tested but no longer accepting new features

Caveats

The project is in stable maintenance mode, so expect bug fixes and documentation patches but not new algorithms or major features
Performance depends heavily on your NumPy/BLAS setup; an unoptimized BLAS build can cost you as much as an order of magnitude in speed
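One way to get a rough sense of the BLAS situation is to ask NumPy which libraries it was built against and time a dense matrix product. This is a hedged sketch: `time_matmul` is a hypothetical helper, the matrix size is arbitrary, and the timing is only a coarse indicator, not a benchmark of gensim itself.

```python
import time

import numpy as np


def time_matmul(n=300, repeats=3):
    """Return the best wall-clock time (seconds) for an n x n matrix product."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # dispatched to the BLAS library NumPy is linked against
        best = min(best, time.perf_counter() - start)
    return best


# Prints the build configuration, including the BLAS/LAPACK libraries in use.
np.show_config()
print(f"best 300x300 matmul time: {time_matmul():.6f} s")
```

Comparing the timing across environments (for example, before and after installing an optimized BLAS such as OpenBLAS or MKL) gives a quick indication of whether the linear algebra backend is a bottleneck.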

Verdict

Worth a look if you need proven, scalable topic modeling or word embeddings in Python and prefer streaming over sharding. Skip it if you want active feature development; the maintainers have explicitly closed the door on new features.
