piskvorky/gensim
Python library for topic modeling, document indexing, and similarity retrieval with large corpora using word embeddings and vector space methods.

Gensim provides memory-efficient implementations of popular NLP algorithms, including Latent Semantic Analysis, Latent Dirichlet Allocation, and Word2Vec for training word embeddings. It processes large corpora as a stream, so the full dataset does not need to fit in memory, and it exposes simple interfaces for plugging in custom data sources and adding new algorithms. The library targets the natural language processing and information retrieval communities.
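As a rough illustration of the streaming idea, the sketch below builds a corpus object that yields one bag-of-words vector at a time instead of holding every document in memory. The names `tokenize`, `build_vocabulary`, `StreamedCorpus`, and `train_lda` are hypothetical helpers written for this example, not part of Gensim. The streaming part uses only the standard library. The optional `train_lda` step assumes Gensim is installed and passes the stream to `gensim.models.LdaModel`.

```python
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def build_vocabulary(lines):
    """Assign an integer id to each distinct token, in order of first appearance."""
    token2id = {}
    for line in lines:
        for token in tokenize(line):
            token2id.setdefault(token, len(token2id))
    return token2id


class StreamedCorpus:
    """Re-iterable corpus that yields one sparse vector per document.

    `lines` can be any re-iterable source of strings. A list is used here;
    a file-backed object that reopens the file on each pass would behave
    the same way without loading the whole file.
    """

    def __init__(self, lines, token2id):
        self.lines = lines
        self.token2id = token2id

    def __iter__(self):
        for line in self.lines:
            counts = Counter(
                self.token2id[token]
                for token in tokenize(line)
                if token in self.token2id
            )
            # Each document becomes a sorted list of (token_id, count) pairs.
            yield sorted(counts.items())


def train_lda(corpus, token2id, num_topics=2):
    """Optional step: train an LDA model if Gensim is available, else return None."""
    try:
        from gensim.models import LdaModel
    except ImportError:
        return None
    id2word = {token_id: token for token, token_id in token2id.items()}
    return LdaModel(corpus, id2word=id2word, num_topics=num_topics)


documents = [
    "human machine interface",
    "graph of trees",
    "graph minors and trees",
]
vocabulary = build_vocabulary(documents)
corpus = StreamedCorpus(documents, vocabulary)
first_vector = next(iter(corpus))
```

Because `StreamedCorpus` defines `__iter__` rather than being a one-shot generator, it can be traversed repeatedly, which matters for algorithms that make several passes over the data. With Gensim installed, calling `train_lda(corpus, vocabulary)` would consume the stream document by document.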