Is tomotopy open source?

Yes — bab2min/tomotopy is open source, released under the MIT license.

What language is tomotopy written in?

bab2min/tomotopy is primarily written in C++.

How popular is tomotopy?

bab2min/tomotopy has 597 stars on GitHub.

Where can I find tomotopy?

bab2min/tomotopy is on GitHub at https://github.com/bab2min/tomotopy.


bab2min/tomotopy

A topic-modeling library that actually uses your CPU

Python wrapper around a C++ Gibbs sampler with SIMD vectorization, for when you need LDA, HDP, or any of a dozen-plus other topic models without waiting for gensim to finish.

★ 597 stars · C++ · Language Models

What it does

tomotopy is a Python extension of tomoto, a C++ topic-modeling library built on Collapsed Gibbs Sampling. It wraps 14+ models (LDA, Hierarchical Dirichlet Process, Correlated Topic Model, Dynamic Topic Model, and others) into a pip-installable package. The API is straightforward: add documents, call train(), inspect topics.
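The add-documents, train, inspect flow can be sketched roughly as follows. This is a minimal sketch, assuming tomotopy is installed via pip. The `tokenize`, `train_lda`, and `top_words` helpers are hypothetical names introduced for illustration, and `k=2`, 100 iterations, and the seed are arbitrary choices rather than recommended settings.

```python
def tokenize(text):
    """Hypothetical helper: lowercase the text and split on whitespace."""
    return text.lower().split()


def train_lda(texts, k=2, iterations=100, seed=42):
    """Build and train a small LDA model from raw strings (needs tomotopy)."""
    import tomotopy as tp  # imported here so the other helpers work without it

    mdl = tp.LDAModel(k=k, seed=seed)
    for text in texts:
        mdl.add_doc(tokenize(text))  # each document is a list of tokens
    mdl.train(iterations)
    return mdl


def top_words(mdl, top_n=5):
    """Map each topic id to its top_n most probable words."""
    return {
        # get_topic_words yields (word, probability) pairs; keep the words
        topic_id: [word for word, _ in mdl.get_topic_words(topic_id, top_n=top_n)]
        for topic_id in range(mdl.k)
    }
```

Calling `top_words(train_lda(texts))` on a list of strings returns one word list per topic. A real corpus would need proper tokenization and stop-word handling, which this sketch skips.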

The interesting bit

The speedup comes from SIMD instruction sets (AVX512, AVX2, SSE2), auto-detected at import time. The README shows tomotopy running 200 iterations in less time than gensim’s 10 iterations, with comparable log-likelihood. It’s the rare Python ML library where the C++ underneath isn’t just glue—it’s doing the actual arithmetic in vectorized registers.
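To see which instruction set was picked on a given machine, one can read the module's `isa` attribute after import. The sketch below assumes tomotopy is installed and that `isa` is a short string such as `'avx2'`, with `'none'` meaning no SIMD path was selected; the `isa_note` and `report_isa` helpers are hypothetical names used only for illustration.

```python
def isa_note(isa):
    """Turn an instruction-set name into a short human-readable note."""
    if isa == "none":  # assumed sentinel for "no SIMD path selected"
        return "no SIMD acceleration selected; training may be slow"
    return f"SIMD acceleration selected: {isa}"


def report_isa():
    """Read the instruction set chosen at import time (needs tomotopy)."""
    import tomotopy as tp

    return isa_note(tp.isa)
```

Checking this before a long training run is a cheap way to notice that a build is running without vectorization.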

Key highlights

14+ topic models in one package, from basic LDA to Pachinko Allocation and supervised variants
SIMD acceleration auto-selected at import time; tp.isa reports which instruction set was chosen
Model save/load with type safety (loading an HDP file into an LDA class raises an exception)
Built-in web viewer since v0.13.0 for inspecting trained models in a browser
Corpus utilities with transform hooks for mapping metadata between model types
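The type-safe save/load behaviour from the list above might be exercised like this. It is a sketch under assumptions: the file name, the toy documents, and the `try_load` and `demo` helpers are made up for illustration, and because the exact exception type is not stated here, the code catches `Exception` broadly.

```python
def try_load(model_cls, path):
    """Return model_cls.load(path), or None if loading as that class fails."""
    try:
        return model_cls.load(path)
    except Exception as exc:  # exception type is an unknown, so catch broadly
        print(f"could not load {path} as {model_cls.__name__}: {exc}")
        return None


def demo(path="sample_hdp_model.bin"):
    """Save an HDP model, then try to reload it as HDP and as LDA (needs tomotopy)."""
    import tomotopy as tp

    mdl = tp.HDPModel(seed=42)
    for doc in (["apple", "banana", "cherry"], ["banana", "cherry", "date"]):
        mdl.add_doc(doc)
    mdl.train(10)
    mdl.save(path)

    as_hdp = try_load(tp.HDPModel, path)  # same class as the saved model
    as_lda = try_load(tp.LDAModel, path)  # mismatched class, expected to fail
    return as_hdp is not None, as_lda is not None
```

Wrapping the load in a helper keeps a mismatched file from crashing a batch job that iterates over many saved models.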

Caveats

Non-x86 platforms may require compiling from source, and the build needs a compiler with C++14 support, so older toolchains will not work
The interactive viewer video in the README is hosted on a private GitHub user-images URL with an expired JWT, so it may not load for most readers
CGS generally needs more iterations to converge than Variational Bayes; the speed claim is about time per iteration, not total time to convergence
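The gap between iteration speed and convergence time is easy to see with a little arithmetic. The 200-versus-10 iteration counts come from the benchmark described above; the convergence iteration counts and per-iteration timings below are hypothetical numbers chosen only to illustrate the point, not measurements.

```python
def min_per_iteration_speedup(fast_iters, slow_iters):
    """If fast_iters iterations finish no later than slow_iters iterations of
    another tool, each fast iteration is at least this many times quicker."""
    return fast_iters / slow_iters


def total_time_ms(iterations_to_converge, ms_per_iteration):
    """Total wall time is iterations needed times cost per iteration."""
    return iterations_to_converge * ms_per_iteration


# Benchmark figures: 200 iterations in no more time than 10.
speedup = min_per_iteration_speedup(200, 10)  # 20.0

# Hypothetical scenario: a VB method converges in 50 iterations at 2000 ms
# each, while CGS needs 1500 iterations at 100 ms each (20x quicker each).
vb_total = total_time_ms(50, 2000)    # 100000 ms
cgs_total = total_time_ms(1500, 100)  # 150000 ms
```

In this made-up scenario CGS is 20 times quicker per iteration yet slower overall, because it needs 30 times as many iterations. Whether that happens in practice depends on the corpus and model.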

Verdict

Worth a look if you’re doing topic modeling at scale on x86-64 hardware and want one library that covers most major models. Skip it if you need GPU acceleration, non-x86 deployment, or variational methods specifically.

Frequently asked

What is bab2min/tomotopy?: Python wrapper around a C++ Gibbs sampler with SIMD vectorization, for when you need LDA, HDP, or any of a dozen-plus other topic models without waiting for gensim to finish.