
MaartenGr/BERTopic

BERTopic is a Python topic modeling library that uses transformer-based embeddings and c-TF-IDF to discover interpretable topics in text corpora.

7.7k stars · Python · Language Models · Data Tooling
Velocity · 7d: +3.7 ★ / day
Trend: steady

BERTopic represents documents semantically with BERT-based sentence embeddings, groups the embedded documents into clusters, and then applies a class-based TF-IDF (c-TF-IDF) technique to extract coherent topic terms from each cluster. It supports guided, supervised, semi-supervised, manual, hierarchical, and probabilistic topic modeling. The library integrates with Hugging Face transformers and several embedding backends, including sentence-transformers, for flexible document representation.
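To make the class-based TF-IDF step concrete, here is a minimal standard-library sketch. It is not BERTopic's own code. It assumes the clusters are already known (a hand-made `clusters` dict stands in for the embedding and clustering stages), uses whitespace tokenization, and assumes the weighting `tf * log(1 + A / f)`, where `tf` is a term's share of the words in one cluster, `A` is the average number of words per cluster, and `f` is the term's count across all clusters. The names `c_tf_idf` and `top_terms` are hypothetical helpers, not library functions.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class):
    """Score terms per class; docs_per_class maps a label to a list of strings."""
    # Treat all documents in a cluster as one long "class document".
    class_counts = {
        label: Counter(" ".join(docs).lower().split())
        for label, docs in docs_per_class.items()
    }

    # f: how often each term occurs across all classes.
    total = Counter()
    for counts in class_counts.values():
        total.update(counts)

    # A: average number of words per class.
    avg_words = sum(total.values()) / len(class_counts)

    scores = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        scores[label] = {
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in counts.items()
        }
    return scores


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per class (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for label, s in scores.items()
    }


# Hypothetical, hand-made clusters standing in for the clustering output.
clusters = {
    "pets": ["cat dog cat", "dog cat pet food"],
    "finance": ["stock market stock", "market stock price food"],
}

scores = c_tf_idf(clusters)
print(top_terms(scores, k=2))
# → {'pets': ['cat', 'dog'], 'finance': ['stock', 'market']}
```

The word "food" appears in both clusters, so it scores lower than a cluster-specific word of the same frequency such as "pet" or "price". In the library itself, the full pipeline is typically run with `BERTopic().fit_transform(docs)`.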
