← all repositories

segment-any-text/wtpsplit

A Python toolkit implementing transformer-based models to robustly segment text into sentences across 85 languages.

1.3k stars Python ML Frameworks
wtpsplit
Velocity · 7d
+0.6
★ / day
Trend
steady
star history

wtpsplit provides pretrained deep learning models for sentence boundary detection and semantic unit segmentation. It implements the SaT (Segment Any Text) model as the primary offering, achieving state-of-the-art results across 8 corpora in 85 languages. The models can run on GPU, CPU via ONNX, or TPU, and support batched inference for efficient processing of multiple texts.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.