segment-any-text/wtpsplit
A Python toolkit implementing transformer-based models to robustly segment text into sentences across 85 languages.

Velocity (7d): +0.6 ★/day · Trend: steady
wtpsplit provides pretrained deep learning models for sentence boundary detection and semantic unit segmentation. It implements the SaT (Segment Any Text) model as the primary offering, achieving state-of-the-art results across 8 corpora in 85 languages. The models can run on GPU, CPU via ONNX, or TPU, and support batched inference for efficient processing of multiple texts.
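The batched-inference workflow described above might look roughly like the sketch below. It assumes the `SaT` class exported by the `wtpsplit` package, a model name such as `sat-3l-sm`, and the `ort_providers` keyword for ONNX Runtime; check these against the version you have installed. The helper names `load_sat`, `segment_many`, and `demo` are hypothetical and exist only for this illustration.

```python
from typing import Iterable, List


def load_sat(model_name: str = "sat-3l-sm", use_onnx: bool = False):
    """Load a SaT model (assumed wtpsplit API; model name is an assumption)."""
    # Imported lazily so segment_many() stays usable without wtpsplit installed.
    from wtpsplit import SaT

    if use_onnx:
        # Assumption: ort_providers selects the ONNX Runtime backend (CPU here).
        return SaT(model_name, ort_providers=["CPUExecutionProvider"])
    return SaT(model_name)


def segment_many(splitter, texts: Iterable[str]) -> List[List[str]]:
    """Segment several texts in one call and return one sentence list per text.

    `splitter` is any object whose split() accepts a list of strings and
    yields one sequence of sentences per input text.
    """
    # Handing over the whole list lets the model batch internally; the result
    # may be lazy, so it is materialized into plain lists here.
    return [list(sentences) for sentences in splitter.split(list(texts))]


def demo() -> None:
    """Hypothetical usage; downloads model weights on first run."""
    sat = load_sat()
    docs = [
        "This is a test This is another test.",
        "Hello world how are you",
    ]
    for sentences in segment_many(sat, docs):
        print(sentences)
```

Calling `demo()` requires `wtpsplit` to be installed and network access for the first model download. Keeping the model behind a small `split()` interface makes it easy to swap the GPU and ONNX variants without touching the calling code.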