mirth/chonky
A Python library that segments text into semantic chunks using a fine-tuned transformer model for RAG pipelines.

Velocity (7d): +1.0 ★/day · Trend: steady
Chonky segments text into semantic chunks using a fine-tuned transformer model that predicts natural paragraph boundaries, replacing heuristic text splitters in RAG pipelines. Its ParagraphSplitter API accepts text and yields coherent chunks intended for retrieval-augmented generation systems.
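A minimal sketch of how such a splitter might slot into a RAG ingestion step. The description above only states that ParagraphSplitter accepts text and yields chunks; the import path `from chonky import ParagraphSplitter`, the `device` argument, and calling the splitter object directly on a string are assumptions here and should be checked against the project's README. The helpers `blank_line_splitter`, `load_chonky_splitter`, and `chunk_for_rag` are hypothetical names introduced for illustration. The blank-line splitter is the kind of heuristic the library aims to replace, included only so the sketch runs without the model.

```python
import re
from typing import Callable, Iterable, List


def blank_line_splitter(text: str) -> Iterable[str]:
    """Heuristic stand-in (not part of chonky): split on blank lines."""
    for part in re.split(r"\n\s*\n", text):
        part = part.strip()
        if part:
            yield part


def load_chonky_splitter(device: str = "cpu"):
    """Build the neural splitter.

    Assumption: chonky exposes ParagraphSplitter with a `device` argument
    and the instance is callable on a string. Imported lazily so this
    sketch still runs when chonky is not installed.
    """
    from chonky import ParagraphSplitter  # assumed import path

    return ParagraphSplitter(device=device)


def chunk_for_rag(
    text: str,
    splitter: Callable[[str], Iterable[str]],
    min_chars: int = 1,
) -> List[str]:
    """Run any text-to-chunks callable and drop chunks shorter than min_chars."""
    chunks = []
    for chunk in splitter(text):
        chunk = chunk.strip()
        if len(chunk) >= min_chars:
            chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    sample = "First paragraph about cats.\n\nSecond paragraph about dogs."
    # Swap in load_chonky_splitter() here to use the transformer model.
    for piece in chunk_for_rag(sample, blank_line_splitter):
        print(piece)
        print("--")
```

Because `chunk_for_rag` only depends on a callable that maps text to an iterable of strings, the heuristic and neural splitters are interchangeable, which makes it easy to compare retrieval quality between them.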