
mirth/chonky

A Python library that segments text into semantic chunks using a fine-tuned transformer model for RAG pipelines.

410 stars · Python · RAG · Search · Data Tooling
Velocity (7d): +1.0 ★/day · Trend: steady

Chonky segments text into semantic chunks with a neural model instead of heuristics. A fine-tuned transformer predicts natural paragraph boundaries, replacing the rule-based text splitters commonly used in RAG pipelines. The library exposes a simple ParagraphSplitter API that accepts text and yields coherent chunks suited to retrieval-augmented generation systems.
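As a rough illustration of where such a splitter sits in a RAG ingestion step, the sketch below treats the splitter as any callable that takes a string and yields chunks, which is how the description above characterizes ParagraphSplitter. The names `blank_line_splitter` and `build_chunk_records` are hypothetical and not part of chonky. The stand-in splits on blank lines only so that the example runs without downloading a model. The commented-out lines show how the real splitter might be swapped in, and its exact constructor arguments are an assumption.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def blank_line_splitter(text: str) -> Iterator[str]:
    """Hypothetical stand-in: yield blank-line-separated paragraphs.

    This is a heuristic, not chonky's neural model. It only mimics the
    "accepts text, yields chunks" shape so the example runs offline.
    """
    for block in text.split("\n\n"):
        block = block.strip()
        if block:
            yield block


def build_chunk_records(
    text: str, splitter: Callable[[str], Iterable[str]]
) -> List[Dict[str, object]]:
    """Run a splitter over text and attach an id and length to each chunk."""
    return [
        {"id": i, "text": chunk, "n_chars": len(chunk)}
        for i, chunk in enumerate(splitter(text))
    ]


doc = "First paragraph about cats.\n\nSecond paragraph about dogs.\n\n\nThird one."

# Assumed swap-in for the real library (not executed here):
#   from chonky import ParagraphSplitter
#   splitter = ParagraphSplitter()  # constructor arguments are an assumption
#   records = build_chunk_records(doc, splitter)
records = build_chunk_records(doc, blank_line_splitter)

for record in records:
    print(record["id"], record["text"])
```

Because the splitter is passed in as a plain callable, the records that an embedding or indexing step would consume keep the same shape whether the chunks come from a heuristic or a model.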
