A pygmy hippo that chunks text at 100 GB/s
Chonkie wraps every text-splitting strategy you keep rewriting into one install-what-you-need Python library.

What it does
Chonkie is a Python chunking toolkit for RAG pipelines. It bundles nine chunkers, from naive token splitting to LLM-based “Slumber” chunking, plus refineries, vector-DB handshakes, and a self-hosted REST API. The default install is 505 KB; extras are opt-in so you don’t drag in half of PyTorch just to split a README.
The interesting bit
The Pipeline class lets you chain chunkers and refineries declaratively through a fluent API: recursive chunk at 2K tokens, semantic chunk at 512, add overlap, embed, and ship to Qdrant. Pipelines are also storable and reusable via the REST API’s SQLite-backed registry, so running chonkie serve turns a Python library into a chunking microservice.
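To make the chaining idea concrete, here is a minimal sketch of a fluent pipeline in plain Python. It does not use Chonkie: `ToyPipeline`, `then`, `split_by_words`, and `add_overlap` are hypothetical names invented for illustration, and whitespace-separated words stand in for tokens. It only shows the pattern of a coarse split, a finer split, and an overlap step composed in one expression.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A step takes a list of text chunks and returns a new list of chunks.
Step = Callable[[List[str]], List[str]]


def split_by_words(max_words: int) -> Step:
    """Return a step that re-splits every chunk into pieces of at most max_words words."""
    def step(chunks: List[str]) -> List[str]:
        out: List[str] = []
        for chunk in chunks:
            words = chunk.split()
            for i in range(0, len(words), max_words):
                out.append(" ".join(words[i:i + max_words]))
        return out
    return step


def add_overlap(n_words: int) -> Step:
    """Return a step that prefixes each chunk with the last n_words words of the previous chunk."""
    def step(chunks: List[str]) -> List[str]:
        out: List[str] = []
        for i, chunk in enumerate(chunks):
            if i == 0 or n_words <= 0:
                out.append(chunk)
            else:
                tail = chunks[i - 1].split()[-n_words:]
                out.append(" ".join(tail + [chunk]))
        return out
    return step


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a fluent chunking pipeline (not Chonkie's class)."""
    steps: List[Step] = field(default_factory=list)

    def then(self, step: Step) -> "ToyPipeline":
        self.steps.append(step)
        return self  # returning self is what makes the calls chainable

    def run(self, text: str) -> List[str]:
        chunks = [text]
        for step in self.steps:
            chunks = step(chunks)
        return chunks


pipeline = (
    ToyPipeline()
    .then(split_by_words(8))   # coarse pass
    .then(split_by_words(4))   # finer pass
    .then(add_overlap(1))      # carry one word of context across boundaries
)
chunks = pipeline.run("one two three four five six seven eight nine ten")
print(chunks)
```

Because the pipeline is just an ordered list of steps, it is easy to see why such a definition could be serialized and stored in a registry; how Chonkie actually persists pipelines is not covered here.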
Key highlights
- FastChunker claims SIMD-accelerated, byte-based chunking at “100+ GB/s” on CPU
- 32+ integrations including 8 vector DB handshakes (Chroma, Pinecone, pgvector, etc.) and multiple tokenizer backends
- Optional installs per component: chonkie[semantic] for embeddings, chonkie[tiktoken] for OpenAI token counting, etc.
- Self-hosted API with Docker Compose support and interactive /docs
- 56-language support out of the box
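For a feel of what “byte-based chunking” means, here is a minimal pure-Python sketch. It is not FastChunker’s implementation and makes no speed claims: `chunk_bytes` and its parameters are hypothetical, and the approach (cap each chunk at a byte budget, cut after the last delimiter inside the window when one exists) is an assumption about the general technique.

```python
from typing import List


def chunk_bytes(data: bytes, target_size: int, delimiters: bytes = b"\n.?!") -> List[bytes]:
    """Split data into chunks of at most target_size bytes, preferring to cut just after a delimiter."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    chunks: List[bytes] = []
    start = 0
    while start < len(data):
        end = min(start + target_size, len(data))
        if end < len(data):
            # Look backwards inside the window for the last delimiter byte.
            cut = max(data.rfind(bytes([d]), start, end) for d in delimiters)
            if cut >= start:
                end = cut + 1  # keep the delimiter with the chunk it terminates
            # Otherwise fall back to a hard cut at target_size, which may
            # split a multi-byte UTF-8 character.
        chunks.append(data[start:end])
        start = end
    return chunks


data = b"Hello world. How are you? Fine."
chunks = chunk_bytes(data, target_size=16)
print(chunks)
```

Working on raw bytes avoids tokenizing or decoding, which is where the speed potential comes from. Cutting at ASCII delimiters is safe for UTF-8 input because ASCII bytes never appear inside a multi-byte sequence.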
Caveats
- The “100+ GB/s” claim for FastChunker lacks reproducible benchmark details in the README; treat as a marketing figure until verified
- The README is truncated mid-sentence in the transformers tokenizer section, so full tokenizer coverage is unclear
- chonkie[all] is explicitly “not recommended for production environments”
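If you want to check a throughput figure yourself, a small timing harness is enough to get a first number. The sketch below is an assumption-laden starting point, not a rigorous benchmark: `measure_throughput_gbps` and `newline_chunker` are hypothetical helpers, the data is synthetic, and the placeholder chunker should be swapped for whichever chunker you actually want to measure.

```python
import time
from typing import Callable, List


def measure_throughput_gbps(
    chunker: Callable[[bytes], List[bytes]], data: bytes, repeats: int = 5
) -> float:
    """Return the best observed throughput in GB/s (1 GB = 1e9 bytes) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        chunker(data)
        elapsed = time.perf_counter() - t0
        best = min(best, elapsed)
    return len(data) / best / 1e9


def newline_chunker(data: bytes) -> List[bytes]:
    # Placeholder workload: replace with the chunker under test.
    return data.split(b"\n")


sample = b"lorem ipsum dolor sit amet\n" * 200_000  # about 5.4 MB of synthetic text
gbps = measure_throughput_gbps(newline_chunker, sample)
print(f"{gbps:.2f} GB/s")
```

Taking the best of several runs reduces noise from caches and scheduling. Results still depend heavily on input size, hardware, and whether the data fits in cache, so report those alongside any number you quote.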
Verdict
Worth a look if you’re maintaining yet another bespoke chunking script and want one library with swap-in strategies. Skip it if you already have a deeply customized NLP pipeline that you trust: Chonkie is glue, not magic, and the mascot knows it.