← all repositories

isaacus-dev/semchunk

A Python library that splits text into semantically coherent chunks optimized for retrieval-augmented generation pipelines.

642 stars Python RAG · SearchData Tooling
semchunk
Velocity · 7d
+0.7
★ / day
Trend
steady
star history

The library provides hierarchical text chunking that preserves semantic context within chunks for better downstream LLM retrieval. It supports AI-powered semantic analysis, configurable chunk overlap, and works with any tokenizer including Tiktoken and Hugging Face Transformers. Benchmarks claim 15% better RAG performance than competing chunking strategies, and it is integrated into production tooling including Microsoft Intelligence Toolkit.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.