lumina-ai-inc/chunkr
A Rust-based document processing service that uses vision-language models for layout analysis, OCR, and semantic chunking to prepare documents for RAG pipelines.

Chunkr is an open-source document intelligence API that preprocesses complex documents for retrieval-augmented generation (RAG) systems. It runs layout analysis to identify document structure, applies OCR with bounding-box extraction to scanned content, and produces semantically coherent chunks suitable for vector storage and retrieval. It also uses vision-language models to handle complex visual elements within documents.
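
To make the chunking step concrete, here is a minimal Python sketch of how layout segments might be grouped into chunks. It is illustrative only and is not Chunkr's API or implementation: the `Segment` and `Chunk` types, the `chunk_segments` function, the `"Title"` label, and the word-count budget are all assumptions introduced for this example. The sketch assumes segments arrive in reading order, each carrying a layout label, its text, and a bounding box.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One layout element (hypothetical shape, not Chunkr's schema)."""
    kind: str                                 # layout label, e.g. "Title" or "Text"
    text: str                                 # extracted or OCR'd text
    bbox: tuple[float, float, float, float]   # (left, top, width, height)
    page: int


@dataclass
class Chunk:
    """A group of consecutive segments that will be embedded together."""
    segments: list[Segment] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(s.text for s in self.segments)

    @property
    def word_count(self) -> int:
        return sum(len(s.text.split()) for s in self.segments)


def chunk_segments(segments: list[Segment], max_words: int = 50) -> list[Chunk]:
    """Group segments in reading order.

    A new chunk starts at every "Title" segment, or when adding the next
    segment would push the current chunk past max_words.
    """
    chunks: list[Chunk] = []
    current = Chunk()
    for seg in segments:
        words = len(seg.text.split())
        starts_section = seg.kind == "Title"
        over_budget = current.word_count + words > max_words
        # Only close the current chunk if it already holds something.
        if current.segments and (starts_section or over_budget):
            chunks.append(current)
            current = Chunk()
        current.segments.append(seg)
    if current.segments:
        chunks.append(current)
    return chunks


# Made-up sample input: two short sections on one page.
segments = [
    Segment("Title", "Introduction", (72, 60, 300, 24), 1),
    Segment("Text", "Chunking keeps related text together.", (72, 100, 450, 40), 1),
    Segment("Title", "Methods", (72, 200, 300, 24), 1),
    Segment("Text", "Layout analysis runs before OCR.", (72, 240, 450, 40), 1),
]
chunks = chunk_segments(segments)
for chunk in chunks:
    print(repr(chunk.text))
```

Splitting on headings keeps each chunk aligned with a section of the document, while the word budget caps chunk size for embedding. Because every segment keeps its bounding box and page number, a retrieved chunk can still be traced back to its location in the source document.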