adithya-s-k/VARAG
A vision-first RAG engine that uses vision-language models for multimodal document retrieval and generation.

Velocity (7 days): +0.6 ★/day · Trend: steady
VARAG implements retrieval-augmented generation with a visual focus. It uses vision-language models to process and index both the visual and the textual content of documents. Supported techniques include ColPali-based retrieval, OCR through Docling, and vector-based semantic search, so that generated responses are grounded in multimodal sources.
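ColPali-style retrieval and single-vector semantic search score a page differently. ColPali keeps one embedding per image patch and per query token and uses late-interaction ("MaxSim") scoring. Dense semantic search compares one query vector against one vector per chunk. The following is a minimal, stdlib-only sketch of both scoring rules. It assumes the embeddings have already been produced by some model. The function names are hypothetical and are not VARAG's API, and toy 2-D vectors stand in for real embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], page_patches: List[Vector]) -> float:
    """Late-interaction score (ColBERT/ColPali style): for every query-token
    embedding, take its best dot product against any patch embedding of the
    page, then sum those maxima over all query tokens."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def top_k_maxsim(query_tokens: List[Vector],
                 pages: Dict[str, List[Vector]], k: int) -> List[Tuple[str, float]]:
    """Rank pages (each a list of patch embeddings) by MaxSim score."""
    scored = [(pid, maxsim_score(query_tokens, patches)) for pid, patches in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def top_k_dense(query_vec: Vector,
                index: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Rank chunks (one embedding each) by cosine similarity to the query."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy data: two "pages" with two patch embeddings each.
    pages = {
        "page_1": [[1.0, 0.0], [0.0, 1.0]],
        "page_2": [[0.5, 0.5], [0.2, 0.1]],
    }
    query_tokens = [[1.0, 0.0], [0.0, 1.0]]
    print(top_k_maxsim(query_tokens, pages, k=2))

    # Hypothetical toy dense index: one embedding per text chunk.
    index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k_dense([1.0, 0.1], index, k=2))
```

The design trade-off is storage versus granularity. MaxSim stores many vectors per page but lets each query token match the most relevant region of a page image. Dense search stores a single vector per chunk and is cheaper to index and query.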