Softlandia-Ltd/vision-is-all-you-need
A serverless Vision RAG demo that embeds PDF pages as vectors using ColPali, stores them in Qdrant, and answers queries with GPT-4o's vision capabilities.

This project demonstrates a Vision RAG (V-RAG) architecture that avoids traditional text extraction and chunking: each PDF page is converted to an image and embedded directly with a vision-language model (ColPali). The embeddings are stored in Qdrant for similarity search. At query time, the user's question is embedded, the best-matching pages are retrieved, and the question together with the matched page images is passed to GPT-4o, which generates the final answer. The demo runs serverless on Modal, with a FastAPI backend and a React frontend.
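
To make the retrieval step more concrete, here is a minimal sketch in plain Python. It assumes ColBERT-style late-interaction ("MaxSim") scoring, which is how ColPali-style models are usually compared: a page is represented by several patch vectors, a query by several token vectors, and each query vector is matched against its best patch. Whether this repository scores pages exactly this way is an assumption. The `Page` class, `max_sim`, `search`, and the tiny two-dimensional vectors are hypothetical stand-ins for illustration only; the in-memory list stands in for Qdrant, and no real model or GPT-4o call is made.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One PDF page, represented by toy patch vectors (hypothetical data)."""
    pdf_name: str
    page_number: int
    patch_vectors: list  # list of equal-length float lists


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_vectors, patch_vectors):
    """Late-interaction score: for each query vector, take the similarity
    of its best-matching patch vector, then sum over all query vectors."""
    return sum(max(dot(q, p) for p in patch_vectors) for q in query_vectors)


def search(query_vectors, pages, top_k=1):
    """Rank pages by MaxSim score and return the top_k best matches."""
    ranked = sorted(
        pages,
        key=lambda page: max_sim(query_vectors, page.patch_vectors),
        reverse=True,
    )
    return ranked[:top_k]


# Toy "index": in the real system these vectors would come from the
# embedding model and live in a vector database.
pages = [
    Page("report.pdf", 1, [[1.0, 0.0], [0.9, 0.1]]),
    Page("report.pdf", 2, [[0.0, 1.0], [0.1, 0.9]]),
]

# Toy query embedding: two "token" vectors pointing mostly along the second axis.
query = [[0.0, 1.0], [0.2, 0.8]]

best = search(query, pages, top_k=1)[0]
print(best.pdf_name, best.page_number)
```

In a full pipeline, the image of the page returned by `search` would then be sent to the vision model along with the user's question.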