← all repositories

Softlandia-Ltd/vision-is-all-you-need

A serverless Vision RAG demo that embeds PDF pages as vectors using ColPali, stores them in Qdrant, and answers queries with GPT4o Vision.

vision-is-all-you-need
Velocity · 7d
+0.6
★ / day
Trend
steady
star history

This project demonstrates a Vision RAG (V-RAG) architecture that eliminates traditional text chunking by converting PDF pages to images and embedding them directly using a vision-language model (ColPali). The embeddings are stored in Qdrant for semantic search. On retrieval, the user query and matched page images are passed to GPT4o Vision to generate a final answer. It runs serverless on Modal with a FastAPI backend and React frontend.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.