illuin-tech/colpali
A library for training and running visual document retrievers that use Vision Language Models to produce ColBERT-style multi-vector embeddings.

This repository implements ColPali and related ColVision models for document retrieval. These models use Vision Language Models such as PaliGemma to build multi-vector embeddings directly from document page images, so no OCR or text-extraction step is needed. The library supports both training and inference for efficient visual document retrieval, and includes model variants such as ColQwen2 and ColSmol. The project is associated with ViDoRe, a benchmark for evaluating visual document retrieval systems.
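In ColBERT-style retrieval, a query and a document page are each represented by a set of vectors, and relevance is computed by late interaction (often called MaxSim): each query vector is matched to its most similar document vector, and those per-query maxima are summed. The following is a minimal sketch of that scoring rule in plain Python with toy 2-dimensional vectors. The functions `maxsim_score` and `rank` are hypothetical illustrations, not part of the colpali library's API; in practice the vectors would come from a model such as ColPali and would typically be normalized before scoring.

```python
from typing import List, Sequence

# A single embedding vector (e.g. one query token or one image patch).
Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector, take the best-matching
    document vector and sum those maxima over all query vectors."""
    if not query or not doc:
        raise ValueError("query and document must each contain at least one vector")
    return sum(max(dot(q, d) for d in doc) for q in query)


def rank(query: Sequence[Vector], docs: Sequence[Sequence[Vector]]) -> List[int]:
    """Return document indices ordered from highest to lowest MaxSim score."""
    scores = [maxsim_score(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy multi-vector embeddings (hypothetical values, not model outputs).
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc_a = [[0.9, 0.1], [0.2, 0.8]]  # each query vector has a strong match
    doc_b = [[0.5, 0.5]]              # a single, weaker match for both
    print(rank(query, [doc_a, doc_b]))  # prints [0, 1]
```

Because each document is scored independently from its stored vectors, page embeddings can be computed offline and only the query needs to be encoded at search time.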