monatis/clip.cpp
A dependency-free C/C++ implementation of OpenAI's CLIP model for vision-language tasks.

This repository provides a standalone inference implementation of CLIP (Contrastive Language-Image Pre-Training) built on GGML. It loads CLIP models from OpenAI and LAION in Transformers format and can run text-only, vision-only, or full two-tower inference. It supports 4-bit, 5-bit, and 8-bit quantization, which reduces the model to 85.6 MB in its quantized form. Python bindings are available and do not require heavyweight ML frameworks such as PyTorch or TensorFlow.
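
To make the two-tower idea concrete: in CLIP, the text tower and the vision tower each produce an embedding vector, and an image is matched to a caption by comparing those vectors, typically with cosine similarity. The sketch below shows only that comparison step. It is not this repository's API: the function names `cosine_similarity` and `best_match` are hypothetical, and the embeddings are assumed to be already computed and passed in as plain `std::vector<float>` values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of clip.cpp): cosine similarity between two
// embedding vectors of equal length. Returns 0 if either vector has zero norm.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot    += static_cast<double>(a[i]) * b[i];
        norm_a += static_cast<double>(a[i]) * a[i];
        norm_b += static_cast<double>(b[i]) * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0f;
    return static_cast<float>(dot / (std::sqrt(norm_a) * std::sqrt(norm_b)));
}

// Hypothetical helper: index of the text embedding most similar to the image
// embedding. Ties keep the earliest index. Requires a non-empty candidate list.
std::size_t best_match(const std::vector<float>& image_emb,
                       const std::vector<std::vector<float>>& text_embs) {
    assert(!text_embs.empty());
    std::size_t best = 0;
    float best_score = cosine_similarity(image_emb, text_embs[0]);
    for (std::size_t i = 1; i < text_embs.size(); ++i) {
        const float score = cosine_similarity(image_emb, text_embs[i]);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

In a text-only or vision-only setup, only one kind of embedding is produced, so a comparison like this would run against embeddings computed and stored earlier.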