monatis/clip.cpp

A dependency-free C/C++ implementation of OpenAI's CLIP model for vision-language tasks.

★ 559 stars | C++ | Computer Vision | Inference · Serving

Velocity (7d): +0.5 ★/day · Trend: steady

This repository provides a standalone inference implementation of CLIP (Contrastive Language-Image Pre-Training) built on GGML. It loads CLIP models from OpenAI and LAION published in Hugging Face Transformers format, and can run text-only, vision-only, or full two-tower inference. It supports 4-bit, 5-bit, and 8-bit quantization, which brings the model size down to 85.6 MB for the quantized variant. Python bindings are available and do not require heavyweight ML frameworks such as PyTorch or TensorFlow.
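To make the two-tower idea concrete, here is a minimal sketch of how CLIP-style embeddings are typically compared once each tower has produced a vector. It does not call clip.cpp or its Python bindings. The function names `cosine_similarity` and `rank_labels`, the three-dimensional embeddings, and the logit scale of 100 are all assumptions chosen for illustration. Real CLIP embeddings have hundreds of dimensions and come from the text and vision encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_labels(image_embedding, text_embeddings, scale=100.0):
    """Turn scaled cosine similarities into a probability per label.

    `scale` stands in for CLIP's learned logit scale; 100.0 is an
    assumed value for this sketch, not one read from a model file.
    """
    logits = {
        label: scale * cosine_similarity(image_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits.values())
    exps = {label: math.exp(v - max_logit) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical embeddings: in practice the vision tower produces the first
# vector and the text tower produces one vector per candidate caption.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.3],
}

probs = rank_labels(image_embedding, text_embeddings)
best = max(probs, key=probs.get)
print(best)
```

Because the comparison step only needs vectors, text-only or vision-only inference is enough whenever the other side's embeddings have been computed ahead of time, for example when searching a pre-indexed image collection with a new text query.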