moein-shariatnia/OpenAI-CLIP
A PyTorch implementation of OpenAI CLIP for training a vision-language contrastive model.

Velocity (7d): +0.4 ★/day · Trend: steady
This repository implements the OpenAI CLIP model in PyTorch, reproducing the contrastive pre-training objective that aligns image and text embeddings in a shared latent space. The implementation normalizes embeddings so that similarities are cosine similarities, uses symmetric cross-entropy targets (image-to-text and text-to-image), and supports datasets with multiple captions per image. It can be used to train the model from scratch or to adapt it for downstream zero-shot image classification.
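As a rough illustration of the objective described above, the following minimal sketch computes a symmetric contrastive loss over a batch of paired embeddings and uses the same cosine similarity for zero-shot classification. It is not taken from the repository: the function names `clip_contrastive_loss` and `zero_shot_classify` are hypothetical, the fixed `temperature=0.07` is an assumed default, and the targets are assumed to be the simple "i-th image matches i-th caption" labels, which may differ from how the repository builds its targets.

```python
import torch
import torch.nn.functional as F


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities (illustrative sketch).

    image_emb, text_emb: tensors of shape (batch, dim), where row i of each
    tensor is assumed to come from the same image-caption pair.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = image_emb @ text_emb.t() / temperature

    # Assumption: the matching caption for image i sits at index i.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2


def zero_shot_classify(image_emb, class_text_emb):
    """Return, for each image, the index of the most similar class prompt."""
    image_emb = F.normalize(image_emb, dim=-1)
    class_text_emb = F.normalize(class_text_emb, dim=-1)
    return (image_emb @ class_text_emb.t()).argmax(dim=-1)
```

In this sketch, perfectly aligned pairs drive the loss toward zero, while mismatched pairs are penalized in both directions. For zero-shot use, `class_text_emb` would hold one text embedding per class prompt, produced by whatever text encoder was trained.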