Is OpenAI-CLIP open source?

Yes — moein-shariatnia/OpenAI-CLIP is open source, released under the MIT license.

What language is OpenAI-CLIP written in?

moein-shariatnia/OpenAI-CLIP is primarily Python (PyTorch) code packaged as Jupyter notebooks; GitHub lists its main language as Jupyter Notebook.

How popular is OpenAI-CLIP?

moein-shariatnia/OpenAI-CLIP has 724 stars on GitHub.

Where can I find OpenAI-CLIP?

moein-shariatnia/OpenAI-CLIP is on GitHub at https://github.com/moein-shariatnia/OpenAI-CLIP.

moein-shariatnia/OpenAI-CLIP

CLIP from scratch: a tutorial that learned from its mistakes

A PyTorch walkthrough of OpenAI's image-text model that recently fixed its own training loop after community bug reports.

★ 724 stars · Jupyter Notebook · Image · Video · Audio

What it does

This repo implements OpenAI’s CLIP model in PyTorch for learning joint image-text embeddings. It pairs a ResNet50 image encoder (via timm) with DistilBERT for text, then trains them contrastively so that matching captions and images pull together in embedding space. The result: feed it “a boy jumping with skateboard” and it retrieves relevant images from a dataset.
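The retrieval step described above can be sketched as a cosine-similarity lookup. This is a minimal illustration, not the repo's code: `retrieve_top_k` is a hypothetical helper, and the toy tensors stand in for embeddings that the trained image and text encoders would produce.

```python
import torch
import torch.nn.functional as F


def retrieve_top_k(text_embedding: torch.Tensor,
                   image_embeddings: torch.Tensor,
                   k: int = 3) -> tuple[torch.Tensor, torch.Tensor]:
    """Return (scores, indices) of the k images most similar to one query.

    text_embedding:   shape (dim,), projected embedding of the query caption
    image_embeddings: shape (num_images, dim), projected image embeddings
    """
    # L2-normalize both sides so the dot product below is cosine similarity.
    text_unit = F.normalize(text_embedding, p=2, dim=-1)
    image_units = F.normalize(image_embeddings, p=2, dim=-1)

    similarities = image_units @ text_unit  # shape (num_images,)
    scores, indices = torch.topk(similarities, k=min(k, similarities.numel()))
    return scores, indices


# Toy stand-ins (assumed values) for encoder outputs in a shared 3-d space.
image_embeddings = torch.tensor([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 3.0, 0.0],
    [0.0, 0.0, 1.0],
])
query_embedding = torch.tensor([0.9, 0.1, 0.0])

scores, indices = retrieve_top_k(query_embedding, image_embeddings, k=2)
print(indices.tolist())  # positions of the two closest images
```

Because both sides are normalized, the magnitude of an embedding (row 2 is three times longer than row 0) does not affect the ranking; only direction does.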

The interesting bit

The October 2025 update is the real story. The author found that the original implementation leaked model outputs into its own training targets, so the network could steer its labels toward a trivial collapse. Community bug reports also flagged missing L2 normalization and incorrect handling of images that have several captions. The fix: cosine-normalized embeddings, a symmetric cross-entropy whose targets are built from explicit sample IDs, and a training loop that can no longer game itself. It is a rare public case of a popular tutorial (~700 stars, cited in ICLR and ICML papers) openly rewriting its core loss after users found the cracks.
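A loss of the kind described here might look like the following. It is a hedged sketch under stated assumptions rather than the repo's actual implementation: the function name `clip_loss_with_ids`, the fixed temperature of 0.07, and the toy tensors are all illustrative choices. The point it shows is that the targets are computed from sample IDs alone, so no model output can flow into them.

```python
import torch
import torch.nn.functional as F


def clip_loss_with_ids(image_embeddings: torch.Tensor,
                       text_embeddings: torch.Tensor,
                       sample_ids: torch.Tensor,
                       temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss; rows sharing a sample ID count as positives."""
    # Cosine-normalize so the logits are bounded by 1 / temperature.
    image_units = F.normalize(image_embeddings, dim=-1)
    text_units = F.normalize(text_embeddings, dim=-1)

    # logits[i, j] = similarity of text i to image j.
    logits = text_units @ image_units.T / temperature

    # Targets depend only on the IDs, never on model outputs.
    positives = (sample_ids[:, None] == sample_ids[None, :]).float()
    targets = positives / positives.sum(dim=1, keepdim=True)  # rows sum to 1

    # The ID mask is symmetric, so the same targets serve both directions.
    text_to_image = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1)
    image_to_text = -(targets * F.log_softmax(logits.T, dim=1)).sum(dim=1)
    return 0.5 * (text_to_image.mean() + image_to_text.mean())


# Toy batch (assumed values): the first two rows are two captions of one image.
image_embeddings = torch.tensor([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
text_embeddings = torch.tensor([[0.9, 0.1], [0.8, -0.1], [0.1, 0.9]],
                               requires_grad=True)
sample_ids = torch.tensor([0, 0, 1])

loss = clip_loss_with_ids(image_embeddings, text_embeddings, sample_ids)
loss.backward()  # gradients reach the embeddings, not the targets
```

With plain diagonal targets, the second caption of an image would be treated as a negative for its own image. Spreading the target mass over every row with the same ID avoids that.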

Key highlights

Faithful contrastive loss matching the original CLIP paper, now with proper cosine similarity instead of unbounded dot products
Handles multiple captions per image via a sample-ID mask rather than naive diagonal targets
Built as an educational notebook with a Colab link for running it; the code is deliberately explicit rather than optimized
Has been used as a reference in published research (Domino at ICLR 2022, GSCLIP at ICML 2022, and others)
Uses a standard stack: timm, HuggingFace Transformers, albumentations, PyTorch

Caveats

The README still carries hardcoded Windows paths (C:/Moein/AI/Datasets/Flicker-8k) in the config class
Training epochs and patience are set quite low (4 epochs, patience 1) compared with what CLIP-style training typically needs at scale
Image and text encoders are left fully trainable (trainable=True) by default; the trade-off between fine-tuning them and using them as frozen feature extractors is not explained
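One way to work around the first two caveats is to keep the settings in a small config object whose dataset path is not hardcoded. The sketch below is an assumption-laden illustration, not the repo's config class: `TrainConfig`, the `CLIP_DATASET_DIR` environment variable, and the `data/flickr-8k` fallback are hypothetical names, while the defaults for epochs, patience, and trainable mirror the values reported above.

```python
import os
from dataclasses import dataclass, field
from pathlib import Path


def _default_dataset_dir() -> Path:
    # CLIP_DATASET_DIR and the fallback path are hypothetical, not defined by the repo.
    return Path(os.environ.get("CLIP_DATASET_DIR", "data/flickr-8k"))


@dataclass
class TrainConfig:
    dataset_dir: Path = field(default_factory=_default_dataset_dir)
    epochs: int = 4         # default reported in the caveats above
    patience: int = 1       # default reported in the caveats above
    trainable: bool = True  # False would mean frozen encoders

    def __post_init__(self) -> None:
        # Accept plain strings and normalize them to Path objects.
        self.dataset_dir = Path(self.dataset_dir)
        if self.epochs < 1:
            raise ValueError("epochs must be at least 1")
        if self.patience < 0:
            raise ValueError("patience must be non-negative")


# Longer run with frozen encoders, overriding the defaults explicitly.
config = TrainConfig(dataset_dir="some/dir", epochs=20, patience=3, trainable=False)
print(config)
```

In PyTorch, a flag like `trainable` is usually applied by setting `requires_grad` on each encoder parameter, so a frozen encoder acts as a fixed feature extractor and only the projection layers are updated.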

Verdict

Worth a look if you want to understand CLIP’s mechanics by reading clean PyTorch rather than OpenAI’s production code. Skip it if you need a battle-hardened training pipeline or pretrained weights ready to deploy.
