mlfoundations/open_clip
Open-source PyTorch implementation of CLIP, a contrastive vision-language foundation model enabling zero-shot image classification and image-text matching.

OpenCLIP is an open-source implementation of CLIP (Contrastive Language-Image Pre-training), a multimodal model that learns to align images and their natural-language descriptions in a shared embedding space using a contrastive objective. The library includes training infrastructure with FSDP2 support, NaFlex image pipelines, CLAP audio model integration, and torch.compile strategies. It also ships pretrained image and text encoders for inference, which can be used for zero-shot classification and image-text matching.
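
To make the zero-shot classification step concrete, here is a minimal sketch of the scoring logic in plain Python. It does not call the OpenCLIP API: the embeddings are small hand-written toy vectors standing in for encoder outputs, the helper names (`l2_normalize`, `softmax`, `zero_shot_classify`) are hypothetical, and the `logit_scale` of 100.0 is an assumed temperature, not a value read from any checkpoint. The idea it illustrates is that each candidate label is written as a text prompt, the image and prompts are embedded and normalized, and the label whose embedding has the highest cosine similarity to the image wins.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, logit_scale=100.0):
    """Return one probability per text prompt for a single image.

    logit_scale is an assumed temperature used only for illustration.
    """
    img = l2_normalize(image_embedding)
    logits = []
    for text_embedding in text_embeddings:
        txt = l2_normalize(text_embedding)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    return softmax(logits)


# Toy data: in practice these vectors would come from the image and text encoders.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
]
image_embedding = [0.8, 0.3, 0.1]

probs = zero_shot_classify(image_embedding, text_embeddings)
best = labels[max(range(len(labels)), key=probs.__getitem__)]
print(best)
```

No classifier is trained for the label set here. Changing the candidate classes only means changing the list of text prompts, which is what makes the approach zero-shot.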