nomic-ai/contrastors
PyTorch-based toolkit for training contrastive embedding models for retrieval, multimodal, and RAG applications.

Contrastors is a contrastive learning library that lets researchers and engineers train embedding models efficiently. It supports multi-GPU training and large batch sizes via GradCache, and it uses Flash Attention for speed. The toolkit covers CLIP- and LiT-style contrastive learning, Matryoshka Representation Learning for flexible embedding dimensions, and multimodal training that pairs ViT image encoders with text encoders. It also includes pretrained embedding models such as Nomic Embed, which is used in production RAG systems.
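
To make the CLIP-style objective and the Matryoshka idea mentioned above concrete, here is a minimal, framework-free sketch in plain Python. It is not the contrastors API: the function names (`clip_style_loss`, `matryoshka_loss`), the default temperature of 0.07, and the choice to average the loss uniformly over prefix lengths are all assumptions made for illustration. The library itself runs on PyTorch tensors and GPUs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (all-zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(v - m) for v in row))
        total += lse - row[i]
    return total / len(logits)


def clip_style_loss(queries, docs, temperature=0.07, dim=None):
    """Symmetric InfoNCE loss over a batch of (query, doc) pairs.

    Pair i is the positive; every other doc in the batch is a negative.
    If `dim` is given, only the first `dim` coordinates are used
    (Matryoshka-style prefix truncation).
    """
    if dim is not None:
        queries = [q[:dim] for q in queries]
        docs = [d[:dim] for d in docs]
    q = [l2_normalize(v) for v in queries]
    d = [l2_normalize(v) for v in docs]
    # Cosine-similarity matrix scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(qi, dj)) / temperature for dj in d]
              for qi in q]
    logits_t = [list(col) for col in zip(*logits)]  # doc -> query direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


def matryoshka_loss(queries, docs, dims=(2, 4), temperature=0.07):
    """Average the contrastive loss over several nested prefix lengths.

    Uniform averaging is an assumption of this sketch; weights may differ.
    """
    losses = [clip_style_loss(queries, docs, temperature, dim) for dim in dims]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    batch = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
    print("aligned pairs:   ", clip_style_loss(batch, batch))
    print("mismatched pairs:", clip_style_loss(batch, batch[::-1]))
    print("matryoshka:      ", matryoshka_loss(batch, batch))
```

Correctly matched pairs should produce a loss close to zero, while swapping the documents should produce a much larger one. Since every other item in the batch acts as a negative, larger batches provide more negatives per step, which is why techniques for fitting large batches in memory, such as GradCache, matter for this kind of training.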