SkalskiP/awesome-foundation-and-multimodal-models
A curated list of papers, code, examples, and tutorials covering foundation and multimodal AI models such as CLIP, BLIP, LLaVA, and Segment Anything.

This repository is an awesome-style curated list of research papers, implementations, tutorials, and other resources on foundation models and multimodal AI systems. It covers models that process multiple modalities, including vision, language, and audio. Each entry links to official papers, GitHub code repositories, videos, and interactive demos. The list is auto-generated and maintained by the community through a contributions file.