
gokayfem/awesome-vlm-architectures

A curated collection documenting the architectures of famous Vision-Language Models including LLaVA, PaliGemma, and Janus-Pro.

1.3k stars · Markdown · Language Models · Learning
Velocity (7d): +1.5 ★/day · Trend: steady

This repository compiles detailed information on prominent Vision-Language Models, documenting their multimodal architectures, training procedures, and datasets used for pre-training and fine-tuning. It covers encoder fusion techniques, cross-attention mechanisms, and VLM families designed for visual understanding tasks like Visual Question Answering and image captioning. The collection serves as a reference resource for researchers and developers exploring VLM architecture patterns.
