
zli12321/Vision-Language-Models-Overview

A curated survey repository tracking the evolution of vision-language models across three architectural eras.

616 stars · HTML · Language Models · Learning
Velocity (7d): +1.1 ★ / day · Trend: steady

This repository maintains a comprehensive collection and survey of vision-language model papers and implementations. It documents the architectural progression from early frozen-encoder approaches through LLM-centric designs to modern native multimodal transformers. The survey covers benchmarking methodologies, evaluation frameworks, RL alignment techniques, and applications across leading models including GPT-4V, Claude, Gemini, LLaVA, and Qwen-VL variants.
