
OpenGVLab/VisionLLM

Multimodal large language model enabling vision-language understanding and generation across hundreds of tasks.

Star velocity (7d): +1.0 ★/day · Trend: steady

VisionLLM is a series of models that use large language models as open-ended decoders for vision-centric tasks. VisionLLM v2 extends this approach into a generalist multimodal LLM that supports hundreds of vision-language tasks spanning visual understanding, perception, and generation. VisionLLM was published at NeurIPS 2023 and VisionLLM v2 at NeurIPS 2024.
