OpenGVLab/VisionLLM
Multimodal large language model enabling vision-language understanding and generation across hundreds of tasks.

Velocity (7d): +1.0 ★/day · Trend: steady
VisionLLM is a series of models that use large language models as open-ended decoders for vision-centric tasks. VisionLLM v2 extends this approach to a generalist multimodal LLM that supports hundreds of vision-language tasks spanning visual understanding, perception, and generation. The original VisionLLM was published at NeurIPS 2023 and VisionLLM v2 at NeurIPS 2024.