zai-org/CogVLM

CogVLM is an open-source visual language model with 17B parameters that supports image understanding and multi-turn dialogue.

6.7k stars · Python · Language Models · Agents
Velocity (7d): +6.8 ★/day · Trend: steady

CogVLM is a multimodal pretrained visual language model that achieves state-of-the-art results on 10 cross-modal benchmarks, covering image captioning, visual question answering (VQA), and referring-expression tasks. CogAgent extends CogVLM to 18B parameters and adds GUI agent capabilities for autonomous screen-operation tasks. Both models use a visual expert architecture to align visual and language representations, with CogAgent adding support for high-resolution image understanding.
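To make the visual-expert idea more concrete, here is a minimal pure-Python sketch of a single attention head, assuming the layout described in the CogVLM paper: image tokens are projected with their own Q/K/V weights (the "visual expert") while text tokens keep the language model's original weights, and attention is then computed jointly over all tokens. The function names, the dict layout of the weights, and the toy dimensions are hypothetical illustrations, not code from the CogVLM repository. The sketch also omits the causal mask, multi-head splitting, and the separate expert FFN that a full layer would include.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def expert_project(tokens: List[Vector], is_image: List[bool],
                   w_text: Matrix, w_image: Matrix) -> List[Vector]:
    """Route each token through either the text weights or the visual-expert weights."""
    return [matvec(w_image if img else w_text, t) for t, img in zip(tokens, is_image)]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def visual_expert_attention(tokens: List[Vector], is_image: List[bool],
                            text_w: Dict[str, Matrix],
                            image_w: Dict[str, Matrix]) -> List[Vector]:
    """Single-head attention with modality-specific Q/K/V projections.

    text_w and image_w are hypothetical dicts holding 'q', 'k', 'v' matrices.
    Projections differ per modality, but every query attends over all tokens,
    so image and text representations are mixed in one shared attention step.
    No causal mask is applied in this simplified sketch.
    """
    q = expert_project(tokens, is_image, text_w["q"], image_w["q"])
    k = expert_project(tokens, is_image, text_w["k"], image_w["k"])
    v = expert_project(tokens, is_image, text_w["v"], image_w["v"])
    d = len(q[0])
    out = []
    for qi in q:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output dimension at a time.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

The design choice being illustrated is that the extra parameters live only in the per-modality projections: the text path reuses the original language-model weights unchanged, so a text-only input would behave as it did before the visual expert was added.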

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.