zai-org/CogVLM2

Open-source multimodal LLM built on Llama3-8B that combines vision and language understanding.

Velocity (7d): +3.2 ★/day · Trend: steady

CogVLM2 is an open-source multimodal model, billed as GPT-4V-level, that integrates visual and language capabilities. It supports image understanding and extends to video comprehension through keyframe extraction, handling videos up to one minute long. Deployment options include TGI inference and INT4-quantized versions that require only 16 GB of VRAM.
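The description does not say how keyframes are chosen. As a rough illustration only, here is a minimal sketch assuming a simple uniform sampler over the first minute of a video. The function name `uniform_keyframe_indices`, the default of 24 keyframes, and the segment-centring scheme are all hypothetical and are not CogVLM2's actual extraction method.

```python
from typing import List


def uniform_keyframe_indices(num_frames: int, fps: float, num_keyframes: int = 24,
                             max_seconds: float = 60.0) -> List[int]:
    """Hypothetical uniform keyframe sampler (not CogVLM2's real method).

    Picks evenly spaced frame indices from the first `max_seconds` of a video.
    """
    if num_frames <= 0 or fps <= 0 or num_keyframes <= 0:
        raise ValueError("num_frames, fps and num_keyframes must be positive")
    # Clip the usable range to the duration limit (one minute by default).
    usable = min(num_frames, int(max_seconds * fps))
    # Very short clips: keep every frame rather than repeating any.
    if usable <= num_keyframes:
        return list(range(usable))
    # Split the usable range into equal segments and take each segment's midpoint.
    step = usable / num_keyframes
    return [int(step * i + step / 2) for i in range(num_keyframes)]
```

With this scheme, a two-minute clip at 30 fps yields the same 24 indices as a one-minute clip, since everything past the 60-second limit is ignored.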

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.