
DAMO-NLP-SG/VideoLLaMA3

VideoLLaMA3 is a multimodal LLM designed to understand images and videos via joint visual-language processing.

1.2k stars · Jupyter Notebook · Language Models · Image · Video · Audio
Velocity (7d): +2.3 ★ / day · Trend: steady

VideoLLaMA3 is a frontier multimodal foundation model from DAMO-NLP-SG that takes images and videos together with text as input for understanding tasks. It extends a LLaMA-style language model architecture with visual encoders to support video comprehension and image understanding. The project provides Hugging Face model checkpoints and interactive demos for both image and video understanding.
