PKU-YuanGroup/Chat-UniVi

A unified visual representation model that enables large language models to understand both image and video content.

Velocity (7d): +1.0 ★ / day
Trend: steady

Chat-UniVi is a vision-language model that gives large language models unified visual understanding of both images and video. The project proposes a unified visual representation that handles both modalities through dynamic token allocation across different resolutions. This allows the model to process multiple video frames while maintaining fine-grained image understanding. The work was published as a CVPR 2024 Highlight paper.
