wenhaochai/MovieChat
MovieChat is a multimodal LLM system for long video understanding that condenses dense frame tokens into a sparse memory, so that long videos can be processed on a single 24GB GPU.

MovieChat is the open-source implementation of a CVPR 2024 paper on long video understanding, combining a vision model with a large language model. Its sparse memory mechanism lets it process videos of more than 10K frames on a 24GB GPU, substantially reducing memory overhead compared with approaches that keep dense tokens for every frame. The system builds on LLaMA, and the project includes MovieChat-1K, a benchmark with a leaderboard for long video understanding.
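To make the idea of turning dense tokens into sparse memory concrete, here is a minimal sketch of one way to do it: repeatedly merge the two most similar adjacent frame features until the memory fits a fixed budget. This illustrates the general technique only and is not the repository's code. The names `cosine` and `consolidate`, the averaging merge rule, and the toy 2-D features are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consolidate(frames, target_len):
    """Shrink a list of per-frame features to at most `target_len` entries.

    Hypothetical helper: at each step, the most similar pair of *adjacent*
    frames is replaced by its element-wise average, so temporal order is kept.
    """
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    memory = [list(f) for f in frames]  # copy so the input is not modified
    while len(memory) > target_len:
        sims = [cosine(memory[i], memory[i + 1]) for i in range(len(memory) - 1)]
        i = max(range(len(sims)), key=sims.__getitem__)  # most redundant pair
        merged = [(x + y) / 2 for x, y in zip(memory[i], memory[i + 1])]
        memory[i:i + 2] = [merged]
    return memory


# Toy usage: four frames covering two distinct "scenes" collapse to two entries.
dense = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sparse = consolidate(dense, target_len=2)
print(sparse)  # [[1.0, 0.0], [0.0, 1.0]]
```

Because the memory size is capped at `target_len` regardless of how many frames arrive, the storage cost stops growing with video length, which is the property that makes very long videos tractable on a fixed GPU budget.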