OpenGVLab/Ask-Anything
Multi-modal LLM system enabling conversational video understanding through instruction-tuned video-to-language models.

VideoChatGPT, a CVPR 2024 Highlight, is a video understanding system that combines large language models (LLMs) with visual processing for video question answering and captioning. It supports multiple foundation models, including miniGPT4, StableLM, and MOSS, and provides instruction tuning for video and image chat. The system uses Gradio for the UI and LangChain for orchestration, giving an end-to-end chatbot for video and image understanding.
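To make the "video-to-language" idea concrete, here is a minimal sketch of two steps such a chatbot typically needs: choosing which frames to feed the visual model, and assembling a multi-turn chat prompt around the video. This is not Ask-Anything's code. The functions `sample_frame_indices` and `build_prompt`, the `<Video></Video>` placeholder, and the `Human:`/`Assistant:` turn format are hypothetical choices made for illustration. Segment-based uniform sampling is one common way to pick frames, assumed here rather than taken from the repository.

```python
from typing import List, Tuple


def sample_frame_indices(num_frames: int, num_segments: int) -> List[int]:
    """Hypothetical helper: split the video into `num_segments` equal segments
    and take the frame nearest the middle of each one."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments
    # Clamp so rounding can never index past the last frame.
    return [min(int(seg_len * i + seg_len / 2), num_frames - 1)
            for i in range(num_segments)]


def build_prompt(history: List[Tuple[str, str]], question: str,
                 placeholder: str = "<Video></Video>") -> str:
    """Hypothetical helper: place a video placeholder first (where visual
    features would be spliced in), then replay earlier turns and add the
    new question."""
    turns = [f"Human: {placeholder}"]
    for user_msg, bot_msg in history:
        turns.append(f"Human: {user_msg}")
        turns.append(f"Assistant: {bot_msg}")
    turns.append(f"Human: {question}")
    turns.append("Assistant:")  # the LLM continues generating from here
    return "\n".join(turns)


if __name__ == "__main__":
    print(sample_frame_indices(80, 8))
    print(build_prompt([("What is shown?", "A dog on a beach.")],
                       "What does the dog do next?"))
```

In a real system the placeholder would be replaced by embeddings from the visual encoder, not by text. The sketch only shows how frame selection and conversation history fit together before the LLM call.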