Is Video-ChatGPT open source?

Yes — mbzuai-oryx/Video-ChatGPT is open source, released under the CC-BY-4.0 license.

What language is Video-ChatGPT written in?

mbzuai-oryx/Video-ChatGPT is primarily written in Python.

How popular is Video-ChatGPT?

mbzuai-oryx/Video-ChatGPT has 1.5k stars on GitHub.

Where can I find Video-ChatGPT?

mbzuai-oryx/Video-ChatGPT is on GitHub at https://github.com/mbzuai-oryx/Video-ChatGPT.

mbzuai-oryx/Video-ChatGPT

Spatiotemporal LLaVA and the benchmarks to back it up

Video-ChatGPT combines an LLM with a spatiotemporal visual encoder for video conversation, and introduces a quantitative evaluation framework so that video-language models can be benchmarked rather than merely demoed.

★ 1.5k stars · Python · Chat Assistants · Image · Video · Audio

What it does

Video-ChatGPT adapts the LLaVA recipe to moving images. It pairs a pretrained spatiotemporal video encoder with Vicuna to generate answers, descriptions, and reasoning about video content. The authors trained it on VideoInstruct100K, a dataset of 100,000 video-instruction pairs built with a semi-automatic annotation pipeline.
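
To make "spatiotemporal" concrete, here is a minimal sketch of one way per-frame patch features can be turned into video tokens: average each patch position over time, average each frame over its patches, and concatenate the two sets. This pooling scheme, the function name `spatiotemporal_pool`, the token order, and the toy shapes are assumptions for illustration only; they are not taken from the repository's code, which should be consulted for the actual pipeline.

```python
from statistics import fmean


def spatiotemporal_pool(frames):
    """Pool per-frame patch features into video-level tokens (illustrative only).

    frames: list of T frames; each frame is a list of N patch vectors,
            each patch vector a list of D floats.
    Returns T temporal tokens followed by N spatial tokens, each of length D.
    """
    num_frames = len(frames)
    num_patches = len(frames[0])
    dim = len(frames[0][0])

    # Spatial pooling: average over all patches within a frame -> one token per frame.
    temporal_tokens = [
        [fmean(frames[t][n][d] for n in range(num_patches)) for d in range(dim)]
        for t in range(num_frames)
    ]
    # Temporal pooling: average each patch position over all frames -> one token per patch.
    spatial_tokens = [
        [fmean(frames[t][n][d] for t in range(num_frames)) for d in range(dim)]
        for n in range(num_patches)
    ]
    # Assumed order: temporal tokens first, then spatial tokens.
    return temporal_tokens + spatial_tokens


# Toy input: 2 frames, 3 patches per frame, 2-dimensional features.
toy_frames = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
tokens = spatiotemporal_pool(toy_frames)
print(len(tokens))  # T + N tokens
```

The appeal of this kind of pooling is that the token count grows as T + N rather than T × N, which keeps the sequence handed to the language model short. In a real system the pooled tokens would still need to be projected into the LLM's embedding space, a step this sketch leaves out.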

The interesting bit

Most video-language projects ship a demo and call it a day; this one ships a ruler. The authors built what they label the first quantitative evaluation framework for video conversation models, benchmarking correctness, detail orientation, temporal understanding, and consistency. Their tables claim top scores against Video LLaMA, LLaMA Adapter, and Video Chat on MSVD-QA, MSRVTT-QA, TGIF-QA, and ActivityNet-QA.
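
As a rough illustration of what benchmarking along several aspects involves, the sketch below averages per-response ratings into one score per aspect. The aspect names mirror the four listed above; the function `aggregate_scores`, the rating format, and the sample numbers are hypothetical placeholders, not the project's evaluation scripts or results.

```python
from collections import defaultdict
from statistics import fmean

# Aspect names follow the four evaluation dimensions named in the text.
ASPECTS = (
    "correctness",
    "detail_orientation",
    "temporal_understanding",
    "consistency",
)


def aggregate_scores(ratings):
    """Average ratings per aspect.

    ratings: iterable of (aspect, score) pairs, one pair per judged response.
    Returns a dict mapping each aspect that appears to its mean score.
    """
    by_aspect = defaultdict(list)
    for aspect, score in ratings:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect!r}")
        by_aspect[aspect].append(score)
    return {aspect: fmean(scores) for aspect, scores in by_aspect.items()}


# Placeholder ratings, invented purely to exercise the function.
sample_ratings = [
    ("correctness", 4),
    ("correctness", 2),
    ("consistency", 3),
]
print(aggregate_scores(sample_ratings))
```

Reporting one number per aspect, rather than a single blended score, is what lets a table show that a model is strong on correctness but weak on temporal understanding.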

Key highlights

Architecture inspired by LLaVA, fusing a spatiotemporal visual encoder with an LLM for video-centric dialogue.
Released VideoInstruct100K, a 100,000-pair instruction dataset with a scalable, semi-automatic annotation framework.
Introduces VCGBench-Diverse and other quantitative benchmarks evaluating generative performance and zero-shot QA across 18 video categories.
Accepted at ACL 2024; online and offline demos are available.

Verdict

Grab it if you need a reproducible ACL 2024 baseline for video QA or spatiotemporal reasoning. Skip it if you want the latest and greatest: the authors have already moved on to VideoGPT+ and Mobile-VideoGPT.
