Is Ask-Anything open source?

Yes — OpenGVLab/Ask-Anything is open source, released under the MIT license.

What language is Ask-Anything written in?

OpenGVLab/Ask-Anything is primarily written in Python.

How popular is Ask-Anything?

OpenGVLab/Ask-Anything has 3.3k stars on GitHub.

Where can I find Ask-Anything?

OpenGVLab/Ask-Anything is on GitHub at https://github.com/OpenGVLab/Ask-Anything.

OpenGVLab/Ask-Anything

Making small LLMs actually watch video instead of guessing

It finetunes 7B LLMs to watch and converse about video end-to-end, rather than relying on static captions or external APIs.

★3.3k stars Python Image · Video · Audio Language Models

What it does

Ask-Anything hosts VideoChat, a family of models that ingest video and converse about its content. The early versions are essentially prompt-engineering frontends for ChatGPT, StableLM, and MOSS, while VideoChat2 and later are end-to-end 7B multimodal models that pair a video encoder such as UMT with an LLM such as Vicuna. The project also ships MVBench, a CVPR 2024 Highlight benchmark for evaluating video understanding.
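To make the "prompt-engineering frontend" idea concrete, here is a minimal sketch of how timestamped frame captions could be folded into one text prompt for a chat LLM. It is not taken from the repository: `FrameCaption`, `build_video_prompt`, and the prompt wording are hypothetical names chosen for illustration, and the captions are assumed to come from some upstream captioning step.

```python
from dataclasses import dataclass


@dataclass
class FrameCaption:
    """Caption for one sampled frame (hypothetical structure, not from the repo)."""
    second: float
    text: str


def build_video_prompt(captions: list[FrameCaption], question: str) -> str:
    """Fold timestamped captions and a user question into a single text prompt."""
    lines = [
        "You are answering questions about a video.",
        "Timestamped descriptions of sampled frames:",
    ]
    # Present captions in temporal order so the LLM can reason about sequence.
    for cap in sorted(captions, key=lambda c: c.second):
        lines.append(f"[{cap.second:.1f}s] {cap.text}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Illustrative captions, deliberately out of order.
captions = [
    FrameCaption(4.0, "a person pours water into a kettle"),
    FrameCaption(0.0, "a kitchen counter with a kettle"),
]
prompt = build_video_prompt(captions, "What happens after the first shot?")
print(prompt)
```

In this style the LLM only ever sees text, so anything the captions miss is lost. The end-to-end models instead pass visual features to the language model directly.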

The interesting bit

The team released 2 million diverse instruction-tuning examples for aligning temporal visual features with language models. The authors claim the resulting small models lead open-source 7B entries on long-video benchmarks such as Video-MME and MLVU.

Key highlights

VideoChat2_HD scores 54.8% on Video-MME, which the authors report is the best result among 7B multimodal LLMs.
The codebase includes both end-to-end chat models and explicit “communication” wrappers for ChatGPT, MOSS, StableLM, and MiniGPT-4.
MVBench, the project’s evaluation suite, was accepted to CVPR 2024 as a poster with Highlight status.
A 2-million-sample instruction dataset is released for training custom variants.
A vllm branch is available for faster inference on VideoChat2.

Caveats

The repository is a crowded archive spanning multiple generations—text-based wrappers, VideoChat1, VideoChat2, and long-video branches—so finding the right entry point takes patience.
The MiniGPT-4 video extension is explicitly described by the authors as a “simple extension” slated for future improvement.
The latest long-video and accuracy efforts (VideoChat-Flash and TPO) have moved to separate repositories.

Verdict

A solid stop for researchers who need a strong 7B video-language baseline with published training data and benchmarks. Less useful if you want a single clean API; this is a research kitchen with overlapping experiments.
