MME-Benchmarks/Video-MME

LLMs can now flunk a 254-hour video comprehension exam

Video-MME stress-tests multi-modal LLMs with 900 human-annotated videos—up to an hour long—to see if they actually understand moving pictures, not just still frames.

★ 787 stars · LLMOps · Eval Data Tooling

What it does

Video-MME is an evaluation suite that grades multi-modal LLMs on video understanding. It pairs 900 videos (254 hours in total) with 2,700 human-annotated multiple-choice questions. The test spans clips from 11 seconds up to a full hour, and can feed models subtitles and audio alongside frames to probe whether they grasp temporal dynamics or just cherry-pick stills.
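The headline figures imply a couple of useful averages. The short sketch below only does arithmetic on the numbers quoted above; the constant names are made up for illustration.

```python
# Figures quoted in the summary above.
NUM_VIDEOS = 900
NUM_QUESTIONS = 2_700
TOTAL_HOURS = 254

# Derived averages (plain arithmetic, not values reported by the project).
questions_per_video = NUM_QUESTIONS / NUM_VIDEOS
mean_minutes = TOTAL_HOURS * 60 / NUM_VIDEOS

print(f"questions per video: {questions_per_video:.0f}")   # questions per video: 3
print(f"mean video length: {mean_minutes:.1f} minutes")    # mean video length: 16.9 minutes
```

So each video carries three questions on average, and the mean clip runs about 17 minutes, well beyond what a handful of still frames can cover.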

The interesting bit

Every video was collected and every annotation written from scratch rather than repurposed from existing datasets, which is rarer than it should be in benchmark culture. The benchmark has also been adopted by OpenAI and Google as a reference test for long-context multimodal performance, suggesting the questions are difficult enough that labs want to brag about beating them.

Key highlights

Duration coverage from 11-second shorts to 60-minute long-form videos across 6 domains and 30 subfields.
Can evaluate models with subtitles and audio, not just raw pixels.
All 2,700 QA pairs are newly human-annotated, so nothing is recycled from existing datasets.
Scoring is deterministic and self-contained: no third-party judge models required.
Plugs into existing evaluation stacks like VLMEvalKit and LMMs-Eval.
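Because the questions are multiple choice, deterministic scoring can be as simple as pulling an option letter out of each response and comparing it with the answer key. The sketch below is one way to do that, not the project's official script: the record keys (`duration`, `answer`, `response`), the assumption of four options A to D, and the letter-extraction heuristic are all illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional


def extract_choice(response: str) -> Optional[str]:
    """Return the first standalone capital letter A-D in a response, or None.

    Naive heuristic (an assumption of this sketch): a response that opens
    with the English article "A" would be misread as option A.
    """
    match = re.search(r"\b([A-D])\b", response)
    return match.group(1) if match else None


def accuracy_by_group(records: Iterable[dict]) -> Dict[str, float]:
    """Exact-match accuracy per duration bucket.

    Each record uses hypothetical keys: 'duration', 'answer', 'response'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["duration"]] += 1
        if extract_choice(rec["response"]) == rec["answer"]:
            correct[rec["duration"]] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up records for illustration only.
records = [
    {"duration": "short", "answer": "B", "response": "B. The chef adds salt."},
    {"duration": "short", "answer": "C", "response": "The answer is (C)"},
    {"duration": "long", "answer": "A", "response": "D"},
    {"duration": "long", "answer": "D", "response": "I am not sure."},
]
scores = accuracy_by_group(records)
print(scores)  # {'short': 1.0, 'long': 0.0}
```

An unparseable response counts as wrong here, which keeps the score reproducible without a judge model.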

Caveats

Strictly academic license: commercial use is banned and redistribution requires explicit approval.
Long-video evaluation requires careful alignment of sampled frames with their corresponding subtitle segments.
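The alignment caveat can be illustrated with a small sketch: sample frame timestamps uniformly, then keep only the subtitle lines whose time span covers one of those timestamps. This is one plausible approach under stated assumptions, not the benchmark's own pipeline; the `Subtitle` class, the function names, and the midpoint sampling rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtitle:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def sample_timestamps(duration: float, num_frames: int) -> List[float]:
    """Midpoints of num_frames equal-length segments of the video."""
    step = duration / num_frames
    return [step * (i + 0.5) for i in range(num_frames)]


def subtitles_for_frames(timestamps: List[float], subtitles: List[Subtitle]) -> List[str]:
    """Subtitle lines that cover at least one sampled timestamp, without repeats."""
    selected: List[Subtitle] = []
    for t in timestamps:
        for sub in subtitles:
            if sub.start <= t <= sub.end and sub not in selected:
                selected.append(sub)
    return [sub.text for sub in selected]


# Made-up 60-second clip, for illustration only.
subs = [
    Subtitle(0, 10, "Welcome back."),
    Subtitle(10, 20, "Today we bake bread."),
    Subtitle(20, 40, "First, mix the flour."),
    Subtitle(50, 55, "Now let it rest."),
]
stamps = sample_timestamps(60, 4)
print(stamps)                              # [7.5, 22.5, 37.5, 52.5]
print(subtitles_for_frames(stamps, subs))
# ['Welcome back.', 'First, mix the flour.', 'Now let it rest.']
```

The second subtitle line is dropped because no sampled frame falls inside it, so the text the model reads stays in step with the frames it sees.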

Verdict

Useful for researchers who need a hard, duration-diverse benchmark to stress-test video MLLMs. Not useful if you need training data or a commercially usable dataset.

Frequently asked

What is MME-Benchmarks/Video-MME? Video-MME stress-tests multi-modal LLMs with 900 human-annotated videos, up to an hour long, to see if they actually understand moving pictures, not just still frames.
Is Video-MME open source? Yes, MME-Benchmarks/Video-MME is a public project tracked on heatdrop, though its license restricts it to academic use.
How popular is Video-MME? MME-Benchmarks/Video-MME has 787 stars on GitHub.
Where can I find Video-MME? MME-Benchmarks/Video-MME is on GitHub at https://github.com/MME-Benchmarks/Video-MME.