MME-Benchmarks/Video-MME
Comprehensive benchmark for evaluating the video analysis capabilities of multi-modal LLMs, with 900 videos across 6 primary domains and 2,700 human-annotated question-answer pairs.

Video-MME is presented by its authors as the first comprehensive benchmark for evaluating multi-modal LLMs (MLLMs) on video analysis. Its 900 manually selected videos span 6 primary visual domains with 30 subfields and range from short clips to long videos of up to about an hour, so the benchmark tests long-video comprehension as well as short-clip understanding. Each video comes with human-annotated multiple-choice questions, 2,700 question-answer pairs in total, covering task types from perception to temporal reasoning. Video-MME has become a standard reference for comparing MLLM video understanding, and major AI labs have used it to report results for their models.