MMMU-Benchmark/MMMU
Evaluation codebase and leaderboard for the MMMU benchmark assessing multimodal AI models on expert-level reasoning tasks.

Velocity (7d): +0.6 ★/day · Trend: steady
MMMU provides evaluation code for benchmarking multimodal language models on massive multi-discipline tasks that require college-level subject knowledge. The benchmark contains 11.5K multimodal questions spanning 30 subjects across six core disciplines, with diverse image types including charts, diagrams, maps, tables, and chemical structures. It tests visual question answering and reasoning at a level intended to be comparable to that of human experts.
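
To make the evaluation idea concrete, here is a minimal sketch of how multiple-choice predictions could be scored per discipline. It is not the repository's actual evaluation code: the function name `accuracy_by_discipline`, the record keys (`discipline`, `answer`, `prediction`), and the sample records are all assumptions made up for illustration.

```python
from collections import defaultdict


def accuracy_by_discipline(records):
    """Return {discipline: accuracy} plus an 'overall' micro-average.

    Each record is a dict with hypothetical keys (not the repo's schema):
    'discipline', 'answer' (gold option letter), 'prediction' (model's letter).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        discipline = record["discipline"]
        total[discipline] += 1
        # Compare option letters case-insensitively, ignoring stray whitespace.
        if record["prediction"].strip().upper() == record["answer"].strip().upper():
            correct[discipline] += 1

    scores = {d: correct[d] / total[d] for d in total}
    n = sum(total.values())
    # Overall accuracy is computed over all questions, not averaged per discipline.
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores


# Made-up records for illustration only; these are not benchmark data.
sample = [
    {"discipline": "Science", "answer": "B", "prediction": "b"},
    {"discipline": "Science", "answer": "C", "prediction": "A"},
    {"discipline": "Art & Design", "answer": "D", "prediction": "D"},
]
scores = accuracy_by_discipline(sample)
print(scores)
```

Reporting per-discipline scores next to the overall number helps show whether a model is uniformly capable or strong only in some subject areas.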