
MMMU-Benchmark/MMMU

Evaluation codebase and leaderboard for the MMMU benchmark assessing multimodal AI models on expert-level reasoning tasks.

Velocity (7d): +0.6 ★ / day · Trend: steady

The MMMU repository provides evaluation code for benchmarking multimodal language models on massive multi-discipline tasks that require college-level subject knowledge. The benchmark contains 11.5K multimodal questions spanning 30 subjects across six core disciplines, with diverse image types including charts, diagrams, maps, tables, and chemical structures. It tests visual question answering and reasoning at a level comparable to that of human experts.
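As a rough illustration of what scoring a benchmark organized by discipline can look like, the sketch below computes per-discipline and overall accuracy for multiple-choice predictions. It is not the repository's evaluation code: the record keys (`discipline`, `answer`, `prediction`), the function name, and the sample rows are all assumptions made up for this example.

```python
from collections import defaultdict


def accuracy_by_discipline(records):
    """Return (per-discipline accuracy, overall accuracy).

    `records` is an iterable of dicts with the hypothetical keys
    'discipline', 'answer', and 'prediction' (option letters such as "A").
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        discipline = record["discipline"]
        total[discipline] += 1
        # Compare option letters case-insensitively, ignoring stray whitespace.
        if record["prediction"].strip().upper() == record["answer"].strip().upper():
            correct[discipline] += 1

    per_discipline = {d: correct[d] / total[d] for d in total}
    n_questions = sum(total.values())
    # Overall accuracy is micro-averaged: every question counts equally.
    overall = sum(correct.values()) / n_questions if n_questions else 0.0
    return per_discipline, overall


# Made-up sample rows; discipline labels here are illustrative only.
sample = [
    {"discipline": "Science", "answer": "B", "prediction": "b"},
    {"discipline": "Science", "answer": "C", "prediction": "A"},
    {"discipline": "Business", "answer": "D", "prediction": "D"},
]
per_discipline, overall = accuracy_by_discipline(sample)
```

The overall score here is micro-averaged over questions, so disciplines with more questions weigh more; averaging the per-discipline values instead would weigh each discipline equally.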
