Yes — MMMU-Benchmark/MMMU is open source, released under the Apache-2.0 license.

What language is MMMU written in?

MMMU-Benchmark/MMMU is primarily written in Python.

MMMU-Benchmark/MMMU has 589 stars on GitHub.

Where can I find MMMU?

MMMU-Benchmark/MMMU is on GitHub at https://github.com/MMMU-Benchmark/MMMU.

MMMU-Benchmark/MMMU

College exams for models that claim to see and think

Evaluation code for a benchmark that tests whether multimodal models can reason through college-level coursework or merely read the text and guess.

★589 stars Python LLMOps · Eval Language Models Computer Vision Data Tooling

View on GitHub ↗ Homepage ↗

Not currently ranked — collecting fresh signals.

star history

What it does

MMMU is a benchmark—and this repo its evaluation code—that subjects large multimodal models to college-level questions across six disciplines and 30 subjects. The 11.5K problems use 32 distinct image types, from chemical structures to music sheets, forcing models to combine visual perception with domain expertise rather than relying on OCR and pattern matching. The repository includes evaluation scripts and recently released test-set answers for both the original MMMU and the stricter MMMU-Pro variant.

The interesting bit

MMMU-Pro deliberately removes crutches: it filters out questions answerable without the image, adds more tempting wrong answers, and sometimes embeds the question text inside the image itself so the model must actually “see” to read. While top models hit roughly 56% on MMMU, scores on MMMU-Pro range from about 17% to 27%, suggesting that a lot of current “multimodal” performance is simply advanced text completion wearing glasses.

Key highlights

11.5K questions drawn from real college exams, quizzes, and textbooks spanning 183 subfields
32 heterogeneous image types including diagrams, tables, maps, and chemical structures
Test-set answers recently released for fully local evaluation
MMMU-Pro adds vision-only input and augmented distractors to force genuine multimodal reasoning
Even GPT-4V only hits 56% on the original benchmark; scores on MMMU-Pro are far lower

Verdict

Researchers building or benchmarking large multimodal models should use this to learn whether their systems actually understand college coursework or just exploit textual shortcuts. If you are looking for training data or off-the-shelf model weights, look elsewhere.

Frequently asked

What is MMMU-Benchmark/MMMU?: Evaluation code for a benchmark that tests whether multimodal models can reason through college-level coursework or merely read the text and guess.
Is MMMU open source?: Yes — MMMU-Benchmark/MMMU is open source, released under the Apache-2.0 license.
What language is MMMU written in?: MMMU-Benchmark/MMMU is primarily written in Python.
How popular is MMMU?: MMMU-Benchmark/MMMU has 589 stars on GitHub.
Where can I find MMMU?: MMMU-Benchmark/MMMU is on GitHub at https://github.com/MMMU-Benchmark/MMMU.