pliang279/MultiBench

One benchmark to fuse them all: 10 modalities, 20 tasks, 1 codebase

MultiBench tries to stop every multimodal paper from reinventing its own data loader and calling it a contribution.

628 stars HTML LLMOps · EvalData Tooling

What it does

MultiBench is a standardized benchmarking suite and code framework for multimodal deep learning. It bundles 15 datasets across multimedia, affective computing, healthcare, robotics, finance, and HCI into a single automated pipeline with consistent data loading, training, and evaluation. The companion “MultiZoo” module provides 20 implemented algorithms, designed to be composed and extended: unimodal baselines, fusion paradigms (early, late, tensor-based), and training structures.
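
To make the three fusion paradigms named above concrete, here is a minimal pure-Python sketch. The function names and toy feature vectors are illustrative assumptions, not MultiZoo's actual API, and real implementations would work on tensors with learned layers rather than on plain lists.

```python
from itertools import product


def early_fusion(features):
    """Early fusion: concatenate per-modality feature vectors into one input."""
    return [x for vec in features for x in vec]


def late_fusion(predictions):
    """Late fusion: average the per-modality class scores, class by class."""
    n = len(predictions)
    return [sum(scores) / n for scores in zip(*predictions)]


def tensor_fusion(features):
    """Tensor-based fusion: flattened outer product of the feature vectors.

    A constant 1.0 is appended to each vector so the product keeps the
    unimodal terms alongside the cross-modal interaction terms.
    """
    padded = [vec + [1.0] for vec in features]
    fused = []
    for combo in product(*padded):
        value = 1.0
        for x in combo:
            value *= x
        fused.append(value)
    return fused


# Toy inputs (hypothetical): a 2-d audio feature and a 1-d text feature.
audio = [0.5, 2.0]
text = [3.0]

early = early_fusion([audio, text])            # one vector of length 2 + 1
late = late_fusion([[0.2, 0.8], [0.6, 0.4]])   # averaged per-class scores
tensor = tensor_fusion([audio, text])          # (2 + 1) * (1 + 1) entries
```

The output size is the practical difference: early fusion grows additively with the number of features, while the tensor-based variant grows multiplicatively, which is one reason training and inference complexity are worth measuring alongside accuracy.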

The interesting bit

The project explicitly targets three problems the README claims are understudied: generalization across domains, training/inference complexity, and robustness to missing or noisy modalities. Most benchmark papers measure accuracy and stop; MultiBench at least attempts to measure the boring stuff that actually matters in production.
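
As a rough illustration of what measuring robustness to missing or noisy modalities can look like, the sketch below corrupts inputs at test time and recomputes accuracy. It is an assumption-laden toy, not MultiBench's evaluation protocol: the classifier, the data, and every function name here are hypothetical.

```python
import random


def predict(audio, text):
    """Toy late-fusion classifier (hypothetical): threshold the averaged scores."""
    return 1 if (audio + text) / 2 > 0 else 0


def accuracy(samples, noise_std=0.0, drop_text=False, seed=0):
    """Accuracy after corrupting the inputs at test time.

    noise_std adds Gaussian noise to the audio feature; drop_text replaces the
    text feature with 0.0 to imitate a missing modality.
    """
    rng = random.Random(seed)
    correct = 0
    for audio, text, label in samples:
        audio = audio + rng.gauss(0.0, noise_std)
        if drop_text:
            text = 0.0
        correct += predict(audio, text) == label
    return correct / len(samples)


# Synthetic (audio, text, label) triples. In the last two, audio alone points
# the wrong way, so they are only classified correctly when text is present.
samples = [(1.0, 1.0, 1), (-1.0, -1.0, 0), (-0.5, 2.0, 1), (0.5, -2.0, 0)]

clean = accuracy(samples)
missing = accuracy(samples, drop_text=True)
noisy_curve = [(std, accuracy(samples, noise_std=std)) for std in (0.0, 0.5, 2.0)]
```

Reporting the gap between `clean` and the corrupted scores, or the whole curve over noise levels, gives a more useful picture than a single accuracy number.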

Key highlights

  • 15 datasets, 10 modalities (video, audio, text, time-series, tactile, etc.), 20 prediction tasks
  • Modular algorithm zoo: swap fusion methods, objective functions, or training structures without rewriting scaffolding
  • Automated pipeline standardizes preprocessing, experimental setup, and evaluation
  • Extensible: add a dataset by writing one get_dataloader function; add an algorithm by dropping a module in unimodals/, fusions/, objective_functions/, or training_structures/
  • Published at NeurIPS 2021 Datasets and Benchmarks; companion software paper in JMLR 2022
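
The extensibility claims in the list above can be sketched in a few lines. The version below is a hedged toy in plain Python: only the name get_dataloader and the directory names come from the README summary, while the signature, the 60/20/20 split, the registry, and the synthetic data are assumptions made for illustration.

```python
import random

# Hypothetical registry. MultiBench itself organizes algorithms as modules in
# unimodals/, fusions/, objective_functions/, and training_structures/.
FUSIONS = {}


def register_fusion(name):
    """Decorator that adds a fusion function to the registry under `name`."""
    def wrap(fn):
        FUSIONS[name] = fn
        return fn
    return wrap


@register_fusion("concat")
def concat(features):
    """Concatenate the per-modality vectors."""
    return [x for vec in features for x in vec]


@register_fusion("sum")
def elementwise_sum(features):
    """Add the per-modality vectors element by element."""
    return [sum(xs) for xs in zip(*features)]


def get_dataloader(n=10, batch_size=4, seed=0):
    """Return (train, valid, test) lists of batches for a synthetic dataset.

    Each example is ([modality_a, modality_b], label). The signature and the
    60/20/20 split are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    data = [([[rng.random(), rng.random()], [rng.random(), rng.random()]], i % 2)
            for i in range(n)]
    cut1, cut2 = n * 6 // 10, n * 8 // 10

    def batches(rows):
        return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

    return batches(data[:cut1]), batches(data[cut1:cut2]), batches(data[cut2:])


def run(fusion_name):
    """Shared scaffolding: only the fusion lookup changes between runs."""
    train, _, _ = get_dataloader()
    fuse = FUSIONS[fusion_name]
    return [fuse(features) for batch in train for features, _ in batch]


concat_out = run("concat")
sum_out = run("sum")
```

Swapping "concat" for "sum" changes nothing else in `run`, which is the point of keeping data loading and the training loop separate from the algorithm modules.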

Caveats

  • Several datasets require manual credentialing or Google Drive downloads (MIMIC, MuJoCo Push), and the README warns that automatic downloads “may fail for various reasons”
  • Robustness to noisy and missing modalities is a stated design goal, but the truncated README does not show how that robustness is actually evaluated

Verdict

Worth a look if you’re doing multimodal research and tired of writing boilerplate. Skip it if you only care about one modality: this is explicitly built for cross-domain comparison, not single-task speed.
