thuiar/MMSA

A test kitchen for teaching machines to read faces, voices, and text

MMSA corrals 18 sentiment-analysis models into one pip-installable framework so you can stop rewriting boilerplate and start arguing about which fusion architecture actually matters.

1k stars · Python · ML Frameworks · Domain Apps

What it does

MMSA is a Python framework that trains and benchmarks multimodal sentiment-analysis models. You feed it pre-extracted text, audio, and vision features, and it handles the plumbing for 18 different architectures, from 2017’s Tensor Fusion Network (TFN) to 2023’s ALMT. It supports three datasets out of the box: MOSI, MOSEI, and the Chinese-language CH-SIMS. You can run it through a one-line Python API, a command-line tool, or by cloning and hacking the source directly.
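As a rough sketch of the Python entry point, the snippet below wraps the call in a small helper. The `MMSA_run` import and its `seeds` and `gpu_ids` keyword arguments follow the project README as best recalled, so check them against the version you install. `run_baseline`, `SUPPORTED_DATASETS`, and the dataset keys are assumptions introduced here for illustration.

```python
# Dataset keys are an assumption based on the three datasets named above.
SUPPORTED_DATASETS = {"mosi", "mosei", "sims"}


def run_baseline(model, dataset, seeds=(1111, 1112, 1113), gpu_ids=(0,)):
    """Train and evaluate one model on one dataset (hypothetical helper)."""
    dataset = dataset.lower()
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    # Imported lazily so this module still loads when MMSA is not installed.
    from MMSA import MMSA_run  # assumed entry point; verify before relying on it
    return MMSA_run(model.lower(), dataset, seeds=list(seeds), gpu_ids=list(gpu_ids))
```

Calling `run_baseline("tfn", "mosi")` would then train TFN on MOSI once per seed, assuming the feature files sit where the default configuration expects them.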

The interesting bit

The real value isn’t any single model; it’s the standardization. MMSA forces every architecture into the same feature format and evaluation loop, so you can actually compare TFN against the transformer-based MulT without debugging six different data loaders. The maintainers even ship SHA-256 checksums for the pre-extracted feature files, which is the kind of rigor you rarely see in academic code releases.
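Checking a downloaded feature file against a published SHA-256 digest takes only the standard library. This is a minimal sketch: `sha256_of` and `verify` are names made up here, and the file path and expected digest would come from your own download and the project's published checksum list.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Reading in chunks matters because feature pickles can run to several gigabytes, which is too much to load just to hash.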

Key highlights

  • 18 models supported, split cleanly between single-task and multi-task variants (including several from the authors’ own ACL/AAAI papers)
  • Three datasets with pre-extracted BERT text features, audio, and vision features available via Baidu or Google Drive
  • pip install MMSA and go; or clone, edit, and reinstall locally
  • Companion toolkit MMSA-FET for extracting custom multimodal features if you want to move beyond the provided pickles
  • Version 2.0 is PyPI-packaged; a v_1.0 branch remains for those who preferred the old layout

Caveats

  • BBFN is marked “Work in Progress” in the model table
  • The README notes classification labels are deprecated as of v2.0; regression labels are the path forward, though this isn’t explained in detail
  • Re-installing after local edits requires an explicit pip uninstall cycle, which feels clunky
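On the label caveat above: dropping classification labels costs less than it sounds, because class labels can usually be derived from a regression score by thresholding. The sketch below assumes signed sentiment scores (MOSI-style, roughly -3 to 3) with zero as neutral. The function names and the convention of skipping zero-labelled samples are illustrative choices, not taken from MMSA.

```python
def to_polarity(score, neutral_band=0.0):
    """Map a signed sentiment score to a coarse class label."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def binary_accuracy(preds, labels):
    """Share of samples whose predicted sign matches the label's sign.

    Samples with a label of exactly zero are skipped, which is one common
    convention for two-class accuracy on regression-labelled data.
    """
    pairs = [(p, y) for p, y in zip(preds, labels) if y != 0]
    if not pairs:
        raise ValueError("no non-neutral labels to score")
    hits = sum((p > 0) == (y > 0) for p, y in pairs)
    return hits / len(pairs)
```

With helpers like these, a single regression head can be scored both as a regressor and as a classifier, which may be why separate class labels were retired.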

Verdict

Grab this if you’re doing research in multimodal sentiment analysis and need a sane baseline to beat. Skip it if you’re looking for end-to-end video processing—MMSA expects pre-extracted features, not raw pixels and waveforms.
