Meta's brain simulator: feed it a movie, get back fMRI
A pretrained Transformer that predicts where blood will flow in your cortex when you watch, hear, or read something.

What it does
TRIBE v2 is a multimodal encoding model that takes video, audio, or text and predicts fMRI brain responses: the activity expected at each vertex of the cortical surface. It bundles pretrained vision, audio, and language models into one Transformer, then maps their joint representations onto a standard brain mesh (fsaverage5, ~20k vertices: 10,242 per hemisphere). The output is offset by 5 seconds to account for hemodynamic lag, since the blood-flow signal that fMRI measures trails the neural activity behind it.
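To make the 5-second offset concrete, here is a minimal sketch of how a fixed lag maps stimulus time onto fMRI sample indices and back. It is an illustration of the idea, not TRIBE v2's code: the function names are hypothetical, and the 1.0 s sampling interval (TR) is an assumed value, since only the 5-second lag is given above.

```python
# Illustrative only: not taken from TRIBE v2.
# The 5 s lag comes from the description above; the default TR of 1.0 s is an assumption.
HEMODYNAMIC_LAG_S = 5.0


def predicted_sample_index(stimulus_time_s, tr_s=1.0, lag_s=HEMODYNAMIC_LAG_S):
    """Return the index of the fMRI sample expected to reflect a stimulus moment."""
    if tr_s <= 0:
        raise ValueError("tr_s must be positive")
    # Shift the stimulus time forward by the lag, then bin it into samples of length tr_s.
    return int((stimulus_time_s + lag_s) // tr_s)


def stimulus_window_for_sample(index, tr_s=1.0, lag_s=HEMODYNAMIC_LAG_S):
    """Return the (start, end) stimulus interval, in seconds, that sample `index` reflects."""
    start = index * tr_s - lag_s
    return (start, start + tr_s)
```

Under these assumptions, the first few samples map to negative stimulus times, meaning they reflect the period before the stimulus began.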
The interesting bit
The “average subject” abstraction is quietly audacious: instead of modeling individual brains, the model predicts a population-average response, which lets it generalize across studies and stimuli without per-subject calibration. The text-to-speech-to-transcription pipeline for word-level timing is also a neat hack — it turns any text file into something the model can temporally align.
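One simple way to picture an "average subject" is a per-vertex mean over several people's responses to the same stimulus. The sketch below shows that arithmetic only. The function name and data layout are hypothetical, and nothing here says TRIBE v2 builds its average this way.

```python
from statistics import fmean


def average_subject(responses_by_subject):
    """Average per-vertex responses across subjects.

    responses_by_subject: one list of floats per subject, all the same length
    (one value per cortical vertex). This layout is an assumption for illustration.
    """
    if not responses_by_subject:
        raise ValueError("need at least one subject")
    n_vertices = len(responses_by_subject[0])
    if any(len(r) != n_vertices for r in responses_by_subject):
        raise ValueError("all subjects must have the same number of vertices")
    # zip(*...) groups the values for each vertex across subjects.
    return [fmean(vertex_values) for vertex_values in zip(*responses_by_subject)]
```

The averaged vector has the same length as any single subject's, which is why a model that targets it needs no per-subject calibration at inference time.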
Key highlights
- One-liner inference: TribeModel.from_pretrained("facebook/tribev2") then .predict() on video/audio/text
- Handles the full pipeline: word extraction, chunking, event transforms, and surface projection
- Training code included with Slurm grid-search scripts for cortical and subcortical runs
- Brain visualization via PyVista or Nilearn (optional install)
- Colab demo with actual brain plots; weights hosted on HuggingFace
Caveats
- CC-BY-NC-4.0 license — commercial use is off-limits
- Predictions are for an “average” subject; individual brain variation is explicitly not modeled here
- README doesn’t specify hardware requirements or inference latency
Verdict
Neuroimaging researchers and cognitive scientists who want to synthesize or interpret fMRI data should grab this. If you’re looking for real-time BCI or personalized brain decoding, this isn’t that tool — it’s a simulation layer, not an interface.