Is Transformer-MM-Explainability open source?

Yes — hila-chefer/Transformer-MM-Explainability is open source, released under the MIT license.

What language is Transformer-MM-Explainability written in?

GitHub classifies hila-chefer/Transformer-MM-Explainability as primarily Jupyter Notebook; the notebooks and accompanying scripts are Python code built on PyTorch.

How popular is Transformer-MM-Explainability?

hila-chefer/Transformer-MM-Explainability has 911 stars on GitHub.

Where can I find Transformer-MM-Explainability?

hila-chefer/Transformer-MM-Explainability is on GitHub at https://github.com/hila-chefer/Transformer-MM-Explainability.

hila-chefer/Transformer-MM-Explainability

X-ray vision for multimodal transformers

A single method to visualize attention in any bi-modal or encoder-decoder Transformer, no retraining required.

★911 stars Jupyter Notebook LLMOps · Eval Computer Vision

What it does

This is the official PyTorch implementation of the ICCV 2021 oral paper “Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers”. It produces attention-based explanations for Transformer models that take more than one input type or use an encoder-decoder design: vision-and-language models for VQA, image-text matching with CLIP, and encoder-decoder detectors such as DETR. The repo ships with ready-to-run Colab notebooks for LXMERT, DETR, CLIP, and plain ViT, plus scripts to reproduce the paper’s perturbation experiments.

The interesting bit

The method is generic: the same relevancy-propagation rules apply across architectures, with no retraining or fine-tuning of the model being explained. That is unusual in explainability, where techniques are often tied to one architecture. The authors achieve this by working directly with the attention maps and their gradients, layer by layer, so the method applies wherever standard multi-head attention is used, in both self-attention and cross-attention layers.
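
To make the attention-based idea above concrete, here is a minimal sketch of the self-attention update rule as I understand it from the paper: start from an identity relevancy matrix and, at each layer, add the head-averaged positive part of (gradient × attention) multiplied by the current relevancy. NumPy stands in for PyTorch, random arrays stand in for the attention maps and gradients a real model would supply, and the function names are hypothetical rather than the repository's API. The cross-attention rules used for bi-modal models are not shown.

```python
import numpy as np

def head_averaged_cam(attn, grad):
    """Average over heads of the positive part of (gradient * attention).

    attn, grad: arrays of shape (heads, tokens, tokens).
    """
    return np.clip(grad * attn, 0.0, None).mean(axis=0)

def self_attention_relevancy(attns, grads):
    """Accumulate relevancy across layers with R <- R + A_bar @ R."""
    num_tokens = attns[0].shape[-1]
    relevancy = np.eye(num_tokens)  # each token starts out relevant only to itself
    for attn, grad in zip(attns, grads):
        relevancy = relevancy + head_averaged_cam(attn, grad) @ relevancy
    return relevancy

# Toy inputs (assumption): 2 layers, 4 heads, 5 tokens. Random values replace
# the attention maps and gradients that hooks on a real model would capture.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 5, 5))
attns = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # row-wise softmax
grads = rng.normal(size=(2, 4, 5, 5))

relevancy = self_attention_relevancy(list(attns), list(grads))
cls_row = relevancy[0]  # relevance of every token with respect to token 0
print(cls_row.shape)
```

In a ViT-style setting, the row for the classification token (minus its own entry) would typically be reshaped to the patch grid and upsampled to produce the heatmap; that step is omitted here.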

Key highlights

Ready Colab notebooks for LXMERT, DETR, CLIP, and ViT (GPU required)
Reproduction scripts for VisualBERT, LXMERT, and DETR with exact command-line incantations
Hugging Face Spaces demo for CLIP grounding (built by external contributors)
Perturbation-based evaluation protocol included, not just pretty heatmaps
Works on pretrained models as-is; no fine-tuning for interpretability
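
The perturbation-based evaluation mentioned in the highlights can be illustrated with a toy example. The sketch below is not the repository's evaluation script: it assumes a hypothetical scoring function (`toy_score`), a single input, and zeroing as the way to remove tokens. It shows the general idea of removing tokens in order of relevance, recording the model score, and summarizing the curve by its area. If the relevance ranking is good, removing the most relevant tokens first (positive perturbation) should make the score fall faster than removing the least relevant first (negative perturbation).

```python
import numpy as np

def perturbation_curve(score_fn, tokens, relevance, fractions, positive=True):
    """Score the input after removing a growing fraction of tokens.

    positive=True removes the most relevant tokens first;
    positive=False removes the least relevant first.
    Removed tokens are set to zero (an assumption of this sketch).
    """
    order = np.argsort(relevance)      # indices in ascending relevance
    if positive:
        order = order[::-1]            # most relevant first
    scores = []
    for frac in fractions:
        n_removed = int(round(frac * len(tokens)))
        masked = tokens.copy()
        masked[order[:n_removed]] = 0.0
        scores.append(score_fn(masked))
    return np.array(scores)

def area_under_curve(fractions, scores):
    """Trapezoidal area under the score-vs-fraction-removed curve."""
    widths = np.diff(np.asarray(fractions, dtype=float))
    return float(np.sum((scores[1:] + scores[:-1]) / 2.0 * widths))

# Toy setup (assumption): the "model" sums its inputs, so each token's
# true contribution equals its value and serves as a perfect relevance map.
def toy_score(x):
    return float(x.sum())

tokens = np.array([0.9, 0.1, 0.5, 0.7, 0.3, 0.2, 0.8, 0.4])
relevance = tokens.copy()
fractions = [0.0, 0.25, 0.5, 0.75, 1.0]

pos_curve = perturbation_curve(toy_score, tokens, relevance, fractions, positive=True)
neg_curve = perturbation_curve(toy_score, tokens, relevance, fractions, positive=False)
pos_auc = area_under_curve(fractions, pos_curve)
neg_auc = area_under_curve(fractions, neg_curve)
print(pos_auc < neg_auc)
```

A real evaluation would run this over a dataset and use the model's accuracy or class probability as the score; lower area for positive perturbation and higher area for negative perturbation indicate a more faithful relevance map.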

Caveats

Reproduction setup is involved: you need to manually patch cocoeval.py for DETR and wrangle multiple dataset downloads
The README warns that requirement installation “may take some time” and requires a runtime restart
VisualBERT depends on the somewhat heavy MMF framework

Verdict

Worth a look if you’re building or auditing multimodal systems and need more than “trust us, it works.” Skip if you just want a drop-in .explain() method—this is research code with research-code edges.
