Why your ViT is staring at the wrong part of the cat
A CVPR 2021 method that makes Transformers explain their own classifications with class-specific heatmaps, not just generic attention blobs.

What it does
This repo implements a method for visualizing why a Transformer picked a specific class (say, “tabby cat” versus “tiger”) rather than dumping a generic attention map that could apply to anything. It works for both vision (ViT, DeiT) and NLP (BERT) tasks via a three-phase pipeline: relevance scoring for each attention matrix with a custom LRP (layer-wise relevance propagation) formulation, gradient backprop for the target class to weight and average the attention heads, then layer aggregation through rollout.
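The three phases can be sketched on toy numbers. This is a minimal illustration, not the repo's code: the function names are hypothetical, the relevance and gradient values are made up and passed in directly (the real method gets them from LRP and from backprop on a PyTorch model), and the multiplication order across layers is an assumption.

```python
# Toy sketch of the three-phase pipeline. All names are hypothetical.

def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def weighted_layer_map(relevance, grads):
    """Phases 1-2 for one layer: average max(grad * relevance, 0) over heads.

    relevance, grads: [heads][tokens][tokens]. Here both are plain inputs;
    in the real method they come from LRP and class-specific backprop.
    """
    heads, n = len(relevance), len(relevance[0])
    return [[sum(max(grads[h][i][j] * relevance[h][i][j], 0.0)
                 for h in range(heads)) / heads
             for j in range(n)] for i in range(n)]

def class_specific_rollout(relevances, grads_per_layer):
    """Phase 3: chain (I + layer_map) across layers, first layer first."""
    n = len(relevances[0][0])
    joint = identity(n)
    for rel, grad in zip(relevances, grads_per_layer):
        layer = weighted_layer_map(rel, grad)
        eye = identity(n)
        layer = [[layer[i][j] + eye[i][j] for j in range(n)] for i in range(n)]
        joint = matmul(layer, joint)  # assumption: later layers multiply on the left
    return joint

# One layer, two heads, two tokens (token 0 plays the role of [CLS]).
toy_relevance = [[[[0.5, 0.5], [0.2, 0.8]],
                  [[0.1, 0.9], [0.6, 0.4]]]]
toy_grads = [[[[1.0, -1.0], [0.0, 1.0]],
              [[1.0, 1.0], [-1.0, 0.0]]]]

# The [CLS] row is what would be reshaped into a heatmap.
cls_row = class_specific_rollout(toy_relevance, toy_grads)[0]
```

Swapping in the gradients for a different class changes `cls_row` even though the relevance values stay the same, which is what makes the resulting map class-specific.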
The interesting bit
The per-class angle is the hook. Standard attention rollout shows you where the model looked; this shows you where it looked for that specific decision. The authors also note they’ve since moved on to a faster, LRP-free successor, but this remains the reference implementation for the original CVPR paper.
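For contrast, here is a minimal sketch of plain attention rollout, assuming the usual recipe of averaging heads, mixing in the identity for the residual connection, and multiplying across layers. The function name and toy values are hypothetical. Note that nothing in the signature refers to a class, so every label gets the same map.

```python
def attention_rollout(attentions):
    """Class-agnostic rollout.

    attentions: [layers][heads][tokens][tokens], each row summing to 1.
    """
    n = len(attentions[0][0])
    joint = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for layer in attentions:
        heads = len(layer)
        # Average the heads, then blend 50/50 with the identity (residual path).
        mixed = [[0.5 * sum(layer[h][i][j] for h in range(heads)) / heads
                  + (0.5 if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        # Accumulate: this layer's mixing applied on top of earlier layers.
        joint = [[sum(mixed[i][k] * joint[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return joint

# Two identical single-head layers over two tokens.
toy_attention = [[[[0.5, 0.5], [0.2, 0.8]]],
                 [[[0.5, 0.5], [0.2, 0.8]]]]
rollout = attention_rollout(toy_attention)
```

Because the inputs are only the attention matrices, the output cannot distinguish evidence for “tabby cat” from evidence for “tiger”; the class-specific method adds gradients for the chosen class to break that tie.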
Key highlights
- Ready-to-run Colab notebooks for ViT and BERT sentiment analysis
- Supports class-indexed visualization: pass class_index to see evidence for a specific label, or omit it for the top prediction
- Includes segmentation and perturbation evaluation scripts for ImageNet
- BERT pipeline outputs .tex files with extracted rationales per example
- DeiT example notebook added for efficient ViT variants
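The class_index behaviour from the highlights (explain a given label, or fall back to the top prediction) can be sketched as below. The helper name `target_one_hot` is hypothetical and the logits are made up; the assumption is that the resulting one-hot vector is what selects the class to backpropagate from.

```python
def target_one_hot(logits, class_index=None):
    """Pick the class to explain and return (index, one-hot vector).

    If class_index is None, fall back to the model's top prediction.
    """
    if class_index is None:
        class_index = max(range(len(logits)), key=lambda i: logits[i])
    one_hot = [0.0] * len(logits)
    one_hot[class_index] = 1.0
    return class_index, one_hot

toy_logits = [0.1, 2.3, 1.7]  # made-up scores for three classes

top_index, top_vec = target_one_hot(toy_logits)           # top prediction
alt_index, alt_vec = target_one_hot(toy_logits, class_index=2)  # chosen label
```

Asking for a non-top class is the interesting case: it shows what evidence the model found for a label it ultimately did not choose.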
Caveats
- BERT reproduction requires manual downloads from Google Drive (weights, dataset, pickle file) — no automated setup
- The README pushes the newer “Transformer-MM-Explainability” repo heavily; this version is effectively the legacy branch
- Code structure is research-grade: scattered scripts, hardcoded paths, and PYTHONPATH=./ gymnastics
Verdict
Worth a spin if you need interpretability baselines for ViT or BERT and can tolerate 2021-era research code. Skip it if you want plug-and-play; the authors themselves point to their newer work for that.