X-ray vision for your Hugging Face models
A thin wrapper around Captum that makes transformer interpretability actually usable.

What it does
transformers-interpret wraps Meta’s Captum library to expose which tokens (or image regions) drive a Hugging Face model’s predictions. You instantiate an explainer with a model and tokenizer, call it on some input, and get back per-token attribution scores plus optional HTML visualizations. It handles sequence classification, pairwise tasks, multilabel, question answering, NER, zero-shot classification, and computer vision models.
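To make the instantiate-and-call flow concrete, here is a minimal sketch for sequence classification. It assumes the `SequenceClassificationExplainer` class and `predicted_class_name` attribute as shown in the project's README. The checkpoint name and both helper functions (`explain`, `top_tokens`) are illustrative choices, not part of the library. The special-token filter assumes BERT-style bracketed tokens such as `[CLS]`.

```python
from typing import List, Tuple

# (token, score) pairs: the shape the explainer is assumed to return
Attribution = Tuple[str, float]


def explain(
    text: str,
    model_name: str = "distilbert-base-uncased-finetuned-sst-2-english",  # assumed checkpoint
) -> Tuple[str, List[Attribution]]:
    """Return (predicted label, per-token attributions) for `text`."""
    # Heavy dependencies are imported lazily so top_tokens() works without them.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from transformers_interpret import SequenceClassificationExplainer

    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # The "two lines": build the explainer, then call it on the input.
    explainer = SequenceClassificationExplainer(model, tokenizer)
    attributions = explainer(text)
    return explainer.predicted_class_name, attributions


def top_tokens(attributions: List[Attribution], k: int = 3) -> List[Attribution]:
    """Rank tokens by absolute attribution, skipping bracketed special tokens."""
    words = [
        (tok, score)
        for tok, score in attributions
        if not (tok.startswith("[") and tok.endswith("]"))
    ]
    return sorted(words, key=lambda pair: abs(pair[1]), reverse=True)[:k]
```

Calling `explain("I love you, I like you")` downloads the checkpoint on first use. `top_tokens` is a hypothetical post-processing step for reading the result without the HTML view.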
The interesting bit
The library’s real trick is letting you inspect non-predicted classes. Feed it a mixed-sentiment sentence, ask for the “NEGATIVE” attributions, and you can see which words would push the model toward that class even when the final prediction stays “POSITIVE.” For cross-encoders, a flip_sign flag inverts attributions to explain why two inputs scored as dissimilar rather than similar.
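A rough sketch of that workflow follows. It assumes the explainer accepts a `class_name` keyword, as the README's examples suggest. `explain_for_class`, `supporting_tokens`, and `flip` are hypothetical helpers written for this illustration. In particular, `flip` only mimics the idea of sign inversion on a list of scores; it is not the library's `flip_sign` implementation.

```python
from typing import List, Tuple

Attribution = Tuple[str, float]  # (token, score)


def explain_for_class(
    text: str,
    class_name: str,
    model_name: str = "distilbert-base-uncased-finetuned-sst-2-english",  # assumed checkpoint
) -> List[Attribution]:
    """Attributions toward `class_name`, whether or not it is the predicted class."""
    # Lazy imports keep the pure helpers below usable without the heavy dependencies.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from transformers_interpret import SequenceClassificationExplainer

    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    explainer = SequenceClassificationExplainer(model, tokenizer)
    return explainer(text, class_name=class_name)


def supporting_tokens(attributions: List[Attribution]) -> List[str]:
    """Tokens with a positive score, i.e. those pushing toward the explained class."""
    return [tok for tok, score in attributions if score > 0]


def flip(attributions: List[Attribution]) -> List[Attribution]:
    """Illustration of sign inversion: negate every score so that tokens
    arguing against the explained outcome become the positive ones."""
    return [(tok, -score) for tok, score in attributions]
```

With made-up scores such as `[("great", -0.6), ("but", 0.1), ("boring", 0.8)]` for the "NEGATIVE" class, `supporting_tokens` returns `["but", "boring"]`, and applying `flip` first returns `["great"]` instead.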
Key highlights
- Two-line API: instantiate explainer, call it on text or images
- Built-in visualize() emits inline notebook HTML or savable files
- Supports pairwise tasks (NLI, cross-encoders) with dual-input attribution
- Vision explainers included (heatmaps, overlays, masked views)
- Ships with a Streamlit demo app
Caveats
- README is thorough for text tasks but the vision section is barely sketched; you’ll need to dig into source or notebooks for CV usage
- The “2 lines” claim is technically true for basic cases, but complex tasks (pairwise, flipped signs, non-predicted classes) need more setup
Verdict
Worth a look if you’re already in the Hugging Face ecosystem and need quick, legible explanations without learning Captum’s internals. Skip it if you need model-agnostic interpretability or heavy customization — this is tightly coupled to transformers.