epfml/attention-cnn

Attention is convolution in a trenchcoat

This ICLR 2020 paper proves that a multi-head self-attention layer can express any convolutional layer, and shows that trained models often learn to do exactly that.

Velocity (7d): +0.4 ★/day · Trend: steady

What it does

This repository holds the code for a paper that asks a blunt question: when vision transformers use self-attention, are they secretly just doing convolutions? The authors prove mathematically that a multi-head self-attention layer with enough heads can represent any convolutional layer. Then they run experiments to check whether trained attention layers actually converge toward convolution-like behavior. Spoiler: they often do.
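To make the claim concrete, here is a minimal NumPy sketch of the idea. It is not code from the repository: it rebuilds a K×K convolution from K² attention heads. It assumes that each head attends with hard, one-hot weights to a single fixed pixel offset and ignores pixel content. Each head's value projection is one kernel tap, and the output projection simply sums the heads. Zero "same" padding and the absence of a bias are further assumptions, and the function names are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Reference K x K convolution (cross-correlation, as in deep-learning
    frameworks) with zero 'same' padding. x: (H, W, C_in), w: (K, K, C_in, C_out)."""
    k = w.shape[0]
    r = k // 2
    h, wd, _ = x.shape
    xp = np.pad(x, ((r, r), (r, r), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]))
    for a in range(k):
        for b in range(k):
            # shifted window of the padded input times kernel tap (a, b)
            out += xp[a:a + h, b:b + wd, :] @ w[a, b]
    return out


def shift_attention(h, wd, r, da, db):
    """Hypothetical hard attention matrix for one head: every query pixel (i, j)
    puts all of its probability on key (i + da, j + db) of the zero-padded grid."""
    wp = wd + 2 * r
    attn = np.zeros((h * wd, (h + 2 * r) * wp))
    for i in range(h):
        for j in range(wd):
            # query (i, j) sits at (i + r, j + r) in the padded grid
            key = (i + r + da) * wp + (j + r + db)
            attn[i * wd + j, key] = 1.0
    return attn


def mhsa_as_conv(x, w):
    """K*K one-hot attention heads, one per kernel offset, summed by W_out."""
    k = w.shape[0]
    r = k // 2
    h, wd, c_in = x.shape
    c_out = w.shape[3]
    padded = np.pad(x, ((r, r), (r, r), (0, 0))).reshape(-1, c_in)
    heads = []
    for a in range(k):
        for b in range(k):
            attn = shift_attention(h, wd, r, a - r, b - r)
            w_val = w[a, b]  # this head's value projection is one kernel tap
            heads.append(attn @ padded @ w_val)  # (H*W, C_out)
    concat = np.concatenate(heads, axis=1)  # (H*W, K*K*C_out)
    w_out = np.tile(np.eye(c_out), (k * k, 1))  # stacked identities: sums the heads
    return (concat @ w_out).reshape(h, wd, c_out)


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 6, 4))
w = rng.standard_normal((3, 3, 4, 2))
print(np.allclose(conv2d_same(x, w), mhsa_as_conv(x, w)))  # True
```

One-hot attention is an idealization, because a softmax can only approach it. The sketch therefore shows the target of the construction rather than how a trained layer gets there.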

The interesting bit

The proof is constructive, not just an existence argument. The authors also built an interactive website where you can poke at attention patterns directly, which is rarer than it should be for a math-heavy paper.
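The constructive step can be sketched as well. As we read the paper, a head can drop the content terms and score keys only through a quadratic relative positional encoding: r_d = (|d|^2, d1, d2) for a key at offset d from the query, and a head vector v = -alpha * (1, -2*D1, -2*D2) for a chosen shift D. The logit is then -alpha * (|d - D|^2 - |D|^2). It is largest at d = D, and it dominates more and more as alpha grows. The NumPy sketch below, with hypothetical function names, checks that the softmax collapses onto the key at query + D.

```python
import numpy as np


def quadratic_scores(h, w, query, delta, alpha):
    """Attention logits of one head over an h x w grid of keys, from a relative
    positional term only: v . r_d with r_d = (|d|^2, d1, d2), d = key - query,
    and v = -alpha * (1, -2*delta1, -2*delta2).
    Equivalently: -alpha * (|d - delta|^2 - |delta|^2)."""
    v = -alpha * np.array([1.0, -2.0 * delta[0], -2.0 * delta[1]])
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    d1 = ii - query[0]
    d2 = jj - query[1]
    r = np.stack([d1**2 + d2**2, d1, d2], axis=-1)  # (h, w, 3)
    return r @ v  # (h, w)


def softmax2d(logits):
    """Softmax over all grid positions (numerically stabilized)."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


h, w = 7, 7
query, delta = (3, 3), (1, -1)
target = (query[0] + delta[0], query[1] + delta[1])
for alpha in (0.1, 1.0, 10.0):
    p = softmax2d(quadratic_scores(h, w, query, delta, alpha))
    # probability on the key at query + delta; it grows toward 1 with alpha
    print(alpha, p[target])
```

Because every other key has |d - D|^2 >= 1, its weight relative to the target shrinks like exp(-alpha) or faster. A large alpha therefore turns this head into the one-hot "look at offset D" head used in a convolution.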

Key highlights

  • Formal proof that multi-head self-attention subsumes convolutional layers (given K² heads for a K×K kernel and a relative positional encoding)
  • Empirical evidence that trained attention heads learn localized, convolution-like patterns in practice
  • Reproducible experiments via shell scripts in runs/
  • Interactive visualization at epfml.github.io/attention-cnn
  • ICLR 2020; 1,121 stars

Caveats

  • Setup instructions specify CUDA 10.0 and Anaconda, so modern environments may need massaging
  • The repo is research code: expect paper reproduction scripts, not a maintained library

Verdict

Worth a look if you’re trying to understand why vision transformers work—or if you need ammunition for arguments about inductive biases. Skip if you want production-ready attention primitives.
