raoyongming/GFNet

Vision transformers, but make it signal processing

GFNet replaces self-attention with a learned frequency-domain filter applied via the FFT, cutting complexity from quadratic to log-linear in the number of tokens while keeping a global receptive field.

511 stars · Jupyter Notebook · Computer Vision · Image · Video · Audio

What it does

GFNet is an image classification architecture that swaps the self-attention layer in vision transformers for frequency-domain operations. It runs a 2D FFT on the spatial features, multiplies the result element-wise by learnable complex-valued “global filters,” then applies an inverse FFT to return to the spatial domain. The whole thing is ~20 lines of PyTorch and runs in O(n log n) in the number of tokens n, instead of the O(n²) of self-attention.
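
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the class name GlobalFilterSketch, the 0.02 initialization scale, the "ortho" normalization, and the square token grid are all assumptions made for the example. Only torch.fft.rfft2 / irfft2 and the (h=14, w=8) filter shape come from the summary.

```python
import torch
import torch.nn as nn


class GlobalFilterSketch(nn.Module):
    """Illustrative layer: 2D FFT -> learned complex filter -> inverse 2D FFT."""

    def __init__(self, dim, h=14, w=8):
        super().__init__()
        # Learnable filter stored as (real, imag) pairs in the last axis.
        # The 0.02 init scale is an assumption for this sketch.
        self.complex_weight = nn.Parameter(torch.randn(h, w, dim, 2) * 0.02)

    def forward(self, x):
        # x: (batch, tokens, channels); tokens are assumed to form a square grid.
        batch, tokens, channels = x.shape
        side = int(tokens ** 0.5)
        x = x.reshape(batch, side, side, channels)

        # Spatial -> frequency domain over the two grid axes.
        x = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")

        # One element-wise multiply per frequency bin and channel.
        x = x * torch.view_as_complex(self.complex_weight)

        # Frequency -> spatial domain, restoring the original grid size.
        x = torch.fft.irfft2(x, s=(side, side), dim=(1, 2), norm="ortho")
        return x.reshape(batch, tokens, channels)


layer = GlobalFilterSketch(dim=64)
x = torch.randn(2, 196, 64)   # 196 tokens = 14 x 14 grid
y = layer(x)
print(tuple(y.shape))         # (2, 196, 64)
```

The output has the same shape as the input, so a layer like this can sit where a token-mixing block would otherwise go.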

The interesting bit

The trick is that a single element-wise multiplication in frequency space acts as a global circular convolution in pixel space: every output location sees every input location, but through the FFT’s bookkeeping rather than an explicit pairwise attention matrix. The authors visualize the learned filters, which look like structured frequency responses rather than random noise.
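
A small numerical check makes the equivalence concrete. The sketch below works in 1D for brevity and uses a naive O(n²) DFT written with the standard library instead of a real FFT (an FFT computes the same transform, only faster). The signal and kernel values are made up for the example.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse divides by n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def circular_convolve(x, kernel):
    """Direct circular convolution: every output sums over every input."""
    n = len(x)
    return [sum(x[j] * kernel[(i - j) % n] for j in range(n)) for i in range(n)]


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.25]

# Route 1: explicit pairwise sum in the spatial domain.
direct = circular_convolve(signal, kernel)

# Route 2: transform both, multiply element-wise, transform back.
spectrum = [a * b for a, b in zip(dft(signal), dft(kernel))]
via_frequency = [value.real for value in dft(spectrum, inverse=True)]

max_error = max(abs(d - f) for d, f in zip(direct, via_frequency))
print(max_error < 1e-9)  # True
```

Both routes give the same result up to floating-point error. In 2D the same identity holds along each axis.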

Key highlights

  • Pretrained ImageNet models from 7M to 54M parameters, top-1 accuracy 74.6%–82.9%
  • Core GlobalFilter layer is 8 lines of PyTorch using torch.fft.rfft2 / irfft2
  • Requires PyTorch ≥1.8.0 for the FFT API; code builds on timm and DeiT
  • Supports fine-tuning at higher resolution (384×384 shown); transfer learning scripts are included
  • MIT licensed

Caveats

  • The learned filter, not the FFT itself, pins the input resolution: the filter dimensions (h=14, w=8) are hardcoded to the feature map size at that layer, and w is smaller than h because rfft2 keeps only the non-redundant half of the spectrum
  • No single-GPU training recipe: the training scripts show only an 8-GPU distributed launch
  • The “Jupyter Notebook” repo language label is misleading: the code is PyTorch, with some notebooks for visualization
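
To see where a shape like (h=14, w=8) comes from, here is a short sketch of the arithmetic. The 224×224 input size and 16×16 patch size are assumptions chosen because they produce a 14×14 token grid; neither is stated above. The helper names are hypothetical.

```python
def feature_grid(image_size, patch_size=16):
    """Side length of the token grid; patch_size=16 is an assumption."""
    return image_size // patch_size


def rfft2_filter_shape(height, width):
    """Shape of the half-spectrum a real 2D FFT keeps for a height x width grid."""
    return height, width // 2 + 1


for image_size in (224, 384):
    side = feature_grid(image_size)
    print(image_size, (side, side), rfft2_filter_shape(side, side))
# 224 (14, 14) (14, 8)
# 384 (24, 24) (24, 13)
```

Under these assumptions, a filter learned at 224×224 does not match the spectrum of a 384×384 input, so changing resolution means changing the filter shape as well.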

Verdict

Worth a look if you’re building vision models where quadratic attention is a bottleneck, especially at higher resolutions. Skip if you need flexible input sizes or want a drop-in replacement without thinking about frequency-domain shapes.
