lucidrains/bottleneck-transformer-pytorch
A PyTorch implementation of the Bottleneck Transformer architecture for visual recognition tasks, combining convolutional layers with self-attention mechanisms.

This repository provides a PyTorch implementation of the Bottleneck Transformer (BoTNet), a hybrid convolution/attention architecture for vision that replaces the 3×3 spatial convolutions in ResNet bottleneck blocks with multi-head self-attention. The model achieves a better trade-off between accuracy and compute than EfficientNet and DeiT on image classification. The repository includes a BottleStack layer that can be combined with a ResNet backbone to build BoTNet models. The layer has configurable dimensions, a configurable number of attention heads, and optional relative positional embeddings.
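The core idea is a ResNet bottleneck block whose middle 3×3 convolution is swapped for global multi-head self-attention over every spatial position of the feature map. The sketch below illustrates that idea. It is not the repository's BottleStack API, and the names `MHSA2d` and `BoTBlock` are hypothetical. For brevity it also makes two simplifying assumptions. First, it uses learned absolute position embeddings, factorized into height and width terms, instead of relative ones. Second, it uses stride 1 with no downsampling, so the block preserves the input shape.

```python
import torch
from torch import nn


class MHSA2d(nn.Module):
    """Hypothetical multi-head self-attention over all H*W positions of a feature map.

    Simplification: learned absolute (height + width) position embeddings stand in
    for the relative position embeddings used by BoTNet.
    """

    def __init__(self, dim, fmap_size, heads=4, dim_head=32):
        super().__init__()
        self.heads, self.dim_head = heads, dim_head
        self.scale = dim_head ** -0.5
        inner = heads * dim_head
        # A 1x1 conv produces queries, keys and values for every position at once.
        self.to_qkv = nn.Conv2d(dim, inner * 3, kernel_size=1, bias=False)
        h, w = fmap_size
        self.pos_h = nn.Parameter(torch.randn(h, 1, dim_head) * self.scale)
        self.pos_w = nn.Parameter(torch.randn(1, w, dim_head) * self.scale)

    def forward(self, x):
        b, _, h, w = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=1)  # each (b, heads*dim_head, h, w)
        # Flatten space and split heads: (b, heads, h*w, dim_head).
        q, k, v = (t.reshape(b, self.heads, self.dim_head, h * w).transpose(-1, -2)
                   for t in (q, k, v))
        q = q * self.scale
        content = q @ k.transpose(-1, -2)  # content-content logits, (b, heads, hw, hw)
        pos = (self.pos_h + self.pos_w).reshape(h * w, self.dim_head)
        position = q @ pos.t()  # content-position logits, (b, heads, hw, hw)
        attn = (content + position).softmax(dim=-1)
        out = attn @ v  # (b, heads, hw, dim_head)
        return out.transpose(-1, -2).reshape(b, self.heads * self.dim_head, h, w)


class BoTBlock(nn.Module):
    """Hypothetical bottleneck block: 1x1 reduce -> MHSA (instead of 3x3 conv) -> 1x1 expand."""

    def __init__(self, dim, fmap_size, proj_factor=4, heads=4, dim_head=32):
        super().__init__()
        mid = dim // proj_factor
        inner = heads * dim_head
        self.net = nn.Sequential(
            nn.Conv2d(dim, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(),
            MHSA2d(mid, fmap_size, heads=heads, dim_head=dim_head),
            nn.BatchNorm2d(inner),
            nn.ReLU(),
            nn.Conv2d(inner, dim, 1, bias=False),
            nn.BatchNorm2d(dim),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual connection as in a standard ResNet bottleneck block.
        return self.act(self.net(x) + x)


block = BoTBlock(dim=256, fmap_size=(8, 8), heads=4, dim_head=32)
x = torch.randn(2, 256, 8, 8)
y = block(x)
print(y.shape)  # torch.Size([2, 256, 8, 8])
```

Because attention cost grows quadratically with the number of positions (H·W), this kind of block is practical mainly on the small, low-resolution feature maps at the end of a ResNet backbone. The positional term `q @ pos.t()` is what lets the otherwise permutation-invariant attention take spatial layout into account.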