JDAI-CV/CoTNet
CoTNet is a contextual transformer network that replaces standard convolutions with self-attention building blocks for visual recognition tasks.

CoTNet is built on the Contextual Transformer (CoT) block, a unified self-attention building block that can replace standard convolutions in ConvNets. The repository provides official PyTorch implementations of vision backbones that use this contextualized self-attention for image classification, object detection, instance segmentation, and semantic segmentation. On ImageNet and MS COCO, these backbones deliver competitive accuracy with favorable inference time-accuracy trade-offs.
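To make the idea of a convolution-replacing contextual self-attention block more concrete, here is a minimal NumPy sketch. It assumes one unbatched feature map of shape (C, H, W) and a 3 × 3 window, and follows the CoT block as the CoTNet paper describes it: a static context K1 from a k × k convolution, values V from a 1 × 1 convolution, local attention weights computed from the concatenation [K1, Q] with Q = X, and a dynamic context K2 aggregated from neighbouring values. It is not the repository's PyTorch code. The names unfold, cot_block, key, value, theta and delta are hypothetical, and several parts are simplified: the convolution is ungrouped, a single attention weight per neighbour is shared across all channels, the window weights are normalized with a softmax (a choice made for this sketch), normalization layers are omitted, and K1 and K2 are fused by a plain sum rather than the attention-based fusion used in the paper.

```python
import numpy as np

def unfold(x, k):
    """Gather each pixel's k x k neighbourhood with zero padding.
    x: (C, H, W) -> (C, k*k, H, W), neighbour index = row * k + col."""
    c, h, w = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    return np.stack(
        [xp[:, i:i + h, j:j + w] for i in range(k) for j in range(k)], axis=1
    )

def softmax(a, axis):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cot_block(x, params, k=3):
    """Simplified, hypothetical CoT-style block; returns an array shaped like x.
    params: key (C, C, k, k), value (C, C), theta (M, 2C), delta (k*k, M)."""
    c = x.shape[0]
    # 1) Static context K1: k x k convolution (cross-correlation) over x.
    k1 = np.einsum("ocj,cjhw->ohw", params["key"].reshape(c, c, k * k), unfold(x, k))
    # 2) Values V: 1 x 1 convolution, i.e. a per-pixel linear map.
    v = np.einsum("oc,chw->ohw", params["value"], x)
    # 3) Attention from [K1, Q] with Q = x: two 1 x 1 convs with a ReLU between,
    #    giving one weight per neighbour per pixel (softmax over the window).
    qk = np.concatenate([k1, x], axis=0)                          # (2C, H, W)
    hidden = np.maximum(np.einsum("mc,chw->mhw", params["theta"], qk), 0.0)
    attn = softmax(np.einsum("jm,mhw->jhw", params["delta"], hidden), axis=0)
    # 4) Dynamic context K2: attention-weighted sum of neighbouring values.
    k2 = np.einsum("jhw,cjhw->chw", attn, unfold(v, k))
    # 5) Fuse static and dynamic context (plain sum: a simplification).
    return k1 + k2

# Usage with small random weights (all shapes are illustrative assumptions).
rng = np.random.default_rng(0)
C, H, W, k, M = 8, 5, 6, 3, 4
params = {
    "key": rng.standard_normal((C, C, k, k)) * 0.1,
    "value": rng.standard_normal((C, C)) * 0.1,
    "theta": rng.standard_normal((M, 2 * C)) * 0.1,
    "delta": rng.standard_normal((k * k, M)) * 0.1,
}
x = rng.standard_normal((C, H, W))
y = cot_block(x, params, k)
print(y.shape)  # (8, 5, 6)
```

Because the output keeps the input's (C, H, W) shape, a block like this can be dropped in where a 3 × 3 convolution would sit. As a sanity check, zeroing the value projection makes K2 vanish, so the sketch reduces to its static k × k convolution.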