SHI-Labs/Neighborhood-Attention-Transformer
A transformer architecture for computer vision that uses localized attention mechanisms, published at CVPR 2023.

Neighborhood Attention Transformer (NAT) and its dilated variant (DiNAT) are vision transformer architectures that replace global self-attention with neighborhood attention, in which each token attends only to its nearest neighbors, for improved efficiency. The models achieve state-of-the-art performance on instance, semantic, and panoptic segmentation benchmarks across the ADE20K, Cityscapes, and COCO datasets. The implementation includes PyTorch models and a custom CUDA extension (NATTEN) for accelerated neighborhood attention computation.
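To make the idea of localized attention concrete, here is a minimal pure-Python sketch of single-head neighborhood attention over a 1D sequence. It is not the NATTEN implementation or its API: the function names `neighborhood_window` and `neighborhood_attention_1d` are hypothetical, and the sketch assumes no dilation, no relative positional bias, and a window that is shifted inward at the borders so every token still attends to exactly `kernel_size` neighbors.

```python
import math


def neighborhood_window(i, length, kernel_size):
    """Indices of the kernel_size positions nearest to i (hypothetical helper).

    Assumption: near the borders the window is shifted inward rather than
    padded, so every token sees exactly kernel_size neighbors.
    """
    start = min(max(i - kernel_size // 2, 0), length - kernel_size)
    return list(range(start, start + kernel_size))


def neighborhood_attention_1d(q, k, v, kernel_size):
    """Single-head neighborhood attention over a 1D sequence.

    q, k, v are lists of per-token vectors (lists of floats) of equal length.
    Each query attends only to the keys inside its neighborhood window.
    """
    n, dim = len(q), len(q[0])
    if not 1 <= kernel_size <= n:
        raise ValueError("kernel_size must be between 1 and the sequence length")
    scale = 1.0 / math.sqrt(dim)
    out = []
    for i in range(n):
        idx = neighborhood_window(i, n, kernel_size)
        # Scaled dot-product scores against the local keys only.
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in idx]
        # Numerically stable softmax over the neighborhood.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted sum of the local values.
        out.append([
            sum(w / total * v[j][d] for w, j in zip(weights, idx))
            for d in range(len(v[0]))
        ])
    return out
```

When `kernel_size` equals the sequence length, every window covers the whole sequence and the sketch reduces to ordinary global self-attention. For smaller windows, the cost per token depends on `kernel_size` rather than on the sequence length, which is where the efficiency gain comes from.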