NVlabs/GCVit
Global Context Vision Transformer (GC ViT) is a PyTorch vision transformer model for image classification, object detection, and semantic segmentation.

Velocity (7d): +0.3 ★/day · Trend: steady
GC ViT adds a global context attention mechanism to vision transformers so that long-range dependencies across an image can be captured efficiently. The model reports competitive results on the ImageNet classification, COCO object detection, and ADE20K semantic segmentation benchmarks. The repository is the official NVIDIA implementation and provides pretrained checkpoints and training code.
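To make the idea of global context attention more concrete, here is a minimal PyTorch sketch of one possible form of it: every local window computes its own keys and values, but all windows share a single set of query tokens summarizing the whole image. This is an illustration under assumptions, not the repository's code. The class name `GlobalQueryWindowAttention`, the tensor layout, and the use of a mean over windows as the global query are all hypothetical stand-ins; the actual model may generate and apply its global tokens differently.

```python
import torch
import torch.nn as nn


class GlobalQueryWindowAttention(nn.Module):
    """Illustrative sketch: local windows attend with a query shared by all windows."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.kv = nn.Linear(dim, dim * 2)  # keys and values come from each local window
        self.proj = nn.Linear(dim, dim)

    def forward(self, windows: torch.Tensor, global_q: torch.Tensor) -> torch.Tensor:
        # windows:  (B, W, N, C) = batch, windows per image, tokens per window, channels
        # global_q: (B, N, C)    = one query set per image, reused by every window
        B, W, N, C = windows.shape
        kv = self.kv(windows).reshape(B, W, N, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(3, 0, 1, 4, 2, 5)  # each: (B, W, heads, N, head_dim)

        q = global_q.reshape(B, 1, N, self.num_heads, self.head_dim)
        q = q.permute(0, 1, 3, 2, 4)  # (B, 1, heads, N, head_dim)

        # The singleton window axis of q broadcasts against all W windows of k.
        attn = (q * self.scale) @ k.transpose(-2, -1)  # (B, W, heads, N, N)
        attn = attn.softmax(dim=-1)

        out = (attn @ v).transpose(2, 3).reshape(B, W, N, C)
        return self.proj(out)


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = GlobalQueryWindowAttention(dim=32, num_heads=4)
    feats = torch.randn(2, 9, 16, 32)  # 2 images, 9 windows, 16 tokens, 32 channels
    # Assumption: averaging over windows stands in for a learned global query generator.
    shared_q = feats.mean(dim=1)
    print(layer(feats, shared_q).shape)
```

Because the query is computed once per image and broadcast to every window, each window's output depends on image-wide information while the attention matrix stays at window size, which is one way to read the "efficient long-range dependencies" claim above.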