IBM/CrossViT

CrossViT is a vision transformer model that uses cross-attention across multiple scales for image classification.

419 stars · Python · Computer Vision · ML Frameworks
Star velocity (7 days): +0.2 ★/day · Trend: steady

This repository provides the official PyTorch implementation of CrossViT, a vision transformer architecture that combines multi-scale features through cross-attention mechanisms for improved image classification on ImageNet. The implementation includes training scripts, pretrained model weights, and supports distributed multi-GPU training.
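The cross-attention fusion step can be illustrated with a small plain-Python sketch. It is not code from the repository. It follows the general idea described for CrossViT: the CLS token of one branch acts as the only query, and the keys and values are that CLS token concatenated with the other branch's patch tokens. Because there is a single query, the attention cost grows linearly with the number of patch tokens. The sketch makes several simplifying assumptions. It uses one attention head. Both branches share the same embedding dimension. The learned query/key/value and branch-to-branch projections are replaced by identity maps. The function name `cls_cross_attention` is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cls_cross_attention(cls_token: Vector, other_patches: List[Vector]) -> Vector:
    """Hypothetical single-head CLS-to-patch cross-attention (illustrative only).

    Assumptions: identity Q/K/V projections, one head, and equal embedding
    dimensions in both branches (the real model learns these projections).
    """
    d = len(cls_token)
    # Keys/values: this branch's CLS token followed by the other branch's patches.
    kv = [cls_token] + other_patches
    scale = 1.0 / math.sqrt(d)
    # A single query (the CLS token), so only len(kv) scores are computed.
    weights = softmax([dot(cls_token, k) * scale for k in kv])
    # Attention output: weighted sum of the value vectors.
    fused = [sum(w * v[i] for w, v in zip(weights, kv)) for i in range(d)]
    # Residual connection, as in a standard transformer block.
    return [c + f for c, f in zip(cls_token, fused)]


# Example: a large-patch branch's CLS token attends to two small-patch tokens.
large_cls = [1.0, 0.0]
small_patches = [[0.5, 0.5], [0.0, 1.0]]
fused_cls = cls_cross_attention(large_cls, small_patches)
```

In the actual architecture, the fused CLS token is passed back to its own branch. This is how information flows between the small-patch and large-patch branches, without computing full patch-to-patch attention across scales.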
