
microsoft/CvT

A deep learning model combining convolutional layers with Vision Transformer architecture for high-performance image classification.

606 stars · Python · Computer Vision
Velocity (7d): +0.3 ★/day · Trend: steady

Convolutional Vision Transformers (CvT) combine convolutional neural networks with the Vision Transformer architecture in two ways: a hierarchy of Transformer stages, each beginning with a convolutional token embedding, and Transformer blocks whose query, key, and value projections are convolutional rather than linear. The authors report ImageNet classification accuracy that was state of the art at the time of publication, with fewer parameters than comparable Vision Transformers and ResNets. Positional encodings can also be omitted, which simplifies applying the model to higher-resolution inputs.
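To make the "hierarchy of transformers with convolutional token embeddings" concrete, the sketch below computes how many tokens each stage would see when every stage starts with a strided convolution. It is only an illustration, not code from the repository: the function names (`conv_out_size`, `token_hierarchy`) and the per-stage kernel, stride, padding, and embedding-width values in `STAGES` are assumptions chosen for the example.

```python
from typing import List, Tuple


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one axis (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical per-stage settings: (kernel, stride, padding, embed_dim).
# These values are assumptions for illustration, not read from the repository.
STAGES: List[Tuple[int, int, int, int]] = [
    (7, 4, 2, 64),
    (3, 2, 1, 192),
    (3, 2, 1, 384),
]


def token_hierarchy(
    image_size: int, stages: List[Tuple[int, int, int, int]] = STAGES
) -> List[Tuple[int, int, int]]:
    """Return (grid_side, num_tokens, embed_dim) for each stage of a square input."""
    result = []
    side = image_size
    for kernel, stride, padding, dim in stages:
        # Each stage's convolutional token embedding downsamples the token grid.
        side = conv_out_size(side, kernel, stride, padding)
        result.append((side, side * side, dim))
    return result


if __name__ == "__main__":
    for i, (side, tokens, dim) in enumerate(token_hierarchy(224), start=1):
        print(f"stage {i}: {side}x{side} grid, {tokens} tokens, dim {dim}")
```

Under these assumed settings the token count shrinks from stage to stage while the embedding width grows, much like the feature pyramid of a CNN. Because the token grid is produced by a convolution, a larger input simply yields a larger grid, which is consistent with the description's point that positional encodings can be left out for higher-resolution inputs.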
