
jeonsworld/ViT-pytorch

A PyTorch reimplementation of the Vision Transformer model for image classification tasks.

2.2k stars · Jupyter Notebook · Computer Vision · ML Frameworks
Velocity (7d): +1.1 ★ / day · Trend: steady

This repository provides a PyTorch reimplementation of the Vision Transformer (ViT) architecture from the paper ‘An Image is Worth 16x16 Words’. The model splits each image into fixed-size patches and applies a transformer encoder directly to the resulting patch sequence for image recognition at scale. The repository supports loading Google’s official pretrained checkpoints and training on datasets such as CIFAR-10 and ImageNet, and it implements both pure ViT and hybrid ResNet+ViT variants across multiple model sizes from B-16 to H-14.
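To make the patch step concrete, here is a minimal sketch in plain Python of how an image becomes a sequence of flattened patches. It is an illustration only and not the repository's code: the helper names `num_patches` and `split_into_patches` are hypothetical, and the 224-pixel input size is an assumed example resolution that the description above does not state.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


def split_into_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patches.

    Returns the patches in row-major order; each patch is a flat list of
    patch_size * patch_size * C values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    patch.extend(image[row][col])  # append all channels
            patches.append(patch)
    return patches


# Toy 4x4 single-channel image whose pixel value equals its flat index.
toy = [[[r * 4 + c] for c in range(4)] for r in range(4)]
toy_patches = split_into_patches(toy, patch_size=2)
print(len(toy_patches), toy_patches[0])  # 4 [0, 1, 4, 5]

# Assumed 224x224 input with 16x16 patches (the "16" in B-16).
print(num_patches(224, 16))  # 196
```

In a ViT, each flattened patch is then linearly projected to the model width and a learnable class token is prepended, so under the assumed 224-pixel input the encoder would see 197 tokens for 16-pixel patches. Real implementations typically do the split and projection in a single tensor operation rather than with Python loops.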
