FrancescoSaverioZuppichini/ViT
A PyTorch tutorial implementation of the Vision Transformer (ViT) for image recognition at scale.

This repository provides a complete implementation of the Vision Transformer (ViT) architecture in PyTorch. It walks through the model block by block: patch embedding, positional encoding, transformer encoder layers with self-attention and residual connections, and the classification head. The implementation is structured as an educational tutorial that shows how standard transformer mechanisms apply to image classification.
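
As a rough illustration of how the four blocks named above fit together, here is a minimal sketch. It is not the repository's code: the class name `TinyViT`, every default value, the use of a strided `nn.Conv2d` for patch embedding, and the use of PyTorch's built-in `nn.TransformerEncoderLayer` in place of hand-written attention blocks are all assumptions made for brevity.

```python
import torch
from torch import nn


class TinyViT(nn.Module):
    """Illustrative ViT sketch; names and defaults are hypothetical."""

    def __init__(self, in_channels=3, patch_size=16, emb_size=64, img_size=32,
                 depth=2, num_heads=4, n_classes=10):
        super().__init__()
        # Patch embedding: a strided convolution cuts the image into
        # non-overlapping patches and projects each one to emb_size.
        self.patch_embed = nn.Conv2d(in_channels, emb_size,
                                     kernel_size=patch_size, stride=patch_size)
        num_patches = (img_size // patch_size) ** 2
        # Learnable class token and learnable positional embeddings.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, emb_size))
        self.pos_embed = nn.Parameter(
            torch.randn(1, num_patches + 1, emb_size) * 0.02)
        # Encoder: each layer holds a self-attention block and an MLP block,
        # both wrapped in residual connections (assumption: pre-norm variant).
        layer = nn.TransformerEncoderLayer(
            d_model=emb_size, nhead=num_heads, dim_feedforward=4 * emb_size,
            dropout=0.0, activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Classification head, applied to the class token only.
        self.head = nn.Sequential(nn.LayerNorm(emb_size),
                                  nn.Linear(emb_size, n_classes))

    def forward(self, x):
        x = self.patch_embed(x)                  # (B, E, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)         # (B, N, E)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)           # (B, N + 1, E)
        x = x + self.pos_embed                   # add positional information
        x = self.encoder(x)                      # (B, N + 1, E)
        return self.head(x[:, 0])                # (B, n_classes)
```

Under these assumed defaults, `TinyViT()(torch.randn(2, 3, 32, 32))` returns a tensor of shape `(2, 10)`: a 32x32 image split into 16x16 patches yields 4 patch tokens plus the class token. Reading the prediction from the class token is one common choice; averaging over all tokens is another.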