yitu-opensource/T2T-ViT
PyTorch implementation of Tokens-to-Token Vision Transformer (T2T-ViT) for image classification, published at ICCV 2021.

Velocity (7d): +0.6 ★ / day
Trend: steady
This repository provides code for training Vision Transformers from scratch on ImageNet. It introduces a progressive tokenization method (Tokens-to-Token) that improves ViT performance when the model is trained on a midsize dataset such as ImageNet, without large-scale pretraining. The implementation builds on PyTorch’s imagenet example and the timm library, supports GPU training with mixed precision, and provides pretrained models.
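To give a feel for what "progressive" tokenization means, here is a minimal sketch of how the token grid could shrink across successive overlapping splits. The kernel, stride, and padding values (7/4/2, then 3/2/1 twice) and the 224-pixel input are assumptions recalled from the T2T-ViT paper, not values read from this repository, and the helper names are hypothetical.

```python
def soft_split_side(side, kernel, stride, padding):
    """Windows along one side of a sliding-window split (standard conv output-size formula)."""
    return (side + 2 * padding - kernel) // stride + 1


def token_grid_sizes(image_side=224, stages=((7, 4, 2), (3, 2, 1), (3, 2, 1))):
    """Return the side length of the token grid after each split stage.

    `stages` holds (kernel, stride, padding) triples; the defaults are assumed values.
    """
    sides = []
    side = image_side
    for kernel, stride, padding in stages:
        side = soft_split_side(side, kernel, stride, padding)
        sides.append(side)
    return sides


sides = token_grid_sizes()
token_counts = [s * s for s in sides]
print(sides)         # [56, 28, 14]
print(token_counts)  # [3136, 784, 196]
```

Under these assumptions the sequence length drops from 3136 to 196 tokens, with each token aggregating an overlapping neighborhood of the previous stage rather than a single hard-cut patch.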