Is T2T-ViT open source?

Yes. yitu-opensource/T2T-ViT is an open-source project hosted on GitHub and tracked on heatdrop.

What language is T2T-ViT written in?

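GitHub reports Jupyter Notebook as the primary language of yitu-opensource/T2T-ViT, largely because of its visualization notebooks. The model and training code are built on PyTorch and timm.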

How popular is T2T-ViT?

yitu-opensource/T2T-ViT has 1.2k stars on GitHub.

Where can I find T2T-ViT?

yitu-opensource/T2T-ViT is on GitHub at https://github.com/yitu-opensource/T2T-ViT.


yitu-opensource/T2T-ViT

Vision Transformers that train from scratch on ImageNet alone

Progressively tokenizes images to preserve local structure, letting ViT-class models reach 81.5% top-1 on ImageNet without external pretraining.

★ 1.2k stars · Jupyter Notebook · Computer Vision



What it does

T2T-ViT provides a Vision Transformer recipe that trains from scratch on ImageNet, reaching 81.5% top-1 accuracy with 21.5M parameters. The repo includes pretrained checkpoints, training scripts for 4- or 8-GPU machines, and transfer-learning weights for CIFAR-10/100. It is built on PyTorch and timm.

The interesting bit

The architecture replaces the standard fixed patch embedding with a Tokens-to-Token (T2T) module. Over several stages, the module merges neighboring tokens into single tokens, using either a Performer or a vanilla Transformer layer at each stage, so local image structure is encoded while the token sequence shrinks. This local-to-global approach is what allows the model to learn from scratch on ImageNet instead of relying on larger external datasets.
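To make the progressive tokenization concrete, the sketch below computes how the token grid shrinks across the stages of a Tokens-to-Token module. It assumes three overlapping sliding-window splits with (kernel, stride, padding) of (7, 4, 2), (3, 2, 1), and (3, 2, 1); these values and the function names are assumptions made for illustration and are not stated on this page, so check them against the repository before relying on them.

```python
def unfold_output_size(size, kernel, stride, padding):
    """Side length of the grid produced by a sliding-window split.

    Same formula as the output size of a strided convolution.
    """
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) per split. Assumed values for illustration only.
SOFT_SPLITS = [(7, 4, 2), (3, 2, 1), (3, 2, 1)]


def token_grid_sizes(image_size, splits=SOFT_SPLITS):
    """Return the side length of the token grid after each split."""
    sizes = []
    side = image_size
    for kernel, stride, padding in splits:
        side = unfold_output_size(side, kernel, stride, padding)
        sizes.append(side)
    return sizes


if __name__ == "__main__":
    sides = token_grid_sizes(224)
    print(sides)                    # [56, 28, 14]
    print([s * s for s in sides])   # [3136, 784, 196]
```

Under these assumptions a 224×224 image ends up as a 14×14 grid of 196 tokens, the same sequence length a standard ViT gets from 16×16 patches, but each token has been built up from overlapping neighborhoods rather than cut out in one step.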

Key highlights

T2T-ViT-14 achieves 81.5% top-1 accuracy; T2T-ViT-24 with Token Labeling reaches 84.2%
Lite variants scale down to 4.3M parameters, explicitly compared with MobileNets
Pretrained weights support variable image sizes via position-embedding interpolation
Includes Jupyter notebooks for attention-map and feature visualization
Transfer-learning setup provided for CIFAR-10 and CIFAR-100
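The position-embedding interpolation mentioned in the highlights can be pictured as resizing a square grid of embedding vectors to match a new token grid. The following is a minimal pure-Python sketch, assuming bilinear interpolation with aligned corners and a hypothetical function name `resize_pos_embed`. The repository's own routine may use a different interpolation mode, and a real model would typically set the class-token embedding aside before resizing, so treat this as an illustration rather than the project's implementation.

```python
def resize_pos_embed(grid, new_size):
    """Bilinearly resize a square grid of embedding vectors.

    grid: list of rows; each row is a list of vectors (lists of floats).
    new_size: side length of the target grid.
    Corner vectors of the input map exactly onto corners of the output.
    """
    old_size = len(grid)
    dim = len(grid[0][0])
    # Step between sampled source coordinates (aligned-corners convention).
    scale = (old_size - 1) / (new_size - 1) if new_size > 1 else 0.0

    out = []
    for i in range(new_size):
        y = i * scale
        y0 = int(y)
        y1 = min(y0 + 1, old_size - 1)
        wy = y - y0
        row = []
        for j in range(new_size):
            x = j * scale
            x0 = int(x)
            x1 = min(x0 + 1, old_size - 1)
            wx = x - x0
            # Blend the four surrounding vectors, one component at a time.
            vec = [
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ]
            row.append(vec)
        out.append(row)
    return out


if __name__ == "__main__":
    small = [[[0.0], [1.0]],
             [[2.0], [3.0]]]
    big = resize_pos_embed(small, 3)
    print(big[1][1])  # [1.5], the average of the four corners
```

In practice the same idea would be applied to a learned grid (for example 14×14 resized to 24×24 when moving from a smaller to a larger input resolution; these sizes are illustrative), usually with a tensor library's built-in interpolation rather than hand-written loops.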

Caveats

Automatic Mixed Precision can cause NaN loss on specific hardware such as Tesla T4
Four-GPU training yields accuracy roughly 0.1–0.3% lower than eight-GPU training
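Because the mixed-precision issue shows up as a NaN loss, it can help to detect the first non-finite value early instead of letting a run continue. The helper below is a generic sketch, not part of the repository; the function name `first_non_finite_step` and the sample loss values are hypothetical.

```python
import math


def first_non_finite_step(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


if __name__ == "__main__":
    history = [2.31, 1.97, float("nan"), 1.52]  # made-up values for illustration
    print(first_non_finite_step(history))  # 2
```

If such a check fires on affected hardware, rerunning the same configuration without mixed precision is a reasonable first experiment.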

Verdict

Useful if you need a drop-in ViT backbone trained on standard ImageNet. Less compelling if you are already running heavily optimized CNNs or newer hybrid architectures.
