facebookresearch/ToMe
Token Merging (ToMe) is a PyTorch implementation of a method that speeds up Vision Transformers by merging similar tokens, giving 2-3x faster evaluation without retraining.

Velocity (7d): +0.9 ★/day · Trend: steady
ToMe is a drop-in optimization for existing Vision Transformer architectures: it merges redundant tokens based on their similarity, reducing computation while preserving accuracy. It can be applied to pretrained ViT models without additional training, and it can further improve results when used during training. It also has an extension for diffusion models (ToMe-SD).
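To make the idea of similarity-based merging concrete, here is a minimal pure-Python sketch. It is not the repository's code or API. It assumes a simple bipartite scheme: tokens are split into two alternating sets, each token in the first set is paired with its most similar token in the second by cosine similarity, and the `r` strongest pairs are averaged together. The names `cosine`, `merge_tokens`, and `r` are hypothetical, and details a real ViT needs (batching, class tokens, tracking token sizes across layers) are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_tokens(tokens, r):
    """Merge up to r tokens into their most similar partners (illustrative only).

    tokens: list of equal-length lists of floats.
    Returns a new list with len(tokens) - r tokens (fewer merges if r is too large).
    """
    set_a, set_b = tokens[0::2], tokens[1::2]  # alternating split
    if not set_b:
        return [list(t) for t in tokens]
    r = max(0, min(r, len(set_a)))

    # For every token in set_a, find its single best match in set_b.
    edges = []
    for i, tok in enumerate(set_a):
        sims = [cosine(tok, other) for other in set_b]
        j = max(range(len(set_b)), key=sims.__getitem__)
        edges.append((sims[j], i, j))

    # Keep only the r most similar pairs.
    edges.sort(reverse=True)
    merged = {i: j for _, i, j in edges[:r]}

    # Average each merged set_a token into its set_b partner.
    sums = [list(t) for t in set_b]
    counts = [1] * len(set_b)
    for i, j in merged.items():
        sums[j] = [s + x for s, x in zip(sums[j], set_a[i])]
        counts[j] += 1

    kept_a = [list(t) for i, t in enumerate(set_a) if i not in merged]
    merged_b = [[s / c for s in row] for row, c in zip(sums, counts)]
    return kept_a + merged_b  # note: original token order is not preserved


tokens = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.0]]
reduced = merge_tokens(tokens, r=1)
print(len(tokens), "->", len(reduced))
```

Each call removes `r` tokens, so applying a step like this inside every transformer block shrinks the sequence gradually. That is where the compute savings in a scheme of this kind would come from.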