
Haiyang-W/TokenFormer

A transformer architecture that tokenizes model parameters and uses attention for both input tokens and learned parameter tokens to enhance scaling flexibility.

590 stars · Python · Language Models · ML Frameworks
Velocity · 7d: +1.0 ★ / day · Trend: steady

TokenFormer replaces fixed linear projection layers with a parameter-efficient attention mechanism that treats model parameters as learnable tokens. Because interactions among input tokens and interactions between inputs and parameter tokens both go through attention, the same architecture can scale computation and parameter count within one unified framework. The implementation supports training from scratch and provides pre-trained checkpoints for foundation model applications.
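
The idea of swapping a linear projection for attention over parameter tokens can be illustrated with a minimal pure-Python sketch. It is not the repository's code or API: the names `param_attention` and `make_params` are hypothetical, the dimensions are arbitrary, and plain softmax is assumed for the attention weights as a simplification. Each input vector acts as a query over a set of learnable key tokens, and the output is the weighted sum of the matching value tokens.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def param_attention(x, key_params, value_params):
    """Attention between inputs and parameter tokens (illustrative only).

    x:            list of input vectors, each of length d_in
    key_params:   n parameter tokens, each of length d_in
    value_params: n parameter tokens, each of length d_out
    Returns one vector of length d_out per input vector.
    """
    d_out = len(value_params[0])
    out = []
    for token in x:
        # Score the input against every key parameter token.
        scores = [sum(t * k for t, k in zip(token, key)) for key in key_params]
        weights = softmax(scores)  # assumption: plain softmax weighting
        # Weighted sum of the value parameter tokens.
        out.append([
            sum(w * v[j] for w, v in zip(weights, value_params))
            for j in range(d_out)
        ])
    return out


def make_params(n, dim, rng):
    """Hypothetical helper: n randomly initialised parameter tokens."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
d_in, d_out, n_params = 4, 3, 8
keys = make_params(n_params, d_in, rng)
values = make_params(n_params, d_out, rng)
x = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(2)]

y = param_attention(x, keys, values)

# Growing the layer means appending parameter tokens; the input and
# output dimensions, and the function itself, stay unchanged.
keys += make_params(4, d_in, rng)
values += make_params(4, d_out, rng)
y_big = param_attention(x, keys, values)
```

In this sketch the layer's capacity is set by the number of key/value tokens rather than by a fixed weight-matrix shape, which is one way to read the scaling flexibility the description refers to.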
