microsoft/mup
A PyTorch package implementing maximal update parametrization (μP) for stable hyperparameter scaling across neural network widths.

Velocity (7d): +1.0 ★/day · Trend: steady
The mup package provides tools for implementing μP in PyTorch models. Under μP, the optimal values of hyperparameters stay approximately stable as model width changes, so hyperparameters tuned on a small model can be transferred reliably to a large one, reducing uncertainty when scaling up neural networks. The underlying research focuses on large pretrained transformers, but the approach applies to deep learning model scaling in general.
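
To make the idea of width-stable hyperparameters concrete, here is a minimal sketch of one scaling rule associated with μP under an Adam-style optimizer: the learning rate of "matrix-like" parameters (both dimensions grow with width, e.g. hidden-to-hidden weights) is divided by the width multiplier, while "vector-like" parameters (e.g. biases) keep the base learning rate. The function names and the `"matrix"`/`"vector"` labels are hypothetical and chosen for illustration; this is not the mup package's API, and it leaves out the initialization and output-layer scaling that a full μP setup also needs.

```python
def width_multiplier(width: int, base_width: int) -> float:
    """Ratio of the target model's width to the small, tuned base model's width."""
    if width <= 0 or base_width <= 0:
        raise ValueError("widths must be positive")
    return width / base_width


def scaled_adam_lr(param_kind: str, base_lr: float, width: int, base_width: int) -> float:
    """Per-parameter learning rate for an Adam-style optimizer (illustrative rule).

    param_kind is a hypothetical label:
      "matrix": both dimensions scale with width (e.g. hidden weights)
      "vector": at most one dimension scales with width (e.g. biases)
    """
    if param_kind == "matrix":
        # Assumption: matrix-like parameters use base_lr / width_multiplier.
        return base_lr / width_multiplier(width, base_width)
    if param_kind == "vector":
        # Vector-like parameters keep the learning rate tuned on the base model.
        return base_lr
    raise ValueError(f"unknown param_kind: {param_kind!r}")


# Learning rate tuned at width 256, reused for a model of width 1024.
hidden_lr = scaled_adam_lr("matrix", base_lr=1e-3, width=1024, base_width=256)
bias_lr = scaled_adam_lr("vector", base_lr=1e-3, width=1024, base_width=256)
```

With a rule like this, the single tuned value `base_lr` is carried over unchanged, and only the per-parameter scaling depends on the target width.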