KellerJordan/Muon
Muon is a custom optimizer for training the hidden weights of neural networks, designed to work alongside AdamW.

Velocity (7d): +4.6 ★/day · Trend: steady
This repository implements the Muon optimizer, which was originally described in public posts. Muon is designed to optimize only the hidden weights of a neural network; embeddings, classifier heads, and other parameters are left to AdamW. The implementation provides a MuonWithAuxAdam class that runs both optimizers in a single training run, with each parameter group assigned to one or the other.
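The split between Muon and AdamW parameter groups can be sketched in plain Python. This is a minimal illustration, not the repository's code: the helper `split_param_groups`, the name prefixes `embed` and `head`, the rule that only tensors with two or more dimensions count as hidden weights, and the `use_muon` flag on each group are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def split_param_groups(
    shapes: Dict[str, Tuple[int, ...]],
    non_hidden_prefixes: Tuple[str, ...] = ("embed", "head"),
) -> List[dict]:
    """Partition parameter names into a Muon group and an AdamW group.

    Assumption: a parameter is a "hidden weight" if its name does not start
    with one of `non_hidden_prefixes` and it has at least two dimensions.
    Everything else (embeddings, heads, biases, gains) goes to AdamW.
    """
    muon_names: List[str] = []
    adam_names: List[str] = []
    for name, shape in shapes.items():
        is_hidden = not name.startswith(non_hidden_prefixes)
        if is_hidden and len(shape) >= 2:
            muon_names.append(name)
        else:
            adam_names.append(name)
    # `use_muon` is a hypothetical flag marking which optimizer owns a group.
    return [
        {"params": muon_names, "use_muon": True},
        {"params": adam_names, "use_muon": False},
    ]


# Hypothetical model layout: parameter name -> tensor shape.
shapes = {
    "embed.weight": (1000, 64),
    "body.0.weight": (64, 64),
    "body.0.bias": (64,),
    "head.weight": (10, 64),
}
groups = split_param_groups(shapes)
```

In a real training script the lists would hold the parameter tensors themselves rather than names, and the resulting groups would be passed to the optimizer class; check the repository for the exact group keys it expects.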