Liuhong99/Sophia
A stochastic second-order optimizer for training large language models, implemented as a drop-in replacement for Adam-style optimizers.

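This repository provides the official implementation of the Sophia-G optimizer (SophiaG), a second-order optimization algorithm tailored to language-model pre-training. It also includes GPT-2 training scripts built on nanoGPT (PyTorch) and levanter (JAX), the latter integrating with the Hugging Face ecosystem. Sophia keeps an exponential moving average of a lightweight diagonal Hessian estimate, refreshed only every few steps, uses it to precondition the gradient momentum, and clips the resulting update element-wise so that no coordinate moves too far in a single step. The "G" refers to the Gauss-Newton-Bartlett estimator used for the diagonal Hessian. The aim is faster convergence than first-order methods such as Adam, with little extra cost per step.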
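The clipped, curvature-preconditioned update described above can be sketched in plain Python. This is a minimal, framework-free illustration following the update rule from the Sophia paper, θ ← θ − η·clip(m / max(γ·h, ε), 1) with decoupled weight decay, not the repository's SophiaG code. Assumptions: the function `sophia_step`, its argument names, and its default values are hypothetical; the fresh diagonal-Hessian estimate `h_hat` is supplied by the caller (SophiaG computes it with the Gauss-Newton-Bartlett estimator, which is not reproduced here).

```python
from typing import List, Optional, Tuple

Vec = List[float]


def sophia_step(
    theta: Vec,
    grad: Vec,
    m: Vec,
    h: Vec,
    h_hat: Optional[Vec] = None,  # hypothetical: fresh diagonal-Hessian estimate, or None to reuse h
    lr: float = 0.05,             # step size (eta); illustrative default
    beta1: float = 0.9,           # EMA coefficient for the gradient momentum
    beta2: float = 0.99,          # EMA coefficient for the Hessian estimate
    gamma: float = 0.05,          # scales the Hessian in the denominator
    eps: float = 1e-12,           # floor on the denominator
    weight_decay: float = 0.0,
) -> Tuple[Vec, Vec, Vec]:
    """One Sophia-style update on flat lists of floats (illustrative sketch only)."""
    # Momentum: exponential moving average of the gradient.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    # The Hessian EMA changes only when a new estimate is supplied (every k steps).
    if h_hat is not None:
        h = [beta2 * hi + (1 - beta2) * hh for hi, hh in zip(h, h_hat)]
    new_theta = []
    for p, mi, hi in zip(theta, m, h):
        p = p - lr * weight_decay * p        # decoupled weight decay, as in AdamW
        ratio = mi / max(gamma * hi, eps)    # momentum preconditioned by curvature
        step = max(-1.0, min(1.0, ratio))    # element-wise clip to [-1, 1]
        new_theta.append(p - lr * step)      # so each coordinate moves at most lr
    return new_theta, m, h
```

In a training loop, `h_hat` would be passed only every k steps (the paper uses k = 10) and `None` otherwise, so the Hessian EMA is reused between refreshes. Before the first refresh `h` is zero, so the denominator falls back to `eps` and the clip saturates, giving a step of `lr * sign(m)`; the same happens when a curvature estimate is tiny or negative, which is what keeps the update bounded.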