
Liuhong99/Sophia

A stochastic second-order optimizer for training large language models, implemented as a drop-in replacement for Adam-style optimizers.

1k stars · Python · ML Frameworks · Language Models
Velocity (7d): +0.9 ★/day · Trend: steady

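This repository provides the official implementation of Sophia-G, a second-order optimizer tailored for language model pre-training. It includes GPT-2 pre-training scripts built on nanoGPT and levanter, integrating with PyTorch and the Hugging Face ecosystem. Rather than using a "clipped Gaussian" update, Sophia divides the gradient momentum by a lightweight, periodically refreshed estimate of the diagonal Hessian and then clips each coordinate of that ratio. The clipping bounds the per-parameter step size, and the curvature-aware scaling aims to reach a given loss in fewer steps than first-order methods such as Adam.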
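The precondition-then-clip idea can be illustrated with a minimal, dependency-free sketch. This is not the repository's `SophiaG` code. It is an illustrative sketch whose function names (`sophia_step`, `hess_diag_fn`, `run`) and hyperparameters (`lr`, `beta1`, `beta2`, `gamma`, `k`) are all hypothetical. It keeps an exponential moving average `m` of gradients and an average `h` of diagonal Hessian estimates refreshed every `k` steps. It then moves each parameter by `lr * clip(m / max(gamma * h, eps), 1)`, so no coordinate moves more than `lr` per step. On a toy quadratic, the exact diagonal Hessian stands in for the stochastic Gauss-Newton-Bartlett estimator that gives Sophia-G its name.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def clip(x: float, bound: float = 1.0) -> float:
    """Clip a scalar to [-bound, bound]."""
    return max(-bound, min(bound, x))


def sophia_step(
    theta: Vector,
    grad: Vector,
    m: Vector,
    h: Vector,
    t: int,
    hess_diag_fn: Callable[[Vector], Vector],  # hypothetical estimator hook
    lr: float = 0.05,
    beta1: float = 0.9,
    beta2: float = 0.99,
    gamma: float = 1.0,
    eps: float = 1e-12,
    weight_decay: float = 0.0,
    k: int = 10,
) -> Tuple[Vector, Vector, Vector]:
    """One Sophia-style update (illustrative sketch, not the official code)."""
    # EMA of gradients (momentum), as in Adam.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    # Refresh the EMA of the diagonal Hessian estimate only every k steps.
    if t % k == 0:
        h_hat = hess_diag_fn(theta)
        h = [beta2 * hi + (1 - beta2) * hh for hi, hh in zip(h, h_hat)]
    # Decoupled weight decay.
    theta = [p - lr * weight_decay * p for p in theta]
    # Curvature-scaled step, clipped per coordinate so that |step| <= lr.
    # Flooring the denominator at eps turns tiny or negative curvature
    # estimates into a clipped sign step instead of a huge update.
    theta = [
        p - lr * clip(mi / max(gamma * hi, eps))
        for p, mi, hi in zip(theta, m, h)
    ]
    return theta, m, h


# Toy quadratic loss 0.5 * sum(a_i * theta_i^2) with very different curvatures.
A = [1.0, 100.0]


def loss(theta: Vector) -> float:
    return 0.5 * sum(a * p * p for a, p in zip(A, theta))


def grad_fn(theta: Vector) -> Vector:
    return [a * p for a, p in zip(A, theta)]


def hess_diag_fn(theta: Vector) -> Vector:
    # Exact diagonal Hessian, standing in for a stochastic estimator.
    return list(A)


def run(steps: int = 200) -> Tuple[List[Vector], List[float]]:
    theta: Vector = [1.0, 1.0]
    m: Vector = [0.0, 0.0]
    h: Vector = [0.0, 0.0]
    trajectory = [theta]
    losses = [loss(theta)]
    for t in range(steps):
        theta, m, h = sophia_step(theta, grad_fn(theta), m, h, t, hess_diag_fn)
        trajectory.append(theta)
        losses.append(loss(theta))
    return trajectory, losses


if __name__ == "__main__":
    trajectory, losses = run()
    print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3e}")
```

On the first step, `h` is still small, so both coordinates hit the clip and move by exactly `lr`. Once `h` accumulates curvature, the high-curvature coordinate takes proportionally smaller unclipped steps. This matches the intuition above: the Hessian scaling evens out progress across directions, and clipping acts as a safeguard when the curvature estimate is stale, tiny, or negative.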
