Is Sophia open source?

Yes — Liuhong99/Sophia is open source, released under the MIT license.

What language is Sophia written in?

Liuhong99/Sophia is primarily written in Python.

How popular is Sophia?

Liuhong99/Sophia has 1k stars on GitHub.

Where can I find Sophia?

Liuhong99/Sophia is on GitHub at https://github.com/Liuhong99/Sophia.

Liuhong99/Sophia

A second-order optimizer brave enough to pre-train GPT-2

Sophia estimates the diagonal of the Hessian via cheap sampling and uses it to clip parameter updates, aiming to make LLM pre-training faster than with first-order rivals.

★ 1k stars · Python · ML Frameworks · Language Models

What it does

Sophia-G is a stochastic second-order optimizer built for language model pre-training. Every k steps it runs an extra forward-backward pass with labels sampled from the model’s own predictions and uses the result to update an exponential moving average (EMA) of the diagonal Hessian. The parameter update divides the first-order momentum by that curvature estimate and clips the result per coordinate. According to the README, the resulting learning rate sits close to (slightly below) what you would use for AdamW and several times larger than what you would use for Lion.
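As a rough illustration of the update described above, the following pure-Python sketch applies one clipped step to a list of parameters and refreshes the curvature EMA. It is a simplified reading of the description on this page, not the repository's code: the function names, the default values, the eps guard, and the batch-size scaling of the squared sampled-label gradient are all assumptions, and weight decay is left out.

```python
def update_hessian_ema(h, sampled_grad, batch_size, beta2=0.99):
    """Refresh the diagonal curvature EMA in place.

    sampled_grad is assumed to be the gradient from the extra pass whose
    labels were sampled from the model's own predictions.
    """
    for i, g in enumerate(sampled_grad):
        # Assumed estimator: batch size times the squared gradient.
        estimate = batch_size * g * g
        h[i] = beta2 * h[i] + (1.0 - beta2) * estimate
    return h


def sophia_step(theta, grad, m, h, lr=1e-4, beta1=0.96, rho=0.05, eps=1e-15):
    """Apply one clipped, per-coordinate update in place."""
    for i, g in enumerate(grad):
        # First-order momentum: EMA of the gradient.
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        # Divide by the scaled curvature estimate, then clip to [-1, 1].
        ratio = m[i] / max(rho * h[i], eps)
        ratio = max(-1.0, min(1.0, ratio))
        theta[i] -= lr * ratio
    return theta
```

In a training loop, update_hessian_ema would be called only every k steps, while sophia_step runs on every step. Because the ratio is clipped, no coordinate ever moves by more than lr in a single step, however small the curvature estimate is.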

The interesting bit

Instead of treating second-order information as prohibitively expensive, Sophia refreshes its curvature estimate only periodically, which keeps the overhead low. The README tracks train/win_rate (the fraction of parameters whose update isn’t clipped) to guide hyperparameter tuning, turning an otherwise opaque optimizer knob into a visible telemetry signal.
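To make the win_rate metric concrete, here is a minimal sketch of how such a fraction could be computed from the optimizer state. It assumes a coordinate counts as unclipped when the absolute momentum is below ρ times the curvature estimate; the function name and that exact comparison are assumptions for illustration, and the repository's own logging may scale things differently.

```python
def win_rate(m, h, rho):
    """Fraction of coordinates whose update would not be clipped.

    m: momentum values, h: diagonal curvature estimates (same length).
    Assumption: a coordinate is unclipped when |m| < rho * h.
    """
    if not m:
        return 0.0
    unclipped = sum(1 for mi, hi in zip(m, h) if abs(mi) < rho * hi)
    return unclipped / len(m)


# Two of the four coordinates fall under the threshold of 0.5.
rate = win_rate([0.1, 2.0, -0.3, 5.0], [1.0, 1.0, 1.0, 1.0], rho=0.5)
print(rate)
```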

Key highlights

Implements the Sophia-G update rule with per-coordinate clipping against a diagonal Hessian estimate
Ships with nanoGPT-based training scripts for GPT-2 Small (125M) and Medium (355M); the 1.5B run goes through levanter
Hyperparameters (except learning rate) are claimed to transfer across model sizes
Requires serious hardware: 10× A5000 or 8× A100 for the smaller models, TPUs for 1.5B
Tuning centers on the clipping threshold ρ, which is adjusted until win_rate stays in the 0.1–0.5 range
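The ρ tuning described in the last item can be pictured as a simple manual feedback rule. The helper below is a hypothetical heuristic, not something the repository provides: it assumes win_rate is the unclipped fraction (so raising ρ raises win_rate), and the name suggest_rho and the multiplicative factor are invented for illustration.

```python
def suggest_rho(rho, rate, low=0.1, high=0.5, factor=1.5):
    """Suggest a new clipping threshold from an observed win_rate.

    Hypothetical heuristic: nudges rho toward the 0.1 to 0.5 band.
    """
    if rate < low:
        # Too many coordinates are clipped: loosen the threshold.
        return rho * factor
    if rate > high:
        # Too few coordinates are clipped: tighten the threshold.
        return rho / factor
    # Already inside the target band: leave rho alone.
    return rho
```

In practice this would be applied between short trial runs rather than inside the training loop, since the metric needs time to settle before it is worth reacting to.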

Caveats

Hyperparameter tuning is involved and non-standard; you must monitor win_rate to keep ρ in range, and the learning-rate scale differs from both AdamW and Lion
The 1.5B reproduction requires editing levanter’s optim.py source and launching via gcloud TPU VMs
The README notes results were updated for the latest PyTorch version, implying earlier published curves may have shifted

Verdict

Worth a look if you’re pre-training transformers at scale and have the GPU/TPU budget to experiment. If you’re fine-tuning 7B models on a single A100 or looking for a drop-in AdamW replacement, this is not your tool.
