karpathy/nanochat

The GPT-2 speedrun: from $43K to $48 in seven years

nanochat is a minimal, hackable harness that lets you train and chat with a GPT-2-class LLM on a single GPU node for under $100—no hyperparameter spreadsheets required.

★ 56.5k stars · Python · Language Models · ML Frameworks · Inference & Serving

Star velocity (last 7 days): +35 stars per day, trend accelerating.

What it does

nanochat is an end-to-end LLM training harness covering tokenization, pretraining, fine-tuning, evaluation, inference, and a ChatGPT-style web UI. It is built to run on a single GPU node with vanilla PyTorch. It trains a model with GPT-2-level capability, a task that cost roughly $43,000 in 2019, for about $48 in under two hours on eight H100 GPUs, or closer to $15 on spot instances. When the run finishes, you can open a browser and chat with your model.
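The cost figures above come down to simple GPU-hour arithmetic. The sketch below assumes a hypothetical rental rate of $3 per H100-hour and a full two-hour run. These values were chosen only because they reproduce the quoted $48; real cloud prices and run times vary. The helper `run_cost_usd` is illustrative, not part of nanochat.

```python
def run_cost_usd(num_gpus: int, hours: float, usd_per_gpu_hour: float) -> float:
    """Total rental cost: every GPU on the node is billed for the whole run."""
    return num_gpus * hours * usd_per_gpu_hour

# Assumption: $3 per H100-hour and exactly 2 hours, picked to match the ~$48 figure.
speedrun_cost = run_cost_usd(num_gpus=8, hours=2.0, usd_per_gpu_hour=3.0)
gpt2_2019_cost = 43_000  # approximate 2019 cost quoted in the text

# Per-GPU-hour rate that would yield the ~$15 spot figure under the same assumed duration.
implied_spot_rate = 15 / (8 * 2.0)

print(f"speedrun: ${speedrun_cost:.0f}")
print(f"reduction vs 2019: ~{gpt2_2019_cost / speedrun_cost:.0f}x")
print(f"implied spot rate: ${implied_spot_rate:.2f} per GPU-hour")
```

Because the run uses a whole node, cost scales with GPUs × hours. Shaving wall-clock time off the recipe therefore translates directly into dollars.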

The interesting bit

The whole system is controlled by a single hyperparameter, --depth, the number of transformer layers. That one integer automatically sets model width, attention heads, learning rate, training horizon, weight decay, and the remaining settings so that the model stays compute-optimal. This lets you sweep model sizes without maintaining a tuning spreadsheet. The project also keeps a public “Time-to-GPT-2” leaderboard that tracks how quickly the community can beat the original GPT-2's CORE score.
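The following is a minimal sketch of how one integer could fan out into a full configuration. Every constant here is an illustrative assumption, not a value read from nanochat's code:

- aspect ratio 64
- head dimension 128
- a 65,536-token vocabulary
- 20 training tokens per parameter
- a learning rate scaled by 1/sqrt(width) from a hypothetical base of 0.004 at width 768

The parameter count uses the common ~12·d² per-layer approximation. The names `config_from_depth` and `DerivedConfig` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical scaling constants (assumptions, not nanochat's actual values).
ASPECT_RATIO = 64      # model width per layer of depth
HEAD_DIM = 128         # channels per attention head
VOCAB_SIZE = 65_536    # tokenizer vocabulary size
TOKENS_PER_PARAM = 20  # Chinchilla-style compute-optimal data ratio
BASE_DIM = 768         # width at which the base learning rate is assumed tuned
BASE_LR = 0.004        # hypothetical learning rate at BASE_DIM

@dataclass(frozen=True)
class DerivedConfig:
    depth: int
    model_dim: int
    num_heads: int
    num_params: int
    train_tokens: int
    lr: float

def config_from_depth(depth: int) -> DerivedConfig:
    """Derive the other hyperparameters from the single depth knob."""
    model_dim = depth * ASPECT_RATIO
    num_heads = -(-model_dim // HEAD_DIM)  # ceiling division
    # ~12*d^2 params per block: 4*d^2 for attention (Q, K, V, output)
    # plus 8*d^2 for an MLP with 4x expansion; biases and norms ignored.
    block_params = 12 * model_dim**2 * depth
    embed_params = 2 * VOCAB_SIZE * model_dim  # assumes untied input/output embeddings
    num_params = block_params + embed_params
    train_tokens = TOKENS_PER_PARAM * num_params  # compute-optimal training horizon
    lr = BASE_LR * (model_dim / BASE_DIM) ** -0.5  # wider models take smaller steps
    return DerivedConfig(depth, model_dim, num_heads, num_params, train_tokens, lr)

for d in (12, 20, 26):
    cfg = config_from_depth(d)
    print(f"depth={d:2d} dim={cfg.model_dim} heads={cfg.num_heads} "
          f"params={cfg.num_params / 1e6:.0f}M tokens={cfg.train_tokens / 1e9:.1f}B "
          f"lr={cfg.lr:.5f}")
```

Under these assumptions, width grows linearly with depth, so block parameters grow roughly with depth³. The token budget grows in proportion to parameters. That is why a single knob is enough to move along the scaling curve.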

Key highlights

Trains a GPT-2-class model (around depth 24–26) for roughly $48 on an 8×H100 node, or about $15 on spot instances.
Replaces PyTorch’s torch.amp.autocast with an explicit global COMPUTE_DTYPE chosen from the detected hardware: bfloat16 on GPUs with compute capability 8.0 or higher (SM 80+), float32 elsewhere.
Falls back automatically to single-GPU training via gradient accumulation, at the cost of roughly eight times the wall-clock time of a full 8-GPU node.
Covers the full lifecycle: tokenization, pretraining, SFT, RL, evaluation, and a web chat UI.
Development is leaderboard-driven: the reference runs/speedrun.sh always reflects the current fastest known recipe to GPT-2 capability.
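The sketch below illustrates the dtype selection described in the highlights. It assumes the rule is exactly "bfloat16 when the CUDA compute capability is at least 8.0, otherwise float32". The helper names `select_compute_dtype_name` and `detect_compute_dtype` are hypothetical. `torch.cuda.is_available` and `torch.cuda.get_device_capability` are real PyTorch calls. The decision logic is kept torch-free so it can be checked without a GPU.

```python
from typing import Optional, Tuple

def select_compute_dtype_name(capability: Optional[Tuple[int, int]]) -> str:
    """Map a CUDA compute capability (None = no CUDA GPU) to a dtype name."""
    # bfloat16 is natively supported from SM 80 (Ampere) onward.
    if capability is not None and capability >= (8, 0):
        return "bfloat16"
    return "float32"

def detect_compute_dtype():
    """Probe the current device and return the matching torch dtype."""
    import torch  # imported lazily so the rule above is testable without torch
    capability = torch.cuda.get_device_capability() if torch.cuda.is_available() else None
    return getattr(torch, select_compute_dtype_name(capability))
```

In this style, model code would call `detect_compute_dtype()` once at startup and store the result in a global such as COMPUTE_DTYPE. It would then cast explicitly (for example `x.to(COMPUTE_DTYPE)`) instead of wrapping the forward pass in `torch.amp.autocast`. The presumable appeal is that the precision of each operation is visible in the code rather than decided by autocast's internal per-op policies.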

Caveats

GPUs with less than 80 GB of VRAM need their per-device batch size reduced manually or they will run out of memory (OOM); if even a batch size of 1 does not fit, you are on your own.
float16 training does not yet support the RL stage, and non-CUDA backends such as MPS or XPU are largely untested.
CPU and Apple Silicon runs are possible but dramatically shrink the model and yield weak results.
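Both the single-GPU fallback and the smaller-batch workaround rely on gradient accumulation. The global batch per optimizer step stays fixed, and any shortfall in per-step capacity is made up with extra forward/backward micro-steps. The sketch below uses a hypothetical total batch of 524,288 tokens and a sequence length of 2,048, chosen only for illustration. `grad_accum_steps` is a hypothetical helper, not nanochat's API.

```python
def grad_accum_steps(total_batch_tokens: int, device_batch_size: int,
                     seq_len: int, world_size: int) -> int:
    """Micro-steps per optimizer step needed to keep the global batch fixed."""
    tokens_per_micro_step = device_batch_size * seq_len * world_size
    if total_batch_tokens % tokens_per_micro_step != 0:
        raise ValueError("total batch must be a multiple of tokens per micro-step")
    return total_batch_tokens // tokens_per_micro_step

# Hypothetical settings (assumptions for illustration, not nanochat's defaults).
TOTAL_BATCH = 524_288  # tokens per optimizer step
SEQ_LEN = 2_048

print(grad_accum_steps(TOTAL_BATCH, 32, SEQ_LEN, world_size=8))  # full 8-GPU node -> 1
print(grad_accum_steps(TOTAL_BATCH, 32, SEQ_LEN, world_size=1))  # single GPU -> 8
print(grad_accum_steps(TOTAL_BATCH, 16, SEQ_LEN, world_size=1))  # halved batch on a smaller GPU -> 16
```

Each micro-step processes the same number of tokens per GPU. Going from eight GPUs to one therefore multiplies the micro-steps, and roughly the wall-clock time, by eight. Halving the per-device batch on a smaller GPU doubles them again while keeping the effective batch size the same.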

Verdict

Researchers and tinkerers who want a hackable, single-node LLM laboratory with automatic compute-optimal scaling will love this. If you need production-grade multi-node orchestration or polished framework abstractions, this is not your tool.

Frequently asked

What is karpathy/nanochat?: nanochat is a minimal, hackable harness that lets you train and chat with a GPT-2-class LLM on a single GPU node for under $100—no hyperparameter spreadsheets required.
Is nanochat open source?: Yes — karpathy/nanochat is open source, released under the MIT license.
What language is nanochat written in?: karpathy/nanochat is primarily written in Python.
How popular is nanochat?: karpathy/nanochat has 56.5k stars on GitHub and is currently accelerating.
Where can I find nanochat?: karpathy/nanochat is on GitHub at https://github.com/karpathy/nanochat.