Sleep while your GPU does the research
An AI agent that edits, trains, and evaluates LLM code overnight so you don't have to.

What it does
This repo sets up a single-GPU LLM training loop and hands the keyboard to an AI agent. You write instructions in program.md; the agent edits train.py, runs a 5-minute experiment, checks whether validation loss improved, and repeats. At about 12 runs per hour, the goal is to wake up to a log of roughly 100 overnight experiments and, hopefully, a better model.
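The edit, run, compare, repeat cycle described above can be sketched as a greedy keep-if-better loop. This is a minimal illustration, not the repo's code: `research_loop`, `toy_propose`, and `toy_evaluate` are hypothetical names, the toy loss function stands in for a real 5-minute training run, and the idea that a change is kept only when it improves the loss is an assumption about how the agent is instructed.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def research_loop(
    propose: Callable[[Config], Config],
    evaluate: Callable[[Config], float],
    baseline: Config,
    n_experiments: int,
) -> Tuple[Config, float, List[Tuple[int, float, bool]]]:
    """Greedy loop: keep a candidate only if its validation loss is lower."""
    best_cfg = dict(baseline)
    best_loss = evaluate(best_cfg)
    log: List[Tuple[int, float, bool]] = []
    for i in range(n_experiments):
        candidate = propose(best_cfg)   # stands in for the agent editing train.py
        loss = evaluate(candidate)      # stands in for one fixed-length training run
        improved = loss < best_loss
        if improved:
            best_cfg, best_loss = candidate, loss
        log.append((i, loss, improved))  # the overnight experiment log
    return best_cfg, best_loss, log


# Toy stand-ins so the sketch runs without a GPU (purely illustrative).
_rng = random.Random(0)


def toy_propose(cfg: Config) -> Config:
    new_cfg = dict(cfg)
    new_cfg["lr"] = cfg["lr"] * _rng.choice([0.5, 0.8, 1.25, 2.0])
    return new_cfg


def toy_evaluate(cfg: Config) -> float:
    # Pretend the loss is lowest at a learning rate of 3e-3.
    return 1.0 + abs(math.log10(cfg["lr"]) - math.log10(3e-3))


best_cfg, best_loss, log = research_loop(toy_propose, toy_evaluate, {"lr": 1e-1}, 100)
print(f"best lr={best_cfg['lr']:.4g}, best loss={best_loss:.4f}")
```

In the real setup, the proposal step is the agent rewriting train.py and the evaluation step is an actual training run, so each iteration costs about five minutes rather than microseconds.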
The interesting bit
The human doesn’t touch Python. You program the organization — the program.md “skill” that tells the agent how to experiment — while the agent programs the model. It’s a deliberate inversion: the researcher becomes a meta-researcher, tuning the research process rather than the hyperparameters.
Key highlights
- Three files, total: immutable prepare.py, agent-editable train.py, human-editable program.md
- Fixed 5-minute wall-clock runs make experiments comparable regardless of what the agent changes (architecture, batch size, model depth)
- Metric is val_bpb (validation bits per byte), so vocabulary changes don’t skew comparisons
- Built on a simplified single-GPU nanochat stack; no distributed training, no config sprawl
- Community forks already exist for macOS, Windows, AMD, and MLX
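To see why a bits-per-byte metric is insensitive to vocabulary changes, here is a small sketch using the standard definition: total negative log-likelihood in nats, divided by ln 2 times the number of UTF-8 bytes in the text. The function name `bits_per_byte` and the toy numbers are illustrative assumptions; the repo's exact val_bpb computation may differ in detail.

```python
import math
from typing import Sequence


def bits_per_byte(token_nll_nats: Sequence[float], text: str) -> float:
    """Total negative log-likelihood (nats) converted to bits, per UTF-8 byte."""
    total_nats = sum(token_nll_nats)
    total_bytes = len(text.encode("utf-8"))
    return total_nats / (math.log(2) * total_bytes)


text = "hello world"  # 11 UTF-8 bytes

# Two hypothetical tokenizations of the same text, chosen so both models
# assign the text the same total probability.
char_level = [math.log(2)] * 11        # 11 tokens, 1 bit each
word_level = [5.5 * math.log(2)] * 2   # 2 tokens, 5.5 bits each

mean_char = sum(char_level) / len(char_level)  # per-token loss, in nats
mean_word = sum(word_level) / len(word_level)

bpb_char = bits_per_byte(char_level, text)
bpb_word = bits_per_byte(word_level, text)

print(f"per-token loss: {mean_char:.3f} vs {mean_word:.3f}")
print(f"bits per byte:  {bpb_char:.3f} vs {bpb_word:.3f}")
```

The per-token losses differ by a factor of 5.5 purely because of tokenization, while the bits-per-byte values agree. That is the property that lets the agent change the vocabulary without invalidating comparisons between runs.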
Caveats
- Requires an NVIDIA GPU; Karpathy is explicitly unsure about taking on CPU/MPS support himself
- Results aren’t comparable across different hardware platforms, because a fixed 5-minute budget buys a different number of training steps on each device
- The default program.md is intentionally bare-bones; you’ll need to iterate on it yourself
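The cross-hardware caveat in the list above follows directly from budgeting by wall-clock time. Here is a minimal sketch of a time-budgeted loop, with a fake clock so it runs instantly; `train_for_budget`, `FakeClock`, and the per-step timings are hypothetical and not taken from the repo.

```python
import time
from typing import Callable


def train_for_budget(
    step_fn: Callable[[], None],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run training steps until the wall-clock budget is spent; return step count."""
    start = clock()
    steps = 0
    while clock() - start < budget_seconds:
        step_fn()
        steps += 1
    return steps


class FakeClock:
    """Simulated clock so the example does not actually wait five minutes."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def simulate(step_seconds: float, budget_seconds: float = 300.0) -> int:
    clock = FakeClock()
    # Each simulated training step simply advances the fake clock.
    return train_for_budget(lambda: clock.advance(step_seconds), budget_seconds, clock)


fast_steps = simulate(0.25)  # hypothetical faster device: 0.25 s per step
slow_steps = simulate(1.0)   # hypothetical slower device: 1 s per step
print(fast_steps, slow_steps)
```

With the same 5-minute budget, the faster simulated device completes four times as many steps, so the two final losses describe differently trained models. Within a single machine the budget is a fair yardstick; across machines it is not.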
Verdict
Worth a look if you’re curious about automated experimentation and have a GPU to burn overnight. Skip it if you want production training infrastructure or need to run on non-NVIDIA hardware without community forks.