
KellerJordan/modded-nanogpt

NanoGPT (124M) training speedrun achieving target cross-entropy loss on FineWeb in under 90 seconds using 8 NVIDIA H100 GPUs.

5.3k stars · Python · Language Models · ML Frameworks
Star velocity (7d): +7.3 ★/day · Trend: steady

This repository hosts a collaborative speedrun: train a 124M-parameter NanoGPT model to a cross-entropy loss of 3.28 on the FineWeb validation set as quickly as possible. The project builds on Karpathy’s llm.c GPT-2 replication and incorporates modern training techniques, including the Muon optimizer, FP8 matmul with asymmetric rescaling, and Flash Attention 3 with long-short sliding window patterns, along with architectural enhancements such as rotary embeddings and skip connections. The goal is to benchmark training efficiency and algorithmic optimizations for language model training.
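To give a rough sense of what the 3.28 target means, here is a minimal sketch that converts a cross-entropy loss into perplexity. It assumes the loss is the mean per-token negative log-likelihood in natural-log units (nats), which the listing does not state explicitly. The function names and the toy probabilities are hypothetical and are not taken from the repository.

```python
import math

def cross_entropy(probs_of_correct_token):
    """Mean negative log-likelihood, in nats, of the correct tokens."""
    n = len(probs_of_correct_token)
    return -sum(math.log(p) for p in probs_of_correct_token) / n

def perplexity(loss_nats):
    """Perplexity corresponding to a mean loss in nats (assumed unit)."""
    return math.exp(loss_nats)

TARGET_LOSS = 3.28                      # target quoted in the description
target_ppl = perplexity(TARGET_LOSS)    # roughly 26.6

# Toy check: a model that gives every correct token probability
# 1 / target_ppl has a mean loss equal to the target.
toy_probs = [1.0 / target_ppl] * 5
toy_loss = cross_entropy(toy_probs)

print(f"perplexity at target: {target_ppl:.2f}")
print(f"toy loss: {toy_loss:.4f}")
```

Under that assumption, a loss of 3.28 means the model is on average about as uncertain as a uniform choice among roughly 27 tokens.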
