FareedKhan-dev/train-llm-from-scratch
A guide and codebase for training transformer-based Large Language Models from scratch on a single GPU.

Implements the Transformer architecture from the paper "Attention Is All You Need" in PyTorch and provides scripts to train LLMs ranging from millions to billions of parameters on a single GPU. Covers data preparation, model construction, the training loop, and batch processing.
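The core operation of that architecture is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal, dependency-free illustration of it and is not code from this repository, which uses PyTorch. It assumes a decoder-only (GPT-style) model, as is typical for LLM training, so a causal mask stops each position from attending to later ones. The helper names (`softmax`, `matmul`, `transpose`, `causal_attention`) are hypothetical.

```python
import math


def softmax(row):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def causal_attention(q, k, v):
    """Return (softmax(Q K^T / sqrt(d_k) + causal mask) V, attention weights).

    q, k, v are lists of rows, one row per sequence position.
    Hypothetical helper for illustration only.
    """
    d_k = len(q[0])
    scores = matmul(q, transpose(k))
    weights = []
    for i, row in enumerate(scores):
        # Position i may only attend to positions j <= i (causal mask).
        masked = [
            s / math.sqrt(d_k) if j <= i else float("-inf")
            for j, s in enumerate(row)
        ]
        weights.append(softmax(masked))
    return matmul(weights, v), weights


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, d_k = 2
    out, w = causal_attention(x, x, x)  # self-attention: Q = K = V = x
    for row in w:
        print([round(p, 3) for p in row])
```

Because of the mask, the first position can only attend to itself, so its weight row is `[1.0, 0.0, 0.0]`. A real model would also learn separate projections for Q, K and V and run several heads in parallel; those are omitted here for brevity.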