FareedKhan-dev/train-llm-from-scratch

A guide and codebase for training transformer-based Large Language Models from scratch on a single GPU.

4.4k stars · Python · Language Models · ML Frameworks
Velocity (7d): +8.7 ★/day · Trend: steady

Implements the transformer architecture from the ‘Attention Is All You Need’ paper in PyTorch, with scripts for training LLMs ranging from millions to billions of parameters on a single GPU. Covers data preparation, model architecture construction, training loops, and batch processing.
