rasbt/LLMs-from-scratch

96K stars for building GPT with zero black boxes

A step-by-step PyTorch walkthrough that trains a small but real LLM on an ordinary laptop, with no high-level model libraries such as Hugging Face transformers allowed.

96.8k stars · Jupyter Notebook · Language Models · ML Frameworks · Learning
Velocity (7d): +92 ★ / day · Trend: steady

What it does

This repo is the companion code for Sebastian Raschka’s book Build a Large Language Model (From Scratch). It walks through implementing a GPT-like transformer in pure PyTorch: tokenizers, multi-head attention, the full model, pretraining, and finetuning for classification and instruction-following. The code is designed to run on conventional laptops without specialized hardware, though it will use a GPU if one is available.
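
To make the attention step concrete, here is a minimal sketch of single-head causal scaled dot-product attention. It is written in plain Python rather than the PyTorch the repo uses, so it runs without dependencies. The function names (`softmax`, `causal_attention`) and the toy vectors are illustrative assumptions, not code from the book. A fuller implementation would also apply learned query, key, and value projections and split the work across multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of equal-length float vectors, one per token.
    Returns (context_vectors, attention_weights).
    """
    d_k = len(keys[0])
    contexts, all_weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: token i may only attend to positions 0..i.
        scores = [
            sum(qc * kc for qc, kc in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Context vector = attention-weighted sum of the visible value vectors.
        context = [
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


# Toy input: three tokens with two-dimensional embeddings (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts, weights = causal_attention(tokens, tokens, tokens)
```

The first token can only see itself, so its context vector equals its own value vector. Later tokens blend everything up to and including their own position.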

The interesting bit

The constraint is the pedagogy: no Hugging Face, no transformers library, no calling AutoModel.from_pretrained and pretending you understand it. You write every layer yourself, then optionally load weights from larger pretrained models for finetuning. It’s the software equivalent of building a radio from discrete components before you ever touch a smartphone.

Key highlights

  • Seven chapters of Jupyter notebooks plus Python summaries, each with exercise solutions
  • Covers pretraining on unlabeled data, text-classification finetuning, and instruction tuning (including evaluation via Ollama)
  • Appendices on PyTorch basics, distributed data parallel, training-loop improvements, and LoRA
  • CI tested across Linux, Windows, and macOS
  • Companion 17-hour video course and a 170-page self-test PDF available
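
Pretraining on unlabeled data works because the text supplies its own labels: the target for each position is simply the next token. The sketch below shows one way to cut a token stream into (input, target) pairs with a sliding window. The helper name `make_training_pairs`, the character-level vocabulary, and the window sizes are assumptions for illustration, not the repo's actual data loader.

```python
def make_training_pairs(token_ids, context_length, stride):
    """Slice a token stream into next-token-prediction examples.

    Each target window is the input window shifted one position to the right.
    """
    pairs = []
    for start in range(0, len(token_ids) - context_length, stride):
        inputs = token_ids[start : start + context_length]
        targets = token_ids[start + 1 : start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs


# Toy character-level "tokenizer": one integer ID per distinct character.
text = "hello world"
vocab = {ch: idx for idx, ch in enumerate(sorted(set(text)))}
token_ids = [vocab[ch] for ch in text]

pairs = make_training_pairs(token_ids, context_length=4, stride=4)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

With a stride equal to the context length the windows do not overlap. A smaller stride yields more, overlapping examples from the same text.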

Caveats

  • The README is upfront that this is an educational model; don’t expect competitive performance out of the small trained-from-scratch version
  • The sequel repo (Build A Reasoning Model) is already teased, so the “from scratch” journey apparently never ends

Verdict

Ideal for developers who want to stop treating LLMs as magic APIs and actually understand attention, layer norms, and loss curves. Skip it if you just need to ship a RAG app by Friday: you’ll be faster with off-the-shelf tools.
