A 9M-parameter fish that proves LLMs aren't magic
Train a working transformer from scratch in a Colab notebook—no PhD, no cluster, no black boxes.

What it does
GuppyLM is a deliberately tiny language model—8.7M parameters, six layers, 128-token context—that speaks in the voice of a small, food-obsessed fish. The entire pipeline, from synthetic data generation to tokenizer training to the vanilla transformer architecture, fits in a single Colab notebook and runs on a T4 GPU in about five minutes. There’s also a browser demo that downloads a quantized ONNX model (~10 MB) and runs inference locally via WebAssembly.
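To get a feel for where a figure like 8.7M comes from, here is a rough back-of-the-envelope count for a plain decoder-only transformer. The vocabulary size (4,096), context length (128) and layer count (6) are taken from this write-up; the model width `d_model=384`, feed-forward width `d_ff=768`, learned positional embeddings and tied output weights are assumptions chosen for illustration, not confirmed details of GuppyLM.

```python
def count_params(vocab=4096, ctx=128, layers=6, d_model=384, d_ff=768):
    """Rough weight count for a vanilla decoder-only transformer.

    vocab, ctx and layers come from the text; d_model and d_ff are
    hypothetical values. Linear-layer biases are ignored and the output
    projection is assumed to share the token embedding table.
    """
    tok_emb = vocab * d_model        # token embedding table
    pos_emb = ctx * d_model          # learned positional embedding (assumed)
    attn = 4 * d_model * d_model     # Q, K, V and output projections
    ffn = 2 * d_model * d_ff         # two linear layers in the MLP
    norms = 2 * 2 * d_model          # two LayerNorms per block (scale + shift)
    per_layer = attn + ffn + norms
    final_norm = 2 * d_model         # LayerNorm before the output head
    return tok_emb + pos_emb + layers * per_layer + final_norm


total = count_params()
print(f"{total / 1e6:.2f}M parameters")
```

Under these assumed widths the total lands close to the stated size, with most of the weights sitting in the six blocks rather than in the embedding table.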
The interesting bit
The author strips away every modern optimization—no RoPE, no SwiGLU, no GQA, no system prompt—because at 9M parameters they add complexity without improving quality. The personality is baked directly into the weights through 60K synthetic single-turn conversations across 60 tank-centric topics. It’s a pedagogical bet: if you can see every gear turn, the big models stop feeling like sorcery.
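Template-driven synthetic data of this kind can be sketched in a few lines. The templates, slot names and slot values below are invented for illustration and are not taken from the GuppyLM dataset; the point is only to show how a small pool of templates with randomized components expands into many single-turn pairs, and why the number of unique pairs ends up smaller than the number of samples drawn.

```python
import random

# Hypothetical templates and slot values, invented for illustration only.
TEMPLATES = [
    ("what do you think about the {obj}?",
     "i swim past the {obj} a lot. is there {food} behind it?"),
    ("what are you doing?",
     "i am {activity}. then maybe {food}."),
]
SLOTS = {
    "obj": ["castle", "filter", "plant"],
    "food": ["flakes", "pellets", "bloodworms"],
    "activity": ["swimming in circles", "hiding", "watching bubbles"],
}


def make_example(rng):
    """Pick a template and fill each slot once, so prompt and reply agree."""
    prompt_t, reply_t = rng.choice(TEMPLATES)
    fill = {name: rng.choice(values) for name, values in SLOTS.items()}
    # str.format ignores keyword arguments a template does not use.
    return {"input": prompt_t.format(**fill), "output": reply_t.format(**fill)}


def make_dataset(n, seed=0):
    """Draw n single-turn pairs with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


data = make_dataset(1000)
unique = {(d["input"], d["output"]) for d in data}
print(len(data), "samples,", len(unique), "unique")
```

With a pool this small, duplicates dominate. Adding templates and slot values widens the set of distinct pairs, but the voice stays consistent because every reply comes from the same hand-written patterns.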
Key highlights
- Complete from-scratch pipeline: data generation, BPE tokenizer (4,096 vocab), training loop with a cosine learning-rate schedule and automatic mixed precision (AMP), and inference
- ~16K unique outputs generated from ~60 templates with randomized components (30 tank objects, 17 food types, 25 activities)
- Single-turn design: multi-turn quality degrades by turn 3–4 because of the tight context window, so the author chose reliability over the illusion of memory
- Pre-trained model, dataset, and two ready-to-run Colab notebooks (train + chat) hosted on HuggingFace
- Local chat via `python -m guppylm chat`, with optional single-prompt mode
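The cosine learning-rate schedule mentioned in the highlights is simple enough to write by hand. The sketch below is a generic version, not GuppyLM's actual code: the peak rate, floor rate, step counts and the linear warmup phase are all assumptions added for illustration.

```python
import math


def cosine_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5, warmup_steps=100):
    """Linear warmup (assumed) followed by cosine decay from max_lr to min_lr."""
    if step < warmup_steps:
        # Ramp up linearly so the first updates are small.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine goes from 1 to -1, so the rate goes from max_lr to min_lr.
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for s in (0, 99, 100, 1050, 2000):
    print(s, f"{cosine_lr(s, total_steps=2000):.2e}")
```

In a PyTorch training loop, a function like this would typically be called once per step and its result written into each entry of `optimizer.param_groups` under the `"lr"` key.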
Caveats
- Interactive chat “quickly runs into the 128-token limit, reducing quality”—the author explicitly warns about this
- Single-turn only by design; don’t expect a conversation partner that remembers your name
- Synthetic template data means responses can feel samey; this is a teaching tool, not a product
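The 128-token limit is easy to see with a toy simulation of a chat loop. This is a sketch under stated assumptions: the whitespace "tokenizer" stands in for the real BPE tokenizer, the `reserve` budget for the reply is a hypothetical parameter, and the example turns are made up. Real token counts would differ, but the pattern is the same: once the history exceeds the window, the oldest turns are cut first.

```python
CONTEXT = 128  # context length stated in the text


def build_prompt(turns, encode, context=CONTEXT, reserve=32):
    """Concatenate chat turns and keep only the most recent tokens.

    `encode` is any function mapping text to a list of tokens.
    `reserve` (hypothetical) leaves room for the reply to be generated.
    """
    ids = []
    for speaker, text in turns:
        ids.extend(encode(f"{speaker}: {text}\n"))
    budget = context - reserve
    dropped = max(0, len(ids) - budget)
    return ids[-budget:], dropped


def toy_encode(text):
    """Stand-in tokenizer: one token per whitespace-separated word (not BPE)."""
    return text.split()


turns = []
for i in range(1, 5):
    turns.append(("user", "hello guppy my name is sam and i brought you some food " * 2))
    turns.append(("guppy", "food food food i love food is it flakes or pellets today " * 2))
    kept, dropped = build_prompt(turns, toy_encode)
    print(f"turn {i}: kept {len(kept)} tokens, dropped {dropped}")
```

In this toy setup the first exchange has fallen out of the window entirely by the third turn, which is in line with the degradation the author describes.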
Verdict
Grab this if you want to finally understand how transformers work by touching every part, or if you need a weekend project to show a junior developer that “AI” is just matrix math and careful data curation. Skip it if you’re looking for a model that actually does useful work—Guppy thinks the meaning of life is food, and that’s roughly the depth on offer.