pbloem/former

A transformer you can actually read in one sitting

A minimal PyTorch transformer implementation that prioritizes clarity over scale, now archived and moved to Codeberg.

1.1k stars · Python · ML Frameworks, Language Models
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

This is a from-scratch transformer in PyTorch, stripped down to the essentials. No abstractions hiding the attention mechanism, no distributed training boilerplate, no 10,000-line files. Just the core architecture: embeddings, multi-head self-attention, feed-forward layers, and positional encoding, wired together plainly enough to trace with a cup of coffee.
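
To make that list of parts concrete, here is a minimal sketch of how embeddings, multi-head self-attention, feed-forward layers, and position information can be wired together in plain PyTorch. It is not code from the repository: the class names (SelfAttention, TransformerBlock, TinyTransformer), the default sizes, the learned position embeddings, and the choice to apply layer norm after each residual connection are all assumptions made for illustration. There is no causal mask, so every position attends to every other.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    """Multi-head scaled dot-product self-attention (illustrative sketch)."""

    def __init__(self, emb: int, heads: int = 4):
        super().__init__()
        assert emb % heads == 0, "emb must be divisible by heads"
        self.heads = heads
        self.to_qkv = nn.Linear(emb, 3 * emb, bias=False)  # queries, keys, values in one go
        self.unify = nn.Linear(emb, emb)                   # mixes the heads back together

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, e = x.shape
        h, s = self.heads, e // self.heads
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # Split the embedding into heads: (b, t, e) -> (b, h, t, s)
        q, k, v = (z.reshape(b, t, h, s).transpose(1, 2) for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(s)    # (b, h, t, t)
        weights = F.softmax(scores, dim=-1)                # each row sums to 1
        out = (weights @ v).transpose(1, 2).reshape(b, t, e)
        return self.unify(out)


class TransformerBlock(nn.Module):
    """Attention and feed-forward, each wrapped in residual + layer norm."""

    def __init__(self, emb: int, heads: int = 4, ff_mult: int = 4):
        super().__init__()
        self.attention = SelfAttention(emb, heads)
        self.norm1 = nn.LayerNorm(emb)
        self.norm2 = nn.LayerNorm(emb)
        self.ff = nn.Sequential(
            nn.Linear(emb, ff_mult * emb),
            nn.ReLU(),
            nn.Linear(ff_mult * emb, emb),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm1(x + self.attention(x))  # residual connection, then norm
        return self.norm2(x + self.ff(x))


class TinyTransformer(nn.Module):
    """Token + position embeddings, a stack of blocks, and a logit head."""

    def __init__(self, vocab_size: int, seq_len: int, emb: int = 32,
                 heads: int = 4, depth: int = 2):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, emb)
        self.pos_emb = nn.Embedding(seq_len, emb)  # learned positions (an assumption)
        self.blocks = nn.Sequential(
            *[TransformerBlock(emb, heads) for _ in range(depth)]
        )
        self.to_logits = nn.Linear(emb, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        b, t = tokens.shape
        positions = torch.arange(t, device=tokens.device)
        # (b, t, emb) + (t, emb): position vectors broadcast over the batch
        x = self.token_emb(tokens) + self.pos_emb(positions)
        return self.to_logits(self.blocks(x))


torch.manual_seed(0)
model = TinyTransformer(vocab_size=100, seq_len=16)
tokens = torch.randint(0, 100, (2, 16))  # batch of 2 sequences, 16 tokens each
logits = model(tokens)
print(logits.shape)  # torch.Size([2, 16, 100])
```

The whole model is three small classes, which is roughly the scale the review describes. For autoregressive text generation you would additionally mask the upper triangle of scores before the softmax.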

The interesting bit

Most educational transformer code either drowns you in framework magic or leaves out the tricky parts. This one sits in a narrow middle ground: complete enough to train, small enough to fit in your head. The archival notice suggests the author kept iterating elsewhere, but the GitHub snapshot remains a readable fossil.

Key highlights

  • Pure PyTorch, no external transformer libraries
  • Self-contained implementation of attention, layer norm, and residual connections
  • Explicit enough to modify for experiments or pedagogy
  • 1,098 stars suggest it found its audience
  • Current maintenance lives at codeberg.org/pbm/former

Caveats

  • Repository is explicitly unmaintained on GitHub; the latest version lives on Codeberg
  • README is a one-line redirect, so details on training scripts, datasets, or performance are absent from this snapshot

Verdict

Grab this if you’re teaching transformers, debugging your own implementation, or just want to see the algorithm without the infrastructure. Skip it if you need production-scale training, current bug fixes, or documentation beyond the code itself.
