← all repositories
SamLynnEvans/Transformer

A Transformer you can actually read

A PyTorch seq2seq implementation that prioritizes clarity over state-of-the-art chasing.

1.4k stars Python Language Models
Transformer
Velocity · 7d
+0.5
★ / day
Trend
steady
star history

What it does

Trains a language translator from parallel text files using the original Transformer architecture. Feed it two newline-separated sentence files, pick your source and target languages from SpaCy’s seven supported options, and run train.py. After training, translate.py loads the saved weights and spits out translations.

The interesting bit

The README openly admits this hits 0.39 BLEU against a ~0.42 SOTA—not trying to wow you with benchmarks, but to teach you. The entire repo is essentially a companion to a detailed tutorial, making the code the product as much as the model. That’s rarer than it should be in ML repositories.

Key highlights

  • Pure PyTorch implementation of the original Transformer paper
  • Achieved 0.39 BLEU on Europarl after 4–5 days on a single 8GB GPU
  • One-click FloydHub workspace for cloud training without dependency wrangling
  • Configurable layers, heads, embedding dimensions, dropout, and cosine-annealed restarts
  • Checkpointing by time interval rather than just epoch

Caveats

  • No validation set or validation scores yet—listed explicitly as “still to add”
  • SpaCy tokenizer limits you to seven European languages; no custom tokenizer hook
  • README notes missing functionality for inspecting translations from training/validation sets

Verdict

Worth a look if you’re trying to understand how Transformers work under the hood, or need a hackable baseline for seq2seq experiments. Skip it if you need production-grade translation or broad language coverage out of the box.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.