lsdefine/attention-is-all-you-need-keras

Transformer in Keras: a 2017 paper, ported with duct tape

A straightforward Keras+TensorFlow reimplementation of "Attention Is All You Need" for developers who want readable, hackable transformer code rather than a framework.

719 stars · Python · Language Models · ML Frameworks
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

Implements the original “Attention Is All You Need” transformer from scratch in Keras and TensorFlow. Ships with two working examples: English-to-German translation on WMT'16 Multi30k, and pinyin-to-Chinese conversion. You feed it paired sequences in a simple text format and it trains.

The interesting bit

The author treats this less as a product and more as a living notebook. There’s a quirky layer-by-layer training strategy for deep models—train layer 1, freeze it, add layer 2, repeat—which apparently helps on the pinyin task. The code was recently dragged forward to TensorFlow 2.6.0, and components are deliberately exposed for import into other models.
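
The layer-by-layer idea can be sketched without any framework. This is only an illustration of the freeze-then-grow loop described above, not the repository's code: `Layer`, `train_layerwise`, and `train_fn` are hypothetical names, and the "training" is a callback you supply. In Keras, freezing would correspond to setting `layer.trainable = False` and recompiling the model before the next stage.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for a transformer layer; only tracks a trainable flag."""
    name: str
    trainable: bool = True


def train_layerwise(n_layers, train_fn):
    """Grow a stack one layer at a time, freezing earlier layers before each stage."""
    stack = []
    for i in range(1, n_layers + 1):
        for layer in stack:
            layer.trainable = False  # freeze everything trained in earlier stages
        stack.append(Layer(f"layer_{i}"))  # the new layer starts out trainable
        train_fn(stack)  # assumed to update only layers with trainable=True
    return stack


# Record which layers are trainable at each stage instead of really training.
stages = []
final_stack = train_layerwise(
    3, lambda stack: stages.append([l.name for l in stack if l.trainable])
)
print(stages)  # [['layer_1'], ['layer_2'], ['layer_3']]
```

Each stage trains exactly one new layer on top of a frozen stack, so the cost per stage stays roughly constant as depth grows.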

Key highlights

  • Reaches ~70% validation accuracy on the small Multi30k dataset, matching the reference PyTorch implementation
  • Includes a fast step-by-step decoder with beam search (author notes it “should be modified to be reuseable”)
  • The paper's learning rate schedule (linear warmup, then inverse-square-root decay) is flagged as necessary for deeper stacks
  • transformer.py broken out for drop-in use elsewhere
  • Borrowed preprocessing from the popular PyTorch reference implementation, so the data pipeline is battle-tested
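
For reference, the learning rate schedule from the paper (section 5.3) is lrate = d_model^-0.5 · min(step^-0.5, step · warmup_steps^-1.5). A minimal sketch in plain Python follows. The function name `noam_lr` is made up for illustration, and the defaults are the paper's base-model values, not necessarily what this repository uses.

```python
def noam_lr(step, d_model=512, warmup_steps=4000):
    """Learning rate schedule from "Attention Is All You Need", section 5.3.

    Rises linearly for warmup_steps steps, then decays as 1/sqrt(step).
    Defaults are the paper's base-model values (an assumption for this sketch).
    """
    step = max(step, 1)  # guard against 0 ** -0.5 on the very first call
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# The rate peaks exactly at the end of warmup and is lower on either side.
for s in (100, 4000, 40000):
    print(s, noam_lr(s))
```

In TensorFlow 2.x the same formula could be wrapped in a subclass of `tf.keras.optimizers.schedules.LearningRateSchedule` so that it is evaluated per optimizer step rather than per epoch.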

Caveats

  • The beam search and fast decoder are acknowledged as not yet cleanly reusable
  • No pretrained weights or model zoo—you bring your own data and training budget
  • README is sparse on architecture details; you’ll need to read the code or the original paper
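
If you end up rewriting the beam search yourself, the core loop is small. The following is a generic, framework-free sketch, not the repository's decoder: `beam_search`, `step_fn`, and the toy probability table are all hypothetical, and there is no length normalization or caching of decoder state.

```python
import math


def beam_search(step_fn, bos, eos, beam_width=3, max_len=10):
    """Generic beam search.

    step_fn(tokens) must return {next_token: log_prob} for the given prefix.
    Returns the (tokens, score) pair with the highest total log-probability.
    """
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, logp in step_fn(tokens).items():
                candidates.append((tokens + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:  # keep the best hypotheses
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len without reaching eos
    return max(finished, key=lambda c: c[1])


# Toy next-token distribution, keyed on the last token only (illustrative).
TOY = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"</s>": 0.55, "c": 0.45},
    "b": {"</s>": 0.9, "c": 0.1},
    "c": {"</s>": 1.0},
}


def toy_step(tokens):
    return {tok: math.log(p) for tok, p in TOY[tokens[-1]].items()}


greedy_tokens, greedy_score = beam_search(toy_step, "<s>", "</s>", beam_width=1)
beam_tokens, beam_score = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(greedy_tokens)  # ['<s>', 'a', '</s>']  (probability 0.33)
print(beam_tokens)    # ['<s>', 'b', '</s>']  (probability 0.36)
```

With a beam width of 1 the search is greedy and commits to "a" because it looks best locally. A width of 2 keeps "b" alive long enough to find the higher-probability sequence.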

Verdict

Grab this if you want to understand transformers by reading Keras code, or need a base to fork for a research prototype. Skip it if you need production-grade tooling, pretrained models, or comprehensive documentation.
