gordicaleksa/pytorch-original-transformer
A PyTorch implementation of the original Transformer architecture from the seminal Vaswani et al. paper, structured as a learning resource.

Velocity (7d): +0.5 ★/day · Trend: steady
This repository provides a clean, well-commented PyTorch implementation of the original Transformer model as described in the paper "Attention Is All You Need". The code includes educational visualizations in playground.py for concepts such as positional encodings and attention mechanisms. It ships with models pretrained on IWSLT and is aimed at developers who want to understand how Transformers work under the hood.
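
To make the positional-encoding idea concrete, here is a minimal sketch of the sinusoidal scheme from the paper, written in plain Python so it runs without PyTorch. It is not taken from the repository: the function name `sinusoidal_positional_encoding` and the parameters `max_len` and `d_model` are illustrative assumptions, and the repository's own code may organize this differently.

```python
import math


def sinusoidal_positional_encoding(max_len, d_model):
    """Build a max_len x d_model table of sinusoidal positional encodings.

    Hypothetical helper for illustration; not the repository's API.
    Even dimensions hold sin(pos / 10000^(i / d_model)),
    odd dimensions hold the matching cos value.
    """
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")

    table = []
    for pos in range(max_len):
        row = [0.0] * d_model
        # i walks over the even dimension indices 0, 2, 4, ...
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)      # even index: sine
            row[i + 1] = math.cos(angle)  # odd index: cosine
        table.append(row)
    return table


pe = sinusoidal_positional_encoding(max_len=50, d_model=8)
print(len(pe), len(pe[0]))  # number of positions, encoding width
```

Each row is the vector added to the token embedding at that position. Low dimension indices oscillate quickly and high ones slowly, so plotting the table as a heatmap gives the kind of picture the visualizations are meant to convey.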