jadore801120/attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer sequence-to-sequence model for machine translation.

Velocity (7d): +3.0 ★/day · Trend: steady
Implements the Transformer architecture from the paper “Attention Is All You Need” in PyTorch. Provides training and translation scripts for sequence-to-sequence tasks such as machine translation on WMT datasets. The model relies on self-attention rather than recurrent or convolutional layers, and supports shared embeddings, label smoothing, and beam search decoding.
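To make the self-attention idea concrete, here is a minimal sketch of scaled dot-product attention, softmax(QKᵀ / √d_k)·V, as defined in the paper. It is written in plain Python with only the standard library so it runs anywhere. It is not taken from this repository, and the function names `softmax` and `scaled_dot_product_attention` and the toy `tokens` input are illustrative assumptions. It also omits masking, multiple heads, and learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V.

    queries: list of n_q vectors of length d_k
    keys:    list of n_k vectors of length d_k
    values:  list of n_k vectors of length d_v
    Hypothetical helper for illustration only.
    """
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    d_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(d_v)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Self-attention: the same toy sequence supplies queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1 and says how strongly one position attends to every other position. Because every position is compared with every other directly, no recurrence over the sequence is needed.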