
jadore801120/attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer sequence-to-sequence model for machine translation.

9.7k stars · Python · Language Models · ML Frameworks
Velocity (7d): +3.0 ★ / day · Trend: steady

Implements the Transformer architecture from the seminal paper “Attention Is All You Need” in PyTorch. Provides training and translation scripts for sequence-to-sequence tasks such as machine translation on WMT datasets. The model relies on self-attention instead of RNNs or convolutions, and supports shared embeddings, label smoothing, and beam search decoding.
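As a rough illustration of the self-attention mechanism mentioned above, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QKᵀ / √d_k)·V. It is not taken from the repository: the function names `softmax` and `scaled_dot_product_attention` are hypothetical, plain lists stand in for tensors, and masking, multiple heads, and dropout are left out.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Hypothetical helper for illustration only; returns the attended
    outputs and the attention weights, one row per query.
    """
    d_k = len(keys[0])          # dimensionality of each key vector
    scale = math.sqrt(d_k)
    d_v = len(values[0])        # dimensionality of each value vector
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(a * v[j] for a, v in zip(attn, values)) for j in range(d_v)]
        )
    return outputs, weights


# One query that matches the first key more closely than the second.
out, w = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

In this toy call the weights in `w[0]` sum to 1 and favour the first key, so `out[0]` leans toward the first value vector. A PyTorch version would typically express the same computation with batched matrix multiplications.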
