Is attention-is-all-you-need-pytorch open source?

Yes — jadore801120/attention-is-all-you-need-pytorch is open source, released under the MIT license.

What language is attention-is-all-you-need-pytorch written in?

jadore801120/attention-is-all-you-need-pytorch is primarily written in Python.

How popular is attention-is-all-you-need-pytorch?

jadore801120/attention-is-all-you-need-pytorch has 9.8k stars on GitHub.

Where can I find attention-is-all-you-need-pytorch?

jadore801120/attention-is-all-you-need-pytorch is on GitHub at https://github.com/jadore801120/attention-is-all-you-need-pytorch.


jadore801120/attention-is-all-you-need-pytorch

A PyTorch port of the original 2017 Transformer, still under construction

A PyTorch recreation of the original self-attention Transformer for machine translation experiments.

★ 9.8k stars · Python · Language Models · ML Frameworks


What it does

This repository implements the Transformer model from the 2017 paper “Attention Is All You Need” in PyTorch. It focuses on sequence-to-sequence translation, specifically the WMT'16 German-to-English multimodal task, using a model that replaces convolution and recurrence entirely with self-attention. The code supports training a model from scratch and running inference with a trained checkpoint.
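For readers new to the architecture, the operation at the heart of the model is scaled dot-product attention, softmax(QKᵀ / √d_k) · V, as defined in the paper. The sketch below is a dependency-free illustration in plain Python rather than PyTorch; the function names are hypothetical and are not taken from the repository's code.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Hypothetical helper for illustration only; returns the attended
    outputs and the attention weights (one row per query).
    """
    d_k = len(k[0])
    outputs, weights = [], []
    for q_row in q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
            for k_row in k
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Each output is the attention-weighted sum of the value vectors.
        outputs.append(
            [sum(w * v_row[j] for w, v_row in zip(attn, v)) for j in range(len(v[0]))]
        )
    return outputs, weights
```

Passing the same list as `q`, `k` and `v` gives self-attention, where every position attends over every position in the sequence. A PyTorch implementation would instead operate on batched tensors and run several such heads in parallel (multi-head attention).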

The interesting bit

The README explicitly frames this as a PyTorch alternative to the official TensorFlow implementation in tensor2tensor. Rather than adding new tricks, it aims to replicate the paper’s architecture and training recipe, including label smoothing and shared embedding weights.
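Label smoothing replaces the one-hot training target with a softened distribution; the paper uses ε_ls = 0.1. A minimal plain-Python sketch for a single token follows. It assumes the variant in which the true class keeps 1 - ε and the remaining ε is spread evenly over the other classes; whether the repository splits ε exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def log_softmax(logits):
    # log(softmax(x)) computed stably via the log-sum-exp trick.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def label_smoothed_nll(logits, target, eps=0.1):
    """Cross-entropy of one prediction against a smoothed one-hot target.

    Assumed variant: the true class gets probability 1 - eps and the other
    n_class - 1 classes share eps equally, so the target still sums to 1.
    """
    n_class = len(logits)
    log_probs = log_softmax(logits)
    off_value = eps / (n_class - 1)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = 1.0 - eps if i == target else off_value
        loss -= q * lp
    return loss
```

With `eps=0` this reduces to ordinary cross-entropy; larger values penalize the model for putting nearly all probability mass on a single token.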

Key highlights

Implements the original multi-head self-attention Transformer architecture.
Supports end-to-end training and translation on the WMT'16 de-en dataset via torchtext and spaCy.
Includes label smoothing, learning-rate warmup, and weight sharing between target embeddings and the pre-softmax linear layer.
Explicitly marked as a work in progress; training curves are provided, but test evaluation is listed as “coming soon.”
Project structure and preprocessing scripts are heavily borrowed from OpenNMT-py.
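The learning-rate warmup mentioned above follows the schedule defined in the paper: lrate = d_model^-0.5 · min(step^-0.5, step · warmup_steps^-1.5), with d_model = 512 and warmup_steps = 4000 for the base model. This standalone function reproduces that formula only; how the repository wires a schedule into its optimizer, and any extra multiplier it applies, is not shown here, and the function name is hypothetical.

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Learning rate from the 'Attention Is All You Need' schedule.

    Rises linearly for the first warmup_steps steps, then decays in
    proportion to the inverse square root of the step number.
    """
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The two terms inside `min` are equal exactly at `step == warmup_steps`, so that is where the rate peaks before the inverse-square-root decay takes over.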

Caveats

Incomplete features. Evaluation on generated text, attention-weight plotting, and BPE decoding are all listed as unfinished or untested.
No published test scores. The performance section shows training loss curves but states testing is “coming soon.”
BPE pipeline is experimental. The README warns that BPE-related parts are “not yet fully tested” and require manually switching the main function call.

Verdict

Use this if you want a readable, focused implementation of the original Transformer in PyTorch. Skip it if you need a maintained, feature-complete framework: this repo is explicitly unfinished and borrows much of its plumbing from OpenNMT-py.

Frequently asked

What is jadore801120/attention-is-all-you-need-pytorch?: A PyTorch recreation of the original self-attention Transformer for machine translation experiments.