Facebook's research code for translating without parallel text
The original implementation of an EMNLP 2018 paper whose models learn to translate using only monolingual data, back-translation, and shared parameters.

What it does
This repo implements two approaches to unsupervised machine translation: a neural model (NMT) and a phrase-based statistical model (PBSMT). Both learn to translate between languages without any paired sentences — just raw monolingual text. The NMT version supports seq2seq, biLSTM+attention, and Transformer architectures, with extensive parameter sharing across languages.
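"Parameter sharing across languages" means that each language's encoder or decoder stack holds some of the same layer objects, so an update made while training one language is immediately seen by the other. Below is a minimal plain-Python sketch that uses stand-in layers instead of PyTorch modules. The `Layer` class, the `build_encoders` helper, the 4-layer depth, and the choice to share the top layers are illustrative assumptions, not the repo's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # eq=False: equality is identity, so "shared" means "the same object"
class Layer:
    """Stand-in for a real network layer; `name` just records who created it."""
    name: str
    weights: List[float] = field(default_factory=lambda: [0.0] * 4)


def build_encoders(langs: List[str], n_layers: int, n_shared: int) -> Dict[str, List[Layer]]:
    """Give each language its own encoder stack whose top n_shared layers are shared objects."""
    # Build the shared layers once; every language's stack will reference these same objects.
    shared = [Layer(f"shared_{i}") for i in range(n_layers - n_shared, n_layers)]
    encoders = {}
    for lang in langs:
        # Bottom layers are private to each language (this split is an assumption).
        private = [Layer(f"{lang}_{i}") for i in range(n_layers - n_shared)]
        encoders[lang] = private + shared
    return encoders


# Usage: 4-layer encoders for English and French, sharing the top 3 layers.
encoders = build_encoders(["en", "fr"], n_layers=4, n_shared=3)
encoders["en"][3].weights[0] = 0.5        # an update made while training English...
print(encoders["fr"][3].weights[0])       # ...is visible to French: prints 0.5
print(encoders["fr"][0].weights[0])       # private bottom layer untouched: prints 0.0
```

In PyTorch you get the same effect by placing one `nn.Module` instance in several containers, so both languages train a single set of weights.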
The interesting bit
The training recipe is delightfully indirect: start with cross-lingual word embeddings, train denoising auto-encoders to reconstruct corrupted sentences in each language, then bootstrap actual translation through back-translation loops (English→French→English, and vice versa). The system generates back-parallel data on the fly using separate CPU processes (30 in the example configuration), whose model copies are synced with the GPU model every 1000 training steps. It is a neat little distributed dance.
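The recipe above can be sketched in a single process as two pieces. The first is a corruption function for the denoising objective. The second is a back-translator that builds synthetic pairs from a stale copy of the parameters, refreshed every `sync_every` steps. Everything here is a hypothetical stand-in. The noise settings (local shuffle within 3 positions, 10% word dropout, optional blanking) are illustrative, and the "model" is a toy word-by-word lexicon rather than a neural network. The real system also runs generation in separate CPU processes, not inside the training loop.

```python
import copy
import random
from typing import Callable, Dict, List, Optional, Tuple


def add_noise(tokens: List[str], shuffle_k: int = 3, drop_prob: float = 0.1,
              blank_prob: float = 0.0, blank: str = "<blank>",
              rng: Optional[random.Random] = None) -> List[str]:
    """Corrupt a sentence for the denoising auto-encoder (illustrative settings)."""
    rng = rng if rng is not None else random.Random()
    tokens = list(tokens)  # work on a copy; never mutate the caller's sentence
    # Local shuffle: jitter index i by U(0, k+1) and sort, so no token moves more than k slots.
    if shuffle_k > 0:
        keys = [i + rng.uniform(0, shuffle_k + 1) for i in range(len(tokens))]
        tokens = [tokens[i] for i in sorted(range(len(tokens)), key=keys.__getitem__)]
    # Word dropout: delete each token with probability drop_prob, but keep at least one.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if tokens and not kept:
        kept = [rng.choice(tokens)]
    # Word blanking: replace surviving tokens with a placeholder with probability blank_prob.
    return [blank if rng.random() < blank_prob else t for t in kept]


# A "model" is any function (params, sentence, src_lang, tgt_lang) -> translated sentence.
Translate = Callable[[Dict, List[str], str, str], List[str]]


class BackTranslator:
    """Hypothetical: builds back-parallel pairs from a periodically refreshed parameter copy."""

    def __init__(self, params: Dict, translate: Translate, sync_every: int = 1000):
        self.translate = translate
        self.sync_every = sync_every
        self.snapshot = copy.deepcopy(params)  # the stale copy a CPU worker would hold
        self.last_sync = 0

    def maybe_sync(self, step: int, live_params: Dict) -> bool:
        """Refresh the stale copy once sync_every training steps have elapsed."""
        if step - self.last_sync >= self.sync_every:
            self.snapshot = copy.deepcopy(live_params)
            self.last_sync = step
            return True
        return False

    def pairs(self, mono: List[List[str]], src: str, tgt: str) -> List[Tuple[List[str], List[str]]]:
        """Translate src -> tgt, then pair (synthetic tgt, real src) to train tgt -> src."""
        return [(self.translate(self.snapshot, sent, src, tgt), sent) for sent in mono]


def word_by_word(params: Dict, sent: List[str], src: str, tgt: str) -> List[str]:
    """Toy model: a lexicon lookup, standing in for nearest neighbors in shared embeddings."""
    lexicon = params[(src, tgt)]
    return [lexicon.get(w, w) for w in sent]  # unknown words are copied through


# Usage: the live model learns "dog" after the snapshot was taken.
live = {("en", "fr"): {"cat": "chat"}}
bt = BackTranslator(live, word_by_word, sync_every=1000)
live[("en", "fr")]["dog"] = "chien"
print(bt.pairs([["cat", "dog"]], "en", "fr"))  # [(['chat', 'dog'], ['cat', 'dog'])]
bt.maybe_sync(1000, live)
print(bt.pairs([["cat", "dog"]], "en", "fr"))  # [(['chat', 'chien'], ['cat', 'dog'])]
```

Pairs come out as (synthetic source, real target), so the model trained on them always learns to produce genuine sentences even when its inputs are imperfect translations. Because the snapshot lags the live model by up to `sync_every` steps, the generated data is slightly stale; that is the trade-off that lets generation run off the GPU.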
Key highlights
- Three NMT architectures with arbitrary parameter sharing across encoders, decoders, and embeddings
- On-the-fly back-translation generation with configurable CPU worker processes
- Dynamic loss scheduling: the auto-encoder loss weight fades from 1 to 0 over 300k steps, leaving pure back-translation
- PBSMT pipeline with unsupervised phrase-table generation and automated Moses training
- Helper scripts for English-French and English-German that download, tokenize, BPE-encode, and binarize data
- Achieves >23 BLEU on newstest2014 en-fr after ~25 epochs (one day on a V100)
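The loss scheduling item above can be sketched as piecewise-linear interpolation between (step, weight) breakpoints. The `step:value` string format, the linear interpolation, and the breakpoints used below (0:1, 100000:0.1, 300000:0) are illustrative assumptions that reproduce the "1 to 0 over 300k steps" shape. Treat them as a sketch, not as the repo's exact configuration.

```python
from typing import List, Tuple


def parse_schedule(spec: str) -> List[Tuple[int, float]]:
    """Parse 'step:value,step:value,...' into sorted (step, value) breakpoints."""
    points = []
    for chunk in spec.split(","):
        step, value = chunk.split(":")
        points.append((int(step), float(value)))
    return sorted(points)


def weight_at(points: List[Tuple[int, float]], step: int) -> float:
    """Linearly interpolate between breakpoints; hold the end values outside the range."""
    if step <= points[0][0]:
        return points[0][1]
    for (s0, v0), (s1, v1) in zip(points, points[1:]):
        if step <= s1:  # step > s0 is guaranteed here, so s1 > s0 and no division by zero
            return v0 + (v1 - v0) * (step - s0) / (s1 - s0)
    return points[-1][1]


# Usage: auto-encoder weight at a few points in training.
schedule = parse_schedule("0:1,100000:0.1,300000:0")
for step in (0, 50_000, 200_000, 400_000):
    print(step, round(weight_at(schedule, step), 4))
```

A training loop would scale the auto-encoder loss by `weight_at(schedule, step)` on each iteration. Once the weight reaches 0, only the back-translation terms contribute to the gradient.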
Caveats
- The authors themselves recommend using their later XLM repo instead for NMT — “better model and more efficient implementation”
- PyTorch 0.5-era code; expect some archaeology to get it running on modern versions
- PBSMT requires compiling or downloading Moses, which the README notes “is not always straightforward”
- Several features (adversarial training, arbitrary multilingual training, LM pretraining) are implemented but “left for future work” — i.e., not used in the paper
Verdict
Worth studying if you’re researching low-resource MT or the evolution of unsupervised translation methods. Skip it for production use — the authors already pointed you to XLM. The PBSMT path is a curiosity for historical comparison with modern neural approaches.