facebookresearch/UnsupervisedMT

Facebook's research code for translating without parallel text

The original EMNLP 2018 implementation that learns to translate using only monolingual data, back-translation, and shared parameters.

1.5k stars · Python · Language Models, ML Frameworks
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

This repo implements two approaches to unsupervised machine translation: a neural model (NMT) and a phrase-based statistical model (PBSMT). Both learn to translate between languages from raw monolingual text alone, without a single paired sentence. The NMT version supports three architectures (seq2seq, biLSTM with attention, and Transformer), with extensive parameter sharing across languages.
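On the PBSMT side, translation starts from a phrase table built without any parallel data. Assuming the common recipe of word-level entries scored by a temperature-scaled softmax over cosine similarities between cross-lingual embeddings, the idea can be sketched as below. `phrase_table`, the toy 2-D vectors, and the temperature of 0.1 are illustrative assumptions, not the repo's code; a table like this would then feed Moses training.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def phrase_table(src_emb, tgt_emb, temperature=0.1, top_k=2):
    """For each source word, score every target word with a softmax over
    cosine similarities divided by `temperature`, keeping the top_k entries.

    Both dicts map words to vectors assumed to share one cross-lingual space
    (hypothetical toy data here; not the repo's file format).
    """
    tgt_words = list(tgt_emb)
    table = {}
    for src, src_vec in src_emb.items():
        logits = [cosine(src_vec, tgt_emb[t]) / temperature for t in tgt_words]
        peak = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        ranked = sorted(zip(exps, tgt_words), reverse=True)
        table[src] = [(t, e / total) for e, t in ranked[:top_k]]
    return table

# Toy usage with hypothetical 2-D "aligned" embeddings.
src_emb = {"cat": [1.0, 0.1], "dog": [0.1, 1.0]}
tgt_emb = {"chat": [0.9, 0.2], "chien": [0.2, 0.9]}
for word, entries in phrase_table(src_emb, tgt_emb).items():
    print(word, [(t, round(p, 3)) for t, p in entries])
```

A low temperature sharpens the distribution, so the nearest neighbor in embedding space dominates while weaker candidates keep a small, nonzero probability.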

The interesting bit

The training recipe is delightfully indirect: start with cross-lingual word embeddings, train denoising auto-encoders to reconstruct corrupted sentences in each language, then bootstrap actual translation through back-translation loops (English→French→English, and vice versa). In the example configuration, 30 CPU processes generate these back-translated training pairs on the fly, and their copy of the model is synced with the GPU model every 1,000 training steps: a neat little distributed dance.
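The two data-generation steps in this recipe can be sketched in plain Python. `add_noise` corrupts a sentence for the denoising auto-encoder with a bounded local shuffle, word dropout, and word blanking; `back_translate` turns monolingual sentences into synthetic pairs using the current model. All names, default probabilities, and the toy word-by-word dictionary standing in for the neural model are hypothetical; in the real system the translation comes from a periodically synced copy of the network.

```python
import random

def add_noise(words, shuffle_k=3, drop_p=0.1, blank_p=0.1, blank="<blank>", rng=random):
    """Corrupt a tokenized sentence for the denoising auto-encoder (sketch)."""
    # Bounded local shuffle: sort positions by i + U[0, k+1), so no word moves
    # more than shuffle_k places; ties fall back to the original index.
    order = sorted(range(len(words)),
                   key=lambda i: (i + rng.random() * (shuffle_k + 1), i))
    shuffled = [words[i] for i in order]
    # Word dropout: remove each word with probability drop_p.
    kept = [w for w in shuffled if rng.random() >= drop_p]
    if not kept and shuffled:
        kept = shuffled[:1]  # keep at least one word (a choice made for this sketch)
    # Word blanking: replace each surviving word with a placeholder token.
    return [blank if rng.random() < blank_p else w for w in kept]

def back_translate(mono_sentences, translate):
    """Build synthetic pairs for the reverse direction: the model's output
    becomes the input and the original sentence becomes the target."""
    return [(translate(s), s) for s in mono_sentences]

# Hypothetical stand-in for the current en->fr model: word-by-word lookup.
EN_FR = {"the": "le", "cat": "chat", "sleeps": "dort"}

def en_to_fr(words):
    return [EN_FR.get(w, w) for w in words]

rng = random.Random(0)
print(add_noise("the cat sleeps on the mat".split(), rng=rng))
# Pairs like these would train the fr->en direction.
print(back_translate([["the", "cat", "sleeps"]], en_to_fr))
```

The key property of back-translation is visible in the pair layout: the target side is always genuine human text, so noise from the still-weak model only affects the input.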

Key highlights

  • Three NMT architectures with arbitrary parameter sharing across encoders, decoders, and embeddings
  • On-the-fly back-translation generation with configurable CPU worker processes
  • Dynamic loss scheduling: the auto-encoder loss weight fades from 1 to 0 over 300k steps, leaving pure back-translation
  • PBSMT pipeline with unsupervised phrase-table generation and automated Moses training
  • Helper scripts for English-French and English-German that download, tokenize, BPE-encode, and binarize data
  • Achieves over 23 BLEU on newstest2014 en-fr after about 25 epochs (roughly one day on a single V100)
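The loss schedule in the list above amounts to a weight on the auto-encoder term that decays with the training step. A minimal sketch, assuming the weight is interpolated linearly between `step:value` breakpoints and that back-translation keeps a fixed weight of 1; `parse_schedule`, `weight_at`, and `total_loss` are hypothetical helpers, not the repo's API.

```python
def parse_schedule(spec):
    """Parse 'step:value,step:value,...' into sorted (step, value) breakpoints."""
    points = []
    for item in spec.split(","):
        step, value = item.split(":")
        points.append((int(step), float(value)))
    return sorted(points)

def weight_at(points, step):
    """Linearly interpolate the weight between breakpoints; clamp at both ends."""
    if step <= points[0][0]:
        return points[0][1]
    for (s0, v0), (s1, v1) in zip(points, points[1:]):
        if step <= s1:
            return v0 + (v1 - v0) * (step - s0) / (s1 - s0)
    return points[-1][1]

def total_loss(ae_loss, bt_loss, step, ae_schedule, bt_weight=1.0):
    """Weighted sum: the auto-encoder term fades out, back-translation stays."""
    return weight_at(ae_schedule, step) * ae_loss + bt_weight * bt_loss

# Schedule from the text: auto-encoder weight 1 at step 0, 0 at step 300k.
schedule = parse_schedule("0:1,300000:0")
for step in (0, 150_000, 300_000, 500_000):
    print(step, weight_at(schedule, step))
```

Encoding the schedule as breakpoints keeps it a one-line config value, and adding breakpoints (say, a faster early decay) needs no code changes.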

Caveats

  • The authors themselves recommend using their later XLM repo instead for NMT, citing a “better model and more efficient implementation”
  • PyTorch 0.5-era code; expect some archaeology to get it running on modern versions
  • PBSMT requires compiling or downloading Moses, which the README notes “is not always straightforward”
  • Several features (adversarial training, arbitrary multilingual training, LM pretraining) are implemented but “left for future work”, i.e., not used in the paper

Verdict

Worth studying if you’re researching low-resource MT or the evolution of unsupervised translation methods. Skip it for production use; the authors already pointed you to XLM. The PBSMT path is mainly of historical interest, as a point of comparison with modern neural approaches.
