Is UnsupervisedMT open source?

Yes — facebookresearch/UnsupervisedMT is an open-source project tracked on heatdrop.

What language is UnsupervisedMT written in?

facebookresearch/UnsupervisedMT is primarily written in Python.

How popular is UnsupervisedMT?

facebookresearch/UnsupervisedMT has 1.5k stars on GitHub.

Where can I find UnsupervisedMT?

facebookresearch/UnsupervisedMT is on GitHub at https://github.com/facebookresearch/UnsupervisedMT.

facebookresearch/UnsupervisedMT

Facebook's research code for translating without parallel text

The original EMNLP 2018 implementation that learns to translate using only monolingual data, back-translation, and shared parameters.

★ 1.5k stars · Python · Language Models · ML Frameworks

What it does

This repo implements two approaches to unsupervised machine translation: a neural model (NMT) and a phrase-based statistical model (PBSMT). Both learn to translate between languages without any paired sentences — just raw monolingual text. The NMT version supports seq2seq, biLSTM+attention, and Transformer architectures, with extensive parameter sharing across languages.

The interesting bit

The training recipe is delightfully indirect: start with cross-lingual word embeddings, train denoising auto-encoders to reconstruct corrupted sentences in each language, then bootstrap actual translation through back-translation loops (English→French→English, and vice versa). The back-translated sentence pairs are generated on the fly by 30 CPU worker processes, whose parameters are re-synced from the GPU model every 1000 steps. It is a neat little distributed dance.
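To make the denoising step concrete, here is a minimal sketch of a sentence-corruption function of the kind the recipe relies on. It is not the repo's code: the name `add_noise`, its parameters, and the default values are assumptions chosen for illustration, and it covers only word dropping and a light local shuffle.

```python
import random


def add_noise(tokens, drop_prob=0.1, max_shuffle_distance=3, rng=None):
    """Corrupt a token list by dropping words and lightly shuffling the rest.

    Hypothetical helper for illustration; parameter names and defaults
    are assumptions, not values taken from the repository.
    """
    rng = rng or random.Random()

    # Word dropout: each token is removed with probability drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        # Never hand the auto-encoder an empty sentence.
        kept = [rng.choice(tokens)]

    # Local shuffle: perturb each index by a random offset in
    # [0, max_shuffle_distance] and sort by the perturbed index, so every
    # token moves fewer than max_shuffle_distance positions.
    keys = [i + rng.uniform(0, max_shuffle_distance) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]
```

Calling `add_noise("the cat sat on the mat".split())` returns a token list with some words possibly missing and neighbours possibly swapped. The auto-encoder is then trained to reconstruct the original sentence from that corrupted input.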

Key highlights

Three NMT architectures with arbitrary parameter sharing across encoders, decoders, and embeddings
On-the-fly back-translation generation with configurable CPU worker processes
Dynamic loss scheduling: the auto-encoder loss weight decays from 1 to 0 over 300k steps, leaving pure back-translation
PBSMT pipeline with unsupervised phrase-table generation and automated Moses training
Helper scripts for English-French and English-German that download, tokenize, BPE-encode, and binarize data
Achieves >23 BLEU on newstest2014 en-fr after ~25 epochs (one day on a V100)
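
The loss-scheduling highlight can be sketched as a piecewise-linear coefficient. This is an illustration under assumptions, not the repo's implementation: the `step:value` string format and the helper names `parse_schedule` and `coefficient_at` are hypothetical, and the breakpoints simply mirror the 1-to-0-over-300k-steps description above.

```python
def parse_schedule(spec):
    """Parse 'step:value,step:value' into a sorted list of (step, value) pairs."""
    points = []
    for item in spec.split(","):
        step, value = item.split(":")
        points.append((int(step), float(value)))
    return sorted(points)


def coefficient_at(points, step):
    """Linearly interpolate the loss weight at a given training step."""
    # Before the first breakpoint, hold the first value.
    if step <= points[0][0]:
        return points[0][1]
    # Between two breakpoints, interpolate linearly.
    for (s0, v0), (s1, v1) in zip(points, points[1:]):
        if step <= s1:
            return v0 + (v1 - v0) * (step - s0) / (s1 - s0)
    # After the last breakpoint, hold the last value.
    return points[-1][1]


# Hypothetical schedule: auto-encoder weight goes from 1 to 0 over 300k steps.
ae_schedule = parse_schedule("0:1,300000:0")
```

With this schedule, `coefficient_at(ae_schedule, 150000)` gives a weight of 0.5, and any step past 300000 gives 0. A training loop would multiply the auto-encoder loss by that weight before adding the back-translation loss, so the denoising objective gradually drops out.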

Caveats

The authors themselves recommend using their later XLM repo instead for NMT — “better model and more efficient implementation”
PyTorch 0.5-era code; expect some archaeology to get it running on modern versions
PBSMT requires compiling or downloading Moses, which the README notes “is not always straightforward”
Several features (adversarial training, arbitrary multilingual training, LM pretraining) are implemented but “left for future work” — i.e., not used in the paper

Verdict

Worth studying if you’re researching low-resource MT or the evolution of unsupervised translation methods. Skip it for production use — the authors already pointed you to XLM. The PBSMT path is a curiosity for historical comparison with modern neural approaches.
