The seq2seq model that taught a generation, now in maintenance mode
A 2016-era Torch implementation of attentional encoder-decoder LSTMs that became the foundation for OpenNMT.

What it does
This is a Lua/Torch implementation of the now-standard sequence-to-sequence architecture: LSTM encoder, LSTM decoder, optional attention over the source sequence. It handles machine translation, summarization, and similar sequence transduction tasks. A Python preprocessing script turns parallel text into HDF5 shards; Lua scripts handle training and beam-search decoding.
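For readers who have not met beam-search decoding, here is a minimal Python sketch of the general technique. It is not the repository's Lua code: `beam_search`, `step_fn`, and `toy_step` are hypothetical names, and the toy probability table stands in for a trained decoder.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=10):
    """Return (log_prob, tokens) for the best hypothesis found.

    step_fn(tokens) is a hypothetical stand-in for the decoder: it maps a
    partial sequence to a dict of {next_token: log_probability}.
    """
    beams = [(0.0, [start_token])]   # live hypotheses
    finished = []                    # hypotheses that emitted end_token
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for score, seq in beams:
            for token, logp in step_fn(seq).items():
                candidates.append((score + logp, seq + [token]))
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            if seq[-1] == end_token:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    # Hypotheses cut off by max_len still compete; fall back to the start
    # token alone if the decoder offered no continuation at all.
    pool = finished + beams or [(0.0, [start_token])]
    return max(pool, key=lambda c: c[0])

def toy_step(seq):
    """Toy next-token distribution keyed on the last token only."""
    table = {
        "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
        "a": {"</s>": math.log(0.55), "a": math.log(0.45)},
        "b": {"</s>": math.log(0.9), "b": math.log(0.1)},
    }
    return table[seq[-1]]

score, tokens = beam_search(toy_step, "<s>", "</s>", beam_size=2)
print(tokens)  # ['<s>', 'b', '</s>']
```

In this toy table a greedy decoder would commit to "a" first (probability 0.6) and end with a total of 0.33, while a beam of two keeps "b" alive and finds the 0.36 path. Real decoders usually add length normalization, which this sketch omits.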
The interesting bit
The README itself announces the project’s obsolescence: OpenNMT is the “fully supported feature-complete rewrite.” What’s notable is how much research this codebase absorbed before being superseded. SYSTRAN contributed production-grade additions—character-level CNN inputs with highway networks, model pruning, knowledge distillation, guided alignment, multi-attention, residual connections, and linguistic feature support—making it a surprisingly complete snapshot of mid-2010s neural MT engineering.
Key highlights
- Implements Luong et al.’s global-general attention with optional input feeding
- Character-level encoding via Kim et al.’s CharCNN + highway network architecture
- Bidirectional LSTM encoder with shared embeddings between forward/backward passes
- Extensive hyperparameter surface: curriculum learning, layer-specific learning rates, pretrained embedding loading, gradient clipping, dropout between vertical LSTM stacks
- Beam search decoding with dictionary-aware prediction
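The attention variant named in the first highlight can be sketched in a few lines of plain Python. This is an illustration of the Luong-style "general" score (decoder state times a learned matrix times each encoder state), not code from the repository; the function and parameter names (`global_general_attention`, `w_a`, `w_c`) are assumptions made for the example.

```python
import math

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_general_attention(decoder_state, encoder_states, w_a, w_c):
    """One decoder step of global attention with the 'general' score.

    decoder_state:  h_t, a list of length d
    encoder_states: list of source states h_s, each of length d
    w_a:            d x d score matrix (hypothetical parameter name)
    w_c:            d x 2d output projection (hypothetical parameter name)
    """
    # General score: h_t^T W_a h_s for every source position.
    scores = [dot(decoder_state, matvec(w_a, h_s)) for h_s in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder states.
    dim = len(decoder_state)
    context = [
        sum(a * h_s[i] for a, h_s in zip(weights, encoder_states))
        for i in range(dim)
    ]
    # Attentional hidden state: tanh(W_c [context; h_t]).
    attentional = [math.tanh(v) for v in matvec(w_c, context + decoder_state)]
    return weights, context, attentional

identity = [[1.0, 0.0], [0.0, 1.0]]
projection = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
weights, context, attentional = global_general_attention(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], identity, projection
)
print(weights)  # the first source state receives the larger weight
```

With input feeding enabled, the attentional vector from step t would be concatenated with the target embedding fed to the decoder at step t+1; the sketch stops at producing that vector.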
Caveats
- Built for Torch, not PyTorch; the `th` command and Lua dependency stack (cutorch, cunn, cudnn, luautf8) are archaeological artifacts at this point
- README explicitly states new features and optimizations move to OpenNMT
- Several advanced features (residual connections, multi-attention) are noted as not improving translation performance in the authors’ experiments
Verdict
Worth studying if you’re writing a history-of-NMT thesis or need to reproduce a 2015–2016 paper exactly. Everyone else should use OpenNMT or modern frameworks. The code is honest about its limitations, which is refreshing.