harvardnlp/seq2seq-attn

The seq2seq model that taught a generation, now in maintenance mode

A 2016-era Torch implementation of attentional encoder-decoder LSTMs that became the foundation for OpenNMT.

1.3k stars · Lua · Language Models
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

This is a Lua/Torch implementation of the classic attentional sequence-to-sequence architecture: LSTM encoder, LSTM decoder, optional attention over the source sequence. It handles machine translation, summarization, and similar sequence transduction tasks. A Python preprocessing script turns parallel text into HDF5 shards; Lua scripts handle training and beam-search decoding.
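For readers who have not met beam-search decoding, here is a minimal Python sketch of the general idea. It is not the repository's Lua code: `beam_search`, `toy_step`, and the probability table are hypothetical names and values invented for illustration, and a real decoder would get its next-token log-probabilities from the trained LSTM rather than a lookup table.

```python
import math

def beam_search(step_fn, bos, eos, beam_size=2, max_len=10):
    """Return (log_prob, tokens) for the best hypothesis found.

    step_fn(prefix) must return a dict mapping each candidate next token
    to its log-probability given the prefix (hypothetical interface).
    """
    beams = [(0.0, [bos])]   # live hypotheses: (cumulative log-prob, tokens)
    finished = []            # hypotheses that have emitted eos
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok, logp in step_fn(seq).items():
                candidates.append((score + logp, seq + [tok]))
        # Keep only the beam_size highest-scoring extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((score, seq))
        if not beams:
            break
    pool = finished if finished else beams
    return max(pool, key=lambda c: c[0])

# Toy next-token distribution keyed on the last token only (made-up numbers).
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "x": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}

def toy_step(prefix):
    return {tok: math.log(p) for tok, p in TABLE[prefix[-1]].items()}

greedy_score, greedy_seq = beam_search(toy_step, "<s>", "</s>", beam_size=1)
beam_score, beam_seq = beam_search(toy_step, "<s>", "</s>", beam_size=2)
```

With a beam of 1 the search is greedy and commits to `a`, the locally best first token; with a beam of 2 it keeps `b` alive long enough to discover that `b z` is the more probable sequence overall.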

The interesting bit

The README itself announces the project’s obsolescence: OpenNMT is the “fully supported feature-complete rewrite.” What’s notable is how much research this codebase absorbed before being superseded. SYSTRAN contributed production-grade additions, including character-level CNN inputs with highway networks, model pruning, knowledge distillation, guided alignment, multi-attention, residual connections, and linguistic feature support. The result is a surprisingly complete snapshot of mid-2010s neural MT engineering.

Key highlights

  • Implements Luong et al.’s global-general attention with optional input feeding
  • Character-level encoding via Kim et al.’s CharCNN + highway network architecture
  • Bidirectional LSTM encoder with shared embeddings between forward/backward passes
  • Extensive hyperparameter surface: curriculum learning, layer-specific learning rates, pretrained embedding loading, gradient clipping, dropout between vertical LSTM stacks
  • Beam search decoding with dictionary-aware prediction
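
To make the first highlight concrete, the following is a rough pure-Python sketch of Luong-style global attention with the "general" score, score(h_t, h_s) = h_tᵀ W_a h_s, followed by the attentional hidden state tanh(W_c [c_t; h_t]). It is an illustration under stated assumptions, not the repository's Lua implementation: the function names, the list-of-lists matrix representation, and the tiny example weights are all hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W v.
    return [dot(row, v) for row in W]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def global_general_attention(h_t, source_states, W_a, W_c):
    """Return (attention weights, attentional hidden state).

    h_t           : current decoder hidden state
    source_states : encoder hidden states, one per source position
    W_a, W_c      : weight matrices (hypothetical toy values in the example)
    """
    # "General" score: h_t^T W_a h_s for every source position.
    scores = [dot(h_t, matvec(W_a, h_s)) for h_s in source_states]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the source states.
    dim = len(source_states[0])
    context = [sum(w * h_s[i] for w, h_s in zip(weights, source_states))
               for i in range(dim)]
    # Attentional hidden state: tanh(W_c [c_t; h_t]).
    attn_hidden = [math.tanh(x) for x in matvec(W_c, context + h_t)]
    return weights, attn_hidden

# Toy example: 2-dimensional states, identity W_a, made-up W_c.
h_t = [1.0, 0.0]
source_states = [[2.0, 0.0], [0.0, 2.0]]
W_a = [[1.0, 0.0], [0.0, 1.0]]
W_c = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
weights, attn_hidden = global_general_attention(h_t, source_states, W_a, W_c)
```

Under input feeding, the attentional hidden state from step t is concatenated with the target embedding fed to the decoder at step t+1, so earlier alignment decisions can inform later ones.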

Caveats

  • Built for Torch, not PyTorch; the th command and Lua dependency stack (cutorch, cunn, cudnn, luautf8) are archaeological artifacts at this point
  • README explicitly states new features and optimizations move to OpenNMT
  • Several advanced features (residual connections, multi-attention) are noted as not improving translation performance in the authors’ experiments

Verdict

Worth studying if you’re writing a history-of-NMT thesis or need to reproduce a 2015–2016 paper exactly. Everyone else should use OpenNMT or modern frameworks. The code is honest about its limitations, which is refreshing.
