yaserkl/RLSeq2Seq

When seq2seq models need a nudge from reinforcement learning

A 2018 TensorFlow toolkit that bolts reinforcement learning onto encoder-decoder models to tackle exposure bias and the mismatch between the training loss and evaluation metrics in summarization.

767 stars · Python · Language Models · ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

RLSeq2Seq is a research codebase that trains abstractive text summarizers by combining standard sequence-to-sequence models with reinforcement learning techniques. It implements scheduled sampling variants, self-critical policy-gradient training, and actor-critic methods built on DDQN and dueling networks, all targeted at the CNN/Daily Mail and Newsroom datasets.
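
To make the self-critic idea concrete, here is a minimal Python 3 sketch, not code from the repository. It assumes a ROUGE-1-style unigram F1 as the reward (the helper names `unigram_f1` and `self_critical_loss` are hypothetical) and uses the greedy decode as the baseline, so the sampled summary is reinforced only when it outscores the greedy one.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """ROUGE-1-style F1 over token lists; a simple stand-in reward (assumption)."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def self_critical_loss(sample_log_probs, sampled, greedy, reference):
    """Loss = -(r(sampled) - r(greedy)) * sum of the sampled tokens' log-probs."""
    advantage = unigram_f1(sampled, reference) - unigram_f1(greedy, reference)
    return -advantage * sum(sample_log_probs)


# Toy data, invented for illustration only.
reference = "the cat sat on the mat".split()
greedy = "the cat sat".split()
sampled = "the cat sat on the mat".split()
log_probs = [-0.5] * len(sampled)  # pretend per-token log-probabilities

loss = self_critical_loss(log_probs, sampled, greedy, reference)
```

In this toy case the sampled summary beats the greedy one, so the loss is positive and minimizing it raises the log-probability of the sampled tokens.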

The interesting bit

The project treats text generation as a sequential decision-making problem rather than pure supervised learning. It bundles several RL-for-generation papers into one training framework, so you can swap between Bengio’s scheduled sampling, Ranzato’s end-to-end backprop, and actor-critic architectures without rewriting the model from scratch.
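
Scheduled sampling, for instance, comes down to a coin flip at each decoding step: feed the ground-truth token or the model's own previous prediction. The sketch below is illustrative Python 3 rather than the repository's TensorFlow code. `predict_next` stands in for a decoder step, the toy lookup table is made up, and the inverse-sigmoid decay with `k=1000` is one of the schedules described by Bengio et al., chosen here as an assumption.

```python
import math
import random


def teacher_forcing_prob(step, k=1000.0):
    """Inverse-sigmoid decay: close to 1.0 early in training, falling toward 0."""
    return k / (k + math.exp(step / k))


def scheduled_sampling_inputs(gold_tokens, predict_next, step, rng, start="<s>"):
    """Return the tokens fed to the decoder at each position.

    After every position, the next input is the gold token with probability
    epsilon, otherwise the model's own prediction.
    """
    epsilon = teacher_forcing_prob(step)
    inputs, prev = [], start
    for gold in gold_tokens:
        inputs.append(prev)
        predicted = predict_next(prev)
        prev = gold if rng.random() < epsilon else predicted
    return inputs


# Hypothetical "model": a fixed next-token lookup, for illustration only.
toy_table = {"<s>": "police", "police": "detain", "detain": "man"}


def toy_model(token):
    return toy_table.get(token, "<unk>")


rng = random.Random(0)
gold = ["police", "arrest", "suspect"]
early = scheduled_sampling_inputs(gold, toy_model, step=0, rng=rng)
late = scheduled_sampling_inputs(gold, toy_model, step=20000, rng=rng)
```

Early in training the decoder sees almost only gold tokens; by step 20000 under this schedule it is effectively running on its own outputs, which is the condition it faces at inference time.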

Key highlights

  • Supports TensorFlow 1.10.1 (yes, the TF 1.x era)
  • Implements three families of training methods: scheduled sampling, self-critical policy gradient, and actor-critic via DDQN/dueling networks
  • Ships with preprocessing scripts that the authors claim boost ROUGE scores on CNN/Daily Mail and Newsroom
  • Includes pointer-generator coverage and intra-decoder attention mechanisms
  • Directly tied to arXiv:1805.09461 with a citation request baked into the README
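
For the actor-critic variants, the critic-side arithmetic can be sketched in a few lines. This is a generic illustration of a double-DQN target and a dueling head in plain Python 3, assuming a small discrete action set. In the summarization setting the actions would be vocabulary tokens, and all function names and numbers here are hypothetical rather than taken from the repository.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online net picks the next action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


def dueling_q_values(state_value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Made-up Q-values for three candidate next tokens.
next_q_online = [0.2, 0.9, 0.4]   # online network prefers action 1
next_q_target = [0.5, 0.3, 0.8]   # target network's estimates

target = double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.5)
q_values = dueling_q_values(1.0, [1.0, 2.0, 3.0])
```

A vanilla DQN target would take the maximum of the target network's own estimates (0.8 here). Letting the online network choose the action is what is meant to curb overestimation.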

Caveats

  • Explicitly noted as “no longer actively maintained”
  • Requires Python 2.7, CUDA 9, and cuDNN 7.1—a stack that is now archaeological
  • README is thorough on paper references but sparse on architectural details or current benchmark standings

Verdict

Worth a look if you’re reproducing 2018 summarization papers or studying how RL was grafted onto seq2seq before transformers took over. Skip it if you need production code or modern PyTorch/TF 2.x support.
