Steer GPT-2 without retraining it
PPLM lets you nudge a frozen language model toward topics or sentiment using tiny plug-in classifiers.

What it does PPLM is a controlled text generation method for GPT-2. You supply a steering objective—either a bag-of-words topic like “military” or a sentiment discriminator—and the model generates text biased toward that attribute while keeping the base language model untouched.
The interesting bit The trick is in the name: “Plug and Play.” The base GPT-2 model stays completely frozen. PPLM instead backpropagates through the model’s hidden states during generation, using a small attribute classifier to nudge activations in the right direction. No fine-tuning, no GPU farm, no waiting.
Key highlights
- Two control modes: bag-of-words (PPLM-BoW) and discriminator-based (PPLM-Discrim for sentiment)
- Hyperparameters are exposed and tunable:
stepsizecontrols intensity,kl_scaleandgm_scaletrade off fluency vs. control - Integrated into Hugging Face Transformers; also runnable via Colab with zero setup
- Paper (ICLR 2020) and blog post included; code supports the original GPT-2 and a separate paper-specific model folder
Caveats
- The models in the root directory differ from the paper’s analysis models; paper baselines need the separate
paper_codefolder and its hyperparameters are “roughly off by a factor of 5” from the main code - Repetitive output is a documented failure mode requiring manual knob-twiddling (
stepsize,kl_scale,grad-length)
Verdict Worth a look if you’re researching controlled generation or need to prototype steering without training budget. Skip if you need a polished, turnkey product—the hyperparameter sensitivity and model mismatch with the paper make this more research artifact than drop-in library.