← all repositories
uber-research/PPLM

Steer GPT-2 without retraining it

PPLM lets you nudge a frozen language model toward topics or sentiment using tiny plug-in classifiers.

1.2k stars Python Language Models
PPLM
Velocity · 7d
+0.5
★ / day
Trend
steady
star history

What it does PPLM is a controlled text generation method for GPT-2. You supply a steering objective—either a bag-of-words topic like “military” or a sentiment discriminator—and the model generates text biased toward that attribute while keeping the base language model untouched.

The interesting bit The trick is in the name: “Plug and Play.” The base GPT-2 model stays completely frozen. PPLM instead backpropagates through the model’s hidden states during generation, using a small attribute classifier to nudge activations in the right direction. No fine-tuning, no GPU farm, no waiting.

Key highlights

  • Two control modes: bag-of-words (PPLM-BoW) and discriminator-based (PPLM-Discrim for sentiment)
  • Hyperparameters are exposed and tunable: stepsize controls intensity, kl_scale and gm_scale trade off fluency vs. control
  • Integrated into Hugging Face Transformers; also runnable via Colab with zero setup
  • Paper (ICLR 2020) and blog post included; code supports the original GPT-2 and a separate paper-specific model folder

Caveats

  • The models in the root directory differ from the paper’s analysis models; paper baselines need the separate paper_code folder and its hyperparameters are “roughly off by a factor of 5” from the main code
  • Repetitive output is a documented failure mode requiring manual knob-twiddling (stepsize, kl_scale, grad-length)

Verdict Worth a look if you’re researching controlled generation or need to prototype steering without training budget. Skip if you need a polished, turnkey product—the hyperparameter sensitivity and model mismatch with the paper make this more research artifact than drop-in library.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.