ucinlp/autoprompt

Teaching BERT to fish: gradient-based prompt hacking

AutoPrompt finds the magic words that make masked language models do NLP tasks without fine-tuning.

639 stars · Python · LLMOps · Eval · Language Models
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

AutoPrompt automatically constructs discrete prompts for masked language models (BERT, RoBERTa) using gradient-guided search. You provide a template with placeholder trigger tokens, and it searches for the actual token sequences that make the model perform sentiment analysis, natural language inference, fact retrieval, or relation extraction, with no task-specific fine-tuning required.
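To make "a template with placeholder trigger tokens" concrete, here is a minimal string-level sketch of rendering a prompt. The placeholder syntax (`{sentence}` for the task input, `[T]` for a trigger slot, `[P]` for the masked prediction slot) and the helper name `fill_template` are assumptions for illustration, not the repo's documented API. The real tool also works on token IDs rather than raw strings.

```python
import re

# Hypothetical placeholder syntax: {field} = task input, [T] = trigger slot, [P] = mask slot.
PLACEHOLDER = re.compile(r"\{(\w+)\}|\[T\]|\[P\]")


def fill_template(template, fields, triggers, mask_token):
    """Render a prompt in one pass so inserted text is never re-scanned."""
    n_slots = template.count("[T]")
    if n_slots != len(triggers):
        raise ValueError(f"template has {n_slots} [T] slots but got {len(triggers)} triggers")
    trigger_iter = iter(triggers)  # triggers fill [T] slots left to right

    def substitute(match):
        if match.group(1) is not None:  # {field}: the per-example input
            return fields[match.group(1)]
        if match.group(0) == "[T]":  # shared trigger token for this task
            return next(trigger_iter)
        return mask_token  # [P]: where the model predicts the label token

    return PLACEHOLDER.sub(substitute, template)


print(fill_template("<s> {sentence} [T] [T] [T] [P] . </s>",
                    {"sentence": "a gripping film"}, ["t1", "t2", "t3"], "<mask>"))
# prints: <s> a gripping film t1 t2 t3 <mask> . </s>
```

The trigger strings `t1`, `t2`, `t3` are stand-ins. The search's job is to replace them with real vocabulary tokens that push the model toward the right label at the mask position.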

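The interesting bit

The prompts are discrete: actual vocabulary tokens you can read, not soft embeddings you can't. The search is clever. At each trigger position, it uses the gradient of the correct label's log-likelihood with respect to that token's input embedding to shortlist promising replacements. It then evaluates the shortlist directly with forward passes and keeps whichever candidate scores best. The label tokens themselves can be searched for as well, so "positive" sentiment might map to learned tokens like "Ġmarvelous" or "Ġvisionary" rather than human-chosen words. (The "Ġ" prefix is RoBERTa's byte-level BPE marker for a preceding space.)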
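The shortlist-then-verify loop can be sketched in pure Python. This is a hedged illustration of a HotFlip-style first-order candidate step, not the repo's code. The `evaluate` callback stands in for a real forward pass over a batch, and the toy embeddings and payoffs below are invented.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_candidates(embeddings, grad, k):
    """Shortlist k replacement tokens for one trigger slot.

    First-order estimate: swapping the current token for token w changes the
    objective by about (e_w - e_current) . grad. Since e_current . grad is the
    same for every w, ranking tokens by e_w . grad is enough.
    """
    scores = [dot(e, grad) for e in embeddings]
    return sorted(range(len(embeddings)), key=lambda w: scores[w], reverse=True)[:k]


def update_trigger(triggers, position, embeddings, grad, k, evaluate):
    """One search step: re-score the gradient shortlist with real evaluations.

    `evaluate(triggers)` is a hypothetical callback returning the objective to
    maximize (e.g. mean log-likelihood of the correct label over a batch).
    The current triggers are kept unless a candidate strictly improves on them.
    """
    best, best_score = list(triggers), evaluate(triggers)
    for cand in top_k_candidates(embeddings, grad, k):
        trial = list(triggers)
        trial[position] = cand
        score = evaluate(trial)
        if score > best_score:
            best, best_score = trial, score
    return best, best_score


# Toy setup: 4-token vocabulary with 2-d embeddings (all values invented).
embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
grad = [1.0, 0.2]
true_value = [0.0, 0.1, -0.5, 0.9]  # made-up per-token payoff for the toy objective
evaluate = lambda trig: sum(true_value[t] for t in trig)

print(top_k_candidates(embeddings, grad, k=2))  # prints: [0, 3]
print(update_trigger([1, 2], 0, embeddings, grad, 2, evaluate))
```

In the toy run, the gradient ranks token 0 first, but the direct evaluation prefers token 3. That gap is why the shortlist is re-scored with real forward passes instead of trusting the linear approximation.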

Key highlights

  • Works across four task types: sentiment (SST-2), NLI (SICK-E), fact retrieval (T-REx/LAMA), and relation extraction
  • Trigger tokens are shared across all prompts for a task; only the input content changes
  • Supports both BERT and RoBERTa with appropriate special token handling
  • Includes a separate label search mode to discover which vocabulary tokens best represent each class
  • Evaluation pipeline requires a separate LAMA fork with its own conda environment
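One plausible shape for the label search mentioned above is a two-stage procedure. First, train a logistic classifier on the model's hidden state at the mask position. Then, score every vocabulary token by feeding its output embedding through that classifier and keep the top-k tokens per class. The sketch below assumes that procedure and omits the classifier training. The parameter layout `{label: (alpha, beta)}` and the function name `label_search` are hypothetical, and the repo's actual implementation may differ.

```python
import math


def label_search(output_embeddings, class_params, k):
    """Pick k candidate label tokens per class.

    output_embeddings: one vector per vocabulary token (the MLM output embeddings).
    class_params: {label: (alpha, beta)}, weights and bias of a classifier assumed
        to be already trained on [MASK] hidden states (training not shown).
    Returns {label: [token indices]}, ranked by p(label | token embedding).
    """
    labels = list(class_params)
    probs = []  # probs[w][j] = p(labels[j] | output embedding of token w)
    for emb in output_embeddings:
        logits = [sum(a * e for a, e in zip(alpha, emb)) + beta
                  for alpha, beta in (class_params[lab] for lab in labels)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        probs.append([x / total for x in exps])
    return {
        lab: sorted(range(len(output_embeddings)), key=lambda w: probs[w][j], reverse=True)[:k]
        for j, lab in enumerate(labels)
    }


# Invented toy data: 4 tokens, 2-d embeddings, two classes.
emb = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-2.0, 0.0]]
params = {"pos": ([1.0, 0.0], 0.0), "neg": ([-1.0, 0.0], 0.0)}
print(label_search(emb, params, k=2))  # prints: {'pos': [0, 2], 'neg': [3, 1]}
```

Tokens found this way need not look like sensible class names to a human. This is consistent with the odd label tokens noted in the caveats below.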

Caveats

  • Setup is involved: a separate conda env for LAMA evaluation, manual data downloads from Google Drive, and spaCy model installation
  • The README’s command examples use hardcoded label maps with seemingly arbitrary tokens (“ĠTaiwan” for entailment, “ĠOnly” for contradiction) that appear to be artifacts of the search process rather than human-curated choices
  • Relation extraction and fact retrieval require careful dataset filtering and configuration; some relations are excluded due to baseline incompatibility
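To see how a label map like the ones in the README turns a mask prediction into a class, here is a minimal sketch. It assumes each class's probability is the sum of the probabilities of its label tokens at the mask position. The function name `predict`, the negative-class token, and the probabilities below are all hypothetical; only "Ġmarvelous" and "Ġvisionary" appear in the text above.

```python
import math


def logsumexp(values):
    """log(sum(exp(v))) computed stably by factoring out the max."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def predict(mask_log_probs, label_map):
    """Score each class by marginalizing over its label tokens.

    mask_log_probs: {token: log p([MASK] = token | prompt)}; every token in
        label_map must be present (a missing one raises KeyError).
    label_map: {label: [tokens]}, e.g. the output of a label search.
    Returns (predicted label, {label: log-score}).
    """
    scores = {label: logsumexp([mask_log_probs[t] for t in tokens])
              for label, tokens in label_map.items()}
    return max(scores, key=scores.get), scores


# Invented mask-position probabilities for illustration only.
log_probs = {"Ġmarvelous": math.log(0.2), "Ġvisionary": math.log(0.1), "Ġworse": math.log(0.25)}
label_map = {"positive": ["Ġmarvelous", "Ġvisionary"], "negative": ["Ġworse"]}
label, scores = predict(log_probs, label_map)
print(label)  # prints: positive (0.2 + 0.1 = 0.3 beats 0.25)
```

Marginalizing over several tokens lets a class win even when no single one of its tokens is the model's top guess.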

Verdict

Worth a look if you're researching prompt engineering or probing what MLMs know without fine-tuning. Skip it if you need something production-ready: this is research code, with the setup friction that implies.
