thunlp/OpenAttack

A Swiss Army knife for breaking NLP models

OpenAttack wraps 15+ textual adversarial attack methods into a tidy Python toolkit with Hugging Face integration.

776 stars · Python · LLMOps · EvalML Frameworks
Velocity (7d): +0.3 ★ / day · Trend: steady

What it does

OpenAttack automates the full pipeline of textual adversarial attacks: preprocessing, victim model access, adversarial example generation, and evaluation. It bundles 15 attack models that cover sentence-, word-, and character-level perturbations across four victim access types: gradient-based, score-based, decision-based, and blind. You can attack built-in BERT/RoBERTa models, plug in your own classifier, or bring a custom dataset via Hugging Face’s datasets library.
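To make the three perturbation levels concrete, here is a toy sketch in plain Python. None of these helpers come from OpenAttack; the function names and the tiny synonym table are hypothetical, and the sentence-level edit is a crude stand-in for the paraphrasing that real attacks perform.

```python
# Toy illustration only: these helpers are hypothetical, not OpenAttack code.

# Hypothetical mini-lexicon used for word-level substitution.
SYNONYMS = {"good": "decent", "great": "fine", "terrible": "poor"}


def char_level(text: str, index: int) -> str:
    """Swap the characters at index and index + 1 (a typo-style edit)."""
    if index < 0 or index + 1 >= len(text):
        return text  # nothing to swap at this position
    chars = list(text)
    chars[index], chars[index + 1] = chars[index + 1], chars[index]
    return "".join(chars)


def word_level(text: str) -> str:
    """Replace every word that has an entry in SYNONYMS."""
    return " ".join(SYNONYMS.get(word, word) for word in text.split())


def sentence_level(text: str) -> str:
    """Append a distractor clause (a crude stand-in for paraphrasing)."""
    return text + " , or so they say"


if __name__ == "__main__":
    sample = "a good movie"
    print(char_level(sample, 2))
    print(word_level(sample))
    print(sentence_level(sample))
```

Each function leaves the sentence readable to a person while changing the exact tokens a model sees, which is the property all three levels share.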

The interesting bit

The toolkit treats attacks as composable operations rather than one-off scripts. You subclass oa.Classifier to wrap any model, swap between attack algorithms such as PWWS or Genetic, and parallelize evaluation by passing a single num_workers argument. The README also walks through adversarial training, where generated examples are used to retrain a more robust model; that is less common in attack-focused tools.
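The composable pattern can be sketched without the library itself. The example below is a minimal, self-contained approximation: Victim, KeywordSentiment, SubstitutionAttacker, CharSwapAttacker, and success_rate are all hypothetical names invented for illustration, and they only imitate the idea of wrapping a model behind a small interface so that attackers become interchangeable. It is not OpenAttack's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class Victim(ABC):
    """Hypothetical wrapper interface: any model exposing class probabilities."""

    @abstractmethod
    def get_prob(self, texts: List[str]) -> List[List[float]]:
        ...

    def get_pred(self, texts: List[str]) -> List[int]:
        # Argmax over class probabilities; ties resolve to the lowest index.
        return [max(range(len(p)), key=p.__getitem__) for p in self.get_prob(texts)]


class KeywordSentiment(Victim):
    """Toy victim: counts sentiment keywords (assumption, not a real model)."""

    POS = {"good", "great", "love"}
    NEG = {"bad", "awful", "hate"}

    def get_prob(self, texts: List[str]) -> List[List[float]]:
        out = []
        for text in texts:
            words = text.lower().split()
            pos = sum(w in self.POS for w in words)
            neg = sum(w in self.NEG for w in words)
            p = (pos + 1) / (pos + neg + 2)  # smoothed positive probability
            out.append([1 - p, p])
        return out


class SubstitutionAttacker:
    """Word-level attack: try one substitution at a time until the label flips."""

    def __init__(self, lexicon: Dict[str, List[str]]):
        self.lexicon = lexicon

    def attack(self, victim: Victim, text: str) -> Optional[str]:
        original = victim.get_pred([text])[0]
        words = text.split()
        for i, word in enumerate(words):
            for candidate in self.lexicon.get(word.lower(), []):
                trial = " ".join(words[:i] + [candidate] + words[i + 1:])
                if victim.get_pred([trial])[0] != original:
                    return trial
        return None  # no single substitution changed the prediction


class CharSwapAttacker:
    """Character-level attack: swap adjacent letters until the label flips."""

    def attack(self, victim: Victim, text: str) -> Optional[str]:
        original = victim.get_pred([text])[0]
        for i in range(len(text) - 1):
            if text[i] == " " or text[i + 1] == " ":
                continue  # keep word boundaries intact
            trial = text[:i] + text[i + 1] + text[i] + text[i + 2:]
            if victim.get_pred([trial])[0] != original:
                return trial
        return None


def success_rate(attacker, victim: Victim, texts: List[str]) -> float:
    """Fraction of inputs for which the attacker found an adversarial example."""
    hits = sum(attacker.attack(victim, t) is not None for t in texts)
    return hits / len(texts)


if __name__ == "__main__":
    victim = KeywordSentiment()
    data = ["a good movie", "the plot unfolds"]
    for attacker in (SubstitutionAttacker({"good": ["decent"]}), CharSwapAttacker()):
        print(type(attacker).__name__, success_rate(attacker, victim, data))
```

Because both attackers only call get_pred on the victim, either one can be dropped into the same evaluation loop. Both are decision-based in the sense used above: they need predicted labels only, not scores or gradients.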

Key highlights

  • 15 built-in attack models spanning all major textual perturbation levels and victim access types
  • Native multiprocessing support via num_workers parameter
  • English and Chinese support with an extensible design for more languages
  • Full Hugging Face Transformers and Datasets integration
  • Custom attack model construction from reusable components (token shufflers, etc.)

Caveats

  • The README is truncated mid-sentence partway through the attack models list; exact coverage beyond the 15 named is unclear
  • Chinese support exists but the example is referenced, not shown inline
  • No explicit performance benchmarks or attack success rates are provided in the visible documentation

Verdict

Worth a look if you’re doing NLP robustness research, red-teaming language models, or need systematic baselines for a paper. Skip it if you just need a single attack method: installing a full toolkit for one algorithm is overkill.
