← all repositories

agencyenterprise/PromptInject

Research framework that evaluates LLM robustness to adversarial prompt injection attacks like goal hijacking and prompt leaking.

496 stars Python LLMOps · EvalLanguage Models
PromptInject
Velocity · 7d
+0.4
★ / day
Trend
steady
star history

PromptInject is a modular prompt assembly framework designed to systematically evaluate vulnerabilities in large language models under adversarial conditions. It provides quantitative metrics for measuring how easily LLMs like GPT-3 can be misaligned through handcrafted inputs targeting goal hijacking and prompt leaking attacks. The framework was developed for AI safety research and received a Best Paper Award at the NeurIPS 2022 ML Safety Workshop.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.