microsoftarchive/promptbench
A unified Python library for evaluating and understanding the robustness of large language models against adversarial prompts.

Velocity (7d): +2.6 ★/day · Trend: steady
PromptBench is a benchmarking framework developed by Microsoft for evaluating LLM performance and robustness. It provides standardized evaluation pipelines, benchmark datasets, and adversarial attack testing for prompts. The library supports multiple LLMs and includes tools for measuring model sensitivity to prompt variations.
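To make "sensitivity to prompt variations" concrete, here is a minimal, self-contained sketch of the idea. It does not use PromptBench's own API. Every name in it (`perturb_prompt`, `toy_model`, `accuracy`, `relative_drop`) is hypothetical, and the "model" is a keyword-matching stand-in for a real LLM. The sketch scores a task prompt on a tiny labeled dataset, then scores a typo-perturbed version of the same prompt and reports the relative accuracy drop.

```python
import random
from typing import Callable, List, Tuple

Dataset = List[Tuple[str, str]]  # (input text, expected label)


def perturb_prompt(prompt: str, rng: random.Random, n_swaps: int = 2) -> str:
    """Swap adjacent characters at random positions (a simple typo-style perturbation)."""
    chars = list(prompt)
    if len(chars) < 2:
        return prompt
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def toy_model(prompt: str, text: str) -> str:
    """Hypothetical stand-in for an LLM call.

    It only "understands" the task while the word "sentiment" is intact
    in the prompt, which makes it deliberately brittle to typos.
    """
    if "sentiment" not in prompt:
        return "unknown"
    return "positive" if "good" in text else "negative"


def accuracy(model: Callable[[str, str], str], prompt: str, dataset: Dataset) -> float:
    """Fraction of examples the model labels correctly under the given prompt."""
    correct = sum(model(prompt, text) == label for text, label in dataset)
    return correct / len(dataset)


def relative_drop(clean_acc: float, attacked_acc: float) -> float:
    """Relative accuracy loss caused by the perturbation (0.0 if clean accuracy is 0)."""
    if clean_acc == 0:
        return 0.0
    return (clean_acc - attacked_acc) / clean_acc


if __name__ == "__main__":
    dataset: Dataset = [
        ("The food was good.", "positive"),
        ("The service was slow.", "negative"),
    ]
    clean_prompt = "Classify the sentiment of the sentence:"
    attacked_prompt = perturb_prompt(clean_prompt, random.Random(0))

    clean_acc = accuracy(toy_model, clean_prompt, dataset)
    attacked_acc = accuracy(toy_model, attacked_prompt, dataset)
    print("clean prompt:   ", clean_prompt, clean_acc)
    print("attacked prompt:", attacked_prompt, attacked_acc)
    print("relative drop:  ", relative_drop(clean_acc, attacked_acc))
```

In a real evaluation the stand-in model would be replaced by calls to an actual LLM and the random swap by a proper attack. The structure stays the same: one dataset, a clean prompt and a perturbed prompt, and a comparison of the two scores.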