
microsoftarchive/promptbench

A unified Python library for evaluating and understanding the robustness of large language models against adversarial prompts.

2.8k stars · Python · LLMOps · Eval
Velocity (7d): +2.6 ★/day · Trend: steady

PromptBench is a benchmarking framework developed by Microsoft for evaluating LLM performance and robustness. It provides standardized evaluation pipelines, benchmark datasets, and adversarial attack testing for prompts. The library supports multiple LLMs and includes tools for measuring model sensitivity to prompt variations.
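
To make "sensitivity to prompt variations" concrete, here is a minimal, self-contained sketch of the idea. It does not use PromptBench's own API: `toy_model`, `accuracy`, `relative_drop`, the example reviews, and both prompt templates are hypothetical stand-ins chosen purely for illustration. The sketch scores the same classifier on a clean prompt and on a typo-perturbed version, then reports how much accuracy is lost.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected label)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM (not a real model or library call).

    It only "understands" the task when the instruction contains the
    exact word "sentiment"; otherwise it always answers "negative".
    """
    lowered = prompt.lower()
    if "sentiment" not in lowered:
        return "negative"
    return "positive" if "good" in lowered else "negative"


def accuracy(model: Callable[[str], str], template: str,
             examples: Sequence[Example]) -> float:
    """Fraction of examples the model labels correctly under one template."""
    correct = sum(
        model(template.format(text=text)) == label
        for text, label in examples
    )
    return correct / len(examples)


def relative_drop(clean_acc: float, attacked_acc: float) -> float:
    """Share of clean accuracy lost under the perturbed prompt."""
    if clean_acc == 0:
        return 0.0
    return (clean_acc - attacked_acc) / clean_acc


examples = [
    ("a good movie", "positive"),
    ("a dull movie", "negative"),
    ("good acting", "positive"),
    ("boring plot", "negative"),
]

clean_template = "Classify the sentiment of this review: {text}"
# Character-level perturbation: a single typo in the instruction.
attacked_template = "Classify the sentimnet of this review: {text}"

clean_acc = accuracy(toy_model, clean_template, examples)
attacked_acc = accuracy(toy_model, attacked_template, examples)

print(f"clean accuracy:    {clean_acc:.2f}")
print(f"attacked accuracy: {attacked_acc:.2f}")
print(f"relative drop:     {relative_drop(clean_acc, attacked_acc):.2f}")
```

In this toy setup the single typo halves accuracy, from 1.0 to 0.5. A real evaluation would replace `toy_model` with calls to an actual LLM and draw examples from a benchmark dataset.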
