
wuyoscar/Internal-Safety-Collapse

Benchmark framework for evaluating how frontier LLMs and AI agents can be turned into sensitive data generators through internal safety collapse attacks.

Velocity (7d): +7.9 ★/day · Trend: steady

ISC-Bench is a safety evaluation framework for assessing vulnerabilities in large language models and AI agents. It focuses on Internal Safety Collapse (ISC), a novel attack vector in which models can be manipulated into generating sensitive data without obvious external indicators of an attack. The repository provides benchmark datasets, evaluation protocols, and red-teaming methodologies for assessing LLM safety across a range of attack scenarios.
