wuyoscar/Internal-Safety-Collapse
Benchmark framework for evaluating how frontier LLMs and AI agents can be turned into sensitive-data generators through Internal Safety Collapse (ISC) attacks.

ISC-Bench is a safety evaluation framework for probing vulnerabilities in large language models and AI agents. It focuses on a novel attack vector called Internal Safety Collapse, in which a model is manipulated into generating sensitive data without external indicators. The repository provides benchmark datasets, evaluation protocols, and red-teaming methodologies for assessing LLM safety under various attack scenarios.