meridianlabs-ai/inspect_petri
An alignment auditing agent that automates testing of language models for potential alignment issues, reward hacking, and concerning behaviors.

Inspect Petri is an auditing agent designed to test alignment hypotheses in language models through automated, end-to-end scenarios. It generates realistic audit scenarios from seed instructions, orchestrates multi-turn interactions between an auditor model and a target model, and simulates tools and rollbacks to probe model behaviors. Transcripts are then scored by a judge model using a consistent rubric, enabling researchers to systematically detect alignment failures and reward hacking in AI systems.