← all repositories

meridianlabs-ai/inspect_petri

An alignment auditing agent that automates testing of language models for potential alignment issues, reward hacking, and concerning behaviors.

1.2k stars Python AgentsLLMOps · Eval
inspect_petri
Velocity · 7d
+4.2
★ / day
Trend
steady
star history

Inspect Petri is an auditing agent designed to test alignment hypotheses in language models through automated, end-to-end scenarios. It generates realistic audit scenarios from seed instructions, orchestrates multi-turn interactions between an auditor model and a target model, and simulates tools and rollbacks to probe model behaviors. Transcripts are then scored by a judge model using a consistent rubric, enabling researchers to systematically detect alignment failures and reward hacking in AI systems.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.