AI pentesters that actually prove their findings
Autonomous security agents that run dynamic exploits and validate vulnerabilities with real proof-of-concepts, not static-scan guesswork.

What it does
Strix deploys teams of AI agents to hack your applications like actual pentesters would. They run code dynamically, manipulate HTTP traffic, automate browsers, and validate every finding with a working proof-of-concept. The CLI targets local code, GitHub repos, or live URLs, and outputs actionable reports. A GitHub Actions workflow can block vulnerable PRs before merge.
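To make "validate every finding with a working proof-of-concept" concrete, here is a minimal sketch of the idea: a finding is only reported if a callable that tries to reproduce it succeeds. The `Finding` class, the `confirmed` helper, and the sample findings are hypothetical illustrations, not Strix's actual data model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """A candidate vulnerability plus a callable that tries to reproduce it."""
    title: str
    poc: Callable[[], bool]  # returns True only if the exploit actually worked


def confirmed(findings: List[Finding]) -> List[Finding]:
    """Keep only findings whose proof-of-concept reproduces without error."""
    kept = []
    for finding in findings:
        try:
            if finding.poc():
                kept.append(finding)
        except Exception:
            # A crashing PoC proves nothing, so treat it as unconfirmed.
            continue
    return kept


def _crash() -> bool:
    raise RuntimeError("target unreachable")


# Hypothetical candidates: one reproduces, one does not, one errors out.
candidates = [
    Finding("SQL injection in /search", poc=lambda: True),
    Finding("Possible XSS in /profile", poc=lambda: False),
    Finding("SSRF in /fetch", poc=_crash),
]
report = [f.title for f in confirmed(candidates)]
print(report)
```

Only the first candidate survives the filter, which is the difference between a scanner's "possible issue" and a validated finding.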
The interesting bit
The “Graph of Agents” architecture distributes specialized agents across different attack surfaces and lets them share discoveries mid-operation. It’s not just an LLM with a CVE database: it’s a sandboxed runtime where agents write and execute Python exploits, run terminal commands, and coordinate reconnaissance in parallel.
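The coordination pattern can be sketched roughly as follows, assuming a shared, thread-safe board that agents publish to and read from. Everything here (`SharedBoard`, `recon_agent`, `exploit_agent`, the fake endpoints) is a hypothetical toy, simplified to two phases, and says nothing about how Strix implements its agent graph internally.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class SharedBoard:
    """Thread-safe store that agents use to publish and read discoveries."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: Dict[str, List[str]] = {}

    def publish(self, topic: str, item: str) -> None:
        with self._lock:
            self._items.setdefault(topic, []).append(item)

    def read(self, topic: str) -> List[str]:
        # Sorted so the result does not depend on thread scheduling.
        with self._lock:
            return sorted(self._items.get(topic, []))


def recon_agent(surface: str, board: SharedBoard) -> None:
    # Hypothetical recon result: pretend each surface exposes one endpoint.
    board.publish("endpoints", f"/{surface}/login")


def exploit_agent(board: SharedBoard) -> List[str]:
    # Builds on what the recon agents shared instead of rediscovering it.
    return [f"test auth bypass on {ep}" for ep in board.read("endpoints")]


board = SharedBoard()
surfaces = ["api", "admin", "web"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # Recon agents run in parallel, one per attack surface.
    list(pool.map(lambda s: recon_agent(s, board), surfaces))

plan = exploit_agent(board)
print(plan)
```

The point of the shared board is that a discovery made on one attack surface immediately becomes input for agents working on another.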
Key highlights
- Full toolkit per agent: HTTP proxy, browser automation, interactive shell, Python runtime
- Multi-target scans: local directories, GitHub repos, deployed apps, or combinations
- Headless mode (-n) with non-zero exit codes for CI/CD gating
- Auto-scopes to PR diffs in quick mode; requires fetch-depth: 0 or explicit --diff-base
- Supports major LLM providers (OpenAI, Anthropic, Google, local via Ollama/LMStudio)
- Enterprise tier offers VPC/self-hosted deployment and BYOK model support
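For CI/CD gating, the headless mode and exit-code behavior listed above could be wrapped along these lines. The `-n` and `--diff-base` flags come from the highlights; the `strix` executable name and the `--target` flag are assumptions made for this sketch, so check the tool's own help output before relying on them.

```python
import subprocess
from typing import List, Optional


def build_scan_command(target: str, diff_base: Optional[str] = None) -> List[str]:
    """Assemble a headless scan command.

    `-n` and `--diff-base` are taken from the highlights above; the `strix`
    executable name and the `--target` flag are assumptions for this sketch.
    """
    cmd = ["strix", "-n", "--target", target]
    if diff_base is not None:
        # Explicit base ref, for checkouts that lack full git history.
        cmd += ["--diff-base", diff_base]
    return cmd


def should_block_merge(exit_code: int) -> bool:
    """Any non-zero exit fails the gate, whatever caused it."""
    return exit_code != 0


def run_gate(target: str, diff_base: Optional[str] = None) -> bool:
    """Run the scan and return True if the PR should be blocked."""
    result = subprocess.run(build_scan_command(target, diff_base))
    return should_block_merge(result.returncode)
```

In a pipeline, something like `sys.exit(1 if run_gate("./", "origin/main") else 0)` would propagate the result to the CI runner; `origin/main` is only an example base ref.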
Caveats
- Requires Docker running and an LLM backend, in practice a paid API key unless you run a local model; no fully offline mode mentioned
- First run pulls a sandbox Docker image—plan for that latency in CI
- Recommended models (GPT-5.4, Claude Sonnet 4.6, Gemini 3 Pro) are all bleeding-edge preview/beta releases; stability and availability unclear
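Given the first two caveats, a small preflight check before a scan can save a failed CI run. This is a sketch under stated assumptions: `docker info` is a real Docker CLI command that fails when the daemon is unreachable, but the `LLM_API_KEY` variable name is an assumption here, and `preflight` is a hypothetical helper rather than part of the tool.

```python
import shutil
import subprocess
from typing import List, Mapping


def docker_is_running() -> bool:
    """True if the docker CLI exists and the daemon answers `docker info`."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(
        ["docker", "info"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def preflight(env: Mapping[str, str], docker_ok: bool,
              key_var: str = "LLM_API_KEY") -> List[str]:
    """Return human-readable problems; an empty list means ready to scan.

    `key_var` defaults to an assumed variable name; pass the one your
    setup actually uses.
    """
    problems = []
    if not docker_ok:
        problems.append("Docker is not running")
    if not env.get(key_var):
        problems.append(f"{key_var} is not set")
    return problems
```

A typical call would be `preflight(os.environ, docker_is_running())`, printing each returned problem and exiting early if the list is non-empty.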
Verdict
Security teams tired of static analysis false positives and manual pentest backlogs should evaluate this. Developers with simple apps or tight API budgets may find it overkill; the LLM costs could sting at scale.