When AI red-teams your infrastructure so you don't have to
PentAGI is a self-hosted system that deploys autonomous AI agents to run actual penetration tests inside sandboxed Docker containers.

What it does
PentAGI spins up a fleet of specialized AI agents (researchers, developers, executors) that plan and carry out penetration testing tasks against your systems. Everything runs inside isolated Docker containers with 20+ built-in security tools (nmap, Metasploit, sqlmap, etc.). A Go backend coordinates the agents behind a React frontend, while PostgreSQL with pgvector and a Neo4j knowledge graph retain what worked across runs.
The interesting bit
The project is admirably frank about its limits: it explicitly states it is not a CALDERA-style Breach and Attack Simulation platform with predefined campaigns, and that “BAS-like agent-authored attack scripts” are future work, not reality. That honesty is rarer than you’d think in the AI security tooling space. The architecture diagram, meanwhile, reveals a small observability empire (Grafana, VictoriaMetrics, Jaeger, Loki, Langfuse, ClickHouse), suggesting the authors have operational scars.
Key highlights
- Supports 10+ LLM providers including local options via Ollama and a documented vLLM + Qwen3.5-27B-FP8 setup for air-gapped deployments
- Multi-agent delegation with optional execution monitoring and task planning for smaller models
- Built-in web scraper and integrations with seven search APIs (Tavily, Perplexity, SearXNG, etc.)
- REST and GraphQL APIs with Bearer token authentication for automation
- Comprehensive Docker Compose deployment with microservices architecture
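For a sense of what the Bearer-token automation mentioned above might look like, here is a minimal sketch using only the Python standard library. The endpoint path, host name, and token value are placeholders assumed for illustration, not taken from the PentAGI docs; the query uses GraphQL's built-in `__typename` meta-field so that no project-specific schema is assumed.

```python
import json
import urllib.request

# Assumption: placeholder route. Check the PentAGI API docs for the real path.
GRAPHQL_PATH = "/api/v1/graphql"


def build_graphql_request(base_url, token, query, variables=None):
    """Build (but do not send) a GraphQL POST request carrying a Bearer token."""
    payload = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + GRAPHQL_PATH,
        data=payload,
        method="POST",
        headers={
            # Standard Bearer scheme: the token goes in the Authorization header.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical host and token, shown only to illustrate the call shape.
    req = build_graphql_request(
        "https://pentagi.example.internal", "YOUR_API_TOKEN", "{ __typename }"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=30)
```

The request is built separately from sending so a script can inspect or log it first, which is a sensible habit when the target is a system that launches real attacks.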
Caveats
- JSON flow-report export is not currently documented as a supported output format
- The “Current Capability Boundaries” section suggests the project is still finding its scope—autonomous pentesting today, BAS perhaps tomorrow
- 17.4k stars but relatively young; operational maturity in real enterprise environments is unclear from the README alone
Verdict
Security teams with existing container infrastructure and tolerance for self-hosted complexity should evaluate this seriously. If you need turnkey adversary emulation or compliance-ready BAS reporting today, look elsewhere: the README makes clear that’s not the current offering.