vxcontrol/pentagi

When AI red-teams your infrastructure so you don't have to

PentAGI is a self-hosted system that deploys autonomous AI agents to run actual penetration tests inside sandboxed Docker containers.

17.5k stars · Go · Agents · Domain Apps
Velocity (7d): +34 ★/day · Trend: steady

What it does

PentAGI spins up a fleet of specialized AI agents (researchers, developers, executors) that plan and carry out penetration-testing tasks against your systems. Everything runs inside isolated Docker containers with 20+ built-in security tools (nmap, Metasploit, sqlmap, etc.). A Go backend orchestrates the agents behind a React frontend, while PostgreSQL with pgvector and a Neo4j knowledge graph retain what worked across runs.
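To make the sandboxing idea concrete, here is a minimal Go sketch of one way to confine a single tool run to a throwaway container by assembling `docker run` arguments. It is not PentAGI's implementation. The image name `example/pentest-tools`, the network name `pentest-net`, and the resource limits are hypothetical placeholders. The flags themselves (`--rm`, `--network`, `--memory`, `--pids-limit`) are standard Docker CLI options.

```go
package main

import (
	"fmt"
	"strings"
)

// sandboxArgs assembles arguments for a one-shot `docker run` that executes a
// single tool inside a throwaway container. Image, network, and limits are
// hypothetical placeholders, not PentAGI's actual configuration.
func sandboxArgs(image, network, tool string, toolArgs ...string) []string {
	args := []string{
		"run",
		"--rm",                // remove the container once the tool exits
		"--network", network,  // connect the container to a dedicated scan network
		"--memory", "2g",      // cap memory so a runaway tool can't starve the host
		"--pids-limit", "256", // bound the process count (fork-bomb guard)
		image,
		tool,
	}
	// Tool-specific arguments go after the image and command name.
	return append(args, toolArgs...)
}

func main() {
	args := sandboxArgs("example/pentest-tools:latest", "pentest-net", "nmap", "-sV", "10.0.0.5")
	// A real runner would execute: exec.Command("docker", args...).Run()
	fmt.Println("docker " + strings.Join(args, " "))
}
```

Building the argument slice separately from executing it keeps the isolation policy (limits, network) testable without a Docker daemon. A real runner would hand the slice to `exec.Command("docker", args...)` or use the Docker Engine API directly.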

The interesting bit

The project is admirably frank about its limits. It explicitly states that it is not a CALDERA-style Breach and Attack Simulation (BAS) platform with predefined campaigns, and that “BAS-like agent-authored attack scripts” are future work, not a current feature. That honesty is rarer than you’d think in AI security tooling. The architecture diagram, meanwhile, reveals a small observability empire (Grafana, VictoriaMetrics, Jaeger, Loki, Langfuse, ClickHouse), suggesting the authors have operational scars.

Key highlights

  • Supports 10+ LLM providers, including local options via Ollama and a documented vLLM + Qwen3.5-27B-FP8 setup for air-gapped deployments
  • Multi-agent delegation with optional execution monitoring and task planning for smaller models
  • Built-in web scraper and integrations with seven search APIs (Tavily, Perplexity, SearXNG, etc.)
  • REST and GraphQL APIs with Bearer token authentication for automation
  • Comprehensive Docker Compose deployment with microservices architecture
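For the REST/GraphQL automation point, here is a minimal Go sketch of an authenticated call. It assumes a standard GraphQL-over-HTTP POST with a JSON body. The endpoint URL and token are caller-supplied placeholders, not documented PentAGI values, so check the project's API docs for the real path and how tokens are issued. The query `{ __typename }` is used only because every GraphQL server answers it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// graphQLRequest is the conventional JSON body for a GraphQL-over-HTTP POST.
type graphQLRequest struct {
	Query     string         `json:"query"`
	Variables map[string]any `json:"variables,omitempty"`
}

// newAuthedGraphQLRequest builds a POST request that carries a Bearer token.
// endpoint and token are placeholders; PentAGI's real endpoint path is not assumed.
func newAuthedGraphQLRequest(endpoint, token, query string, vars map[string]any) (*http.Request, error) {
	body, err := json.Marshal(graphQLRequest{Query: query, Variables: vars})
	if err != nil {
		return nil, fmt.Errorf("encode body: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token) // Bearer token authentication
	return req, nil
}

func main() {
	// Hypothetical endpoint and token, for illustration only.
	req, err := newAuthedGraphQLRequest("https://pentagi.example.internal/graphql", "YOUR_API_TOKEN", "query { __typename }", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// Against a live deployment: resp, err := http.DefaultClient.Do(req)
}
```

Separating request construction from sending keeps the auth logic testable offline. The same header pattern applies to the REST endpoints.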

Caveats

  • JSON flow-report export is not currently documented as a supported output format
  • The “Current Capability Boundaries” section suggests the project is still finding its scope—autonomous pentesting today, BAS perhaps tomorrow
  • Widely starred but relatively young; its operational maturity in real enterprise environments is unclear from the README alone

Verdict

Security teams with existing container infrastructure and a tolerance for self-hosted complexity should evaluate this seriously. If you need turnkey adversary emulation or compliance-ready BAS reporting today, look elsewhere: the README makes clear that’s not the current offering.
