An AI agent that runs the tools, not just chats
VulnClaw exists so that a single natural-language sentence can trigger the entire reconnaissance-to-report pipeline without manually orchestrating a dozen separate tools.

What it does
VulnClaw is a CLI-driven AI agent that takes a natural-language prompt—like a target URL or a CTF scenario—and autonomously runs a full penetration-testing cycle. It handles reconnaissance, vulnerability scanning, exploitation, and report generation by orchestrating an LLM with a library of 21 built-in skills and 29 codec/crypto utilities. The tool also exposes a TUI workbench and a local Web UI for users who prefer a browser or a guided terminal interface.
The interesting bit
Instead of treating the LLM as a chatbot that merely suggests commands, VulnClaw gives it tool-calling capabilities to actually run scans, execute Python payloads, and generate Markdown reports with runnable PoC scripts. The README is admirably candid that most MCP integrations beyond fetch and memory are still preview or placeholder status, so the “tool chain” is partly aspirational scaffolding.
Key highlights
- Supports 13 LLM providers (OpenAI, DeepSeek, MiniMax, Zhipu, Moonshot, Qwen, and others) through an OpenAI-compatible protocol.
- 21 penetration skills covering core reconnaissance, CTF Web/Crypto/Misc, and OSINT, backed by 180 reference documents.
- Built-in
python_executefor dynamic payload construction, though the README explicitly warns it is a high-risk experimental feature, not a hardened sandbox. - Persistent mode designed for long-running loops (up to 1,000 rounds across 10 cycles) with periodic auto-reporting.
- TUI workbench enforces scope confirmation—allowing or forbidding specific phases like
exploitbefore execution starts.
Caveats
- Most MCP service integrations are currently preview or placeholder; only
fetchandmemoryrun in a stable local mode, so the “MCP toolchain” is thinner than the feature list implies. - The built-in Python execution environment lacks strong isolation and should not be trusted as a security sandbox.
- Knowledge-base retrieval augmentation is described as being gradually integrated, meaning it may not yet be fully wired into the main workflow.
Verdict
Security researchers and CTF players who want to automate repetitive recon-to-report workflows with LLMs will find this a useful experiment. If you need production-grade, fully isolated tool orchestration today, this is still clearly early-stage glue code.