← all repositories
Unclecheng-li/VulnClaw

An AI agent that runs the tools, not just chats

VulnClaw exists so that a single natural-language sentence can trigger the entire reconnaissance-to-report pipeline without manually orchestrating a dozen separate tools.

VulnClaw
Collecting fresh signals — velocity needs a few days of history.
collecting data…
star history

What it does

VulnClaw is a CLI-driven AI agent that takes a natural-language prompt—like a target URL or a CTF scenario—and autonomously runs a full penetration-testing cycle. It handles reconnaissance, vulnerability scanning, exploitation, and report generation by orchestrating an LLM with a library of 21 built-in skills and 29 codec/crypto utilities. The tool also exposes a TUI workbench and a local Web UI for users who prefer a browser or a guided terminal interface.

The interesting bit

Instead of treating the LLM as a chatbot that merely suggests commands, VulnClaw gives it tool-calling capabilities to actually run scans, execute Python payloads, and generate Markdown reports with runnable PoC scripts. The README is admirably candid that most MCP integrations beyond fetch and memory are still preview or placeholder status, so the “tool chain” is partly aspirational scaffolding.

Key highlights

  • Supports 13 LLM providers (OpenAI, DeepSeek, MiniMax, Zhipu, Moonshot, Qwen, and others) through an OpenAI-compatible protocol.
  • 21 penetration skills covering core reconnaissance, CTF Web/Crypto/Misc, and OSINT, backed by 180 reference documents.
  • Built-in python_execute for dynamic payload construction, though the README explicitly warns it is a high-risk experimental feature, not a hardened sandbox.
  • Persistent mode designed for long-running loops (up to 1,000 rounds across 10 cycles) with periodic auto-reporting.
  • TUI workbench enforces scope confirmation—allowing or forbidding specific phases like exploit before execution starts.

Caveats

  • Most MCP service integrations are currently preview or placeholder; only fetch and memory run in a stable local mode, so the “MCP toolchain” is thinner than the feature list implies.
  • The built-in Python execution environment lacks strong isolation and should not be trusted as a security sandbox.
  • Knowledge-base retrieval augmentation is described as being gradually integrated, meaning it may not yet be fully wired into the main workflow.

Verdict

Security researchers and CTF players who want to automate repetitive recon-to-report workflows with LLMs will find this a useful experiment. If you need production-grade, fully isolated tool orchestration today, this is still clearly early-stage glue code.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.