ipa-lab/hackingBuddyGPT

When LLMs get root: a pen-testing framework that runs on curiosity

A Python framework that lets security researchers spin up autonomous LLM hacking agents in ~50 lines of code.

1.1k stars · Python · Agents · Other AI
Velocity (7d): +1.1 ★/day · Trend: steady

What it does

hackingBuddyGPT is a Python framework for building LLM-driven penetration-testing agents. It handles the plumbing (SSH or local shell connections, LLM API calls, logging to SQLite, round limits) so researchers can focus on writing attack logic. The flagship use case tasks an LLM with escalating from a low-privilege user to root on a Linux system, running commands autonomously until it succeeds or hits the configured limit.
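To make the "plumbing" concrete, here is a minimal sketch of the kind of loop such a framework wraps: ask for a command, run it, log the round to SQLite, and stop on success or when the round limit is reached. This is not hackingBuddyGPT's actual API. The names `run_agent`, `next_command`, `execute`, and `is_root`, and the `rounds` table layout, are all hypothetical, and the LLM and the target shell are replaced by stubs so nothing is executed for real.

```python
import sqlite3
from typing import Callable, Tuple


def run_agent(
    next_command: Callable[[str], str],   # stand-in for the LLM call (hypothetical)
    execute: Callable[[str], str],        # stand-in for the SSH/local shell (hypothetical)
    is_root: Callable[[str], bool],       # success check on the command output
    max_rounds: int = 10,
    db_path: str = ":memory:",
) -> Tuple[bool, int, sqlite3.Connection]:
    """Run the ask/execute/log loop until success or the round limit."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rounds "
        "(round_no INTEGER, command TEXT, output TEXT)"
    )
    history = ""
    for round_no in range(1, max_rounds + 1):
        command = next_command(history)
        output = execute(command)
        # Log every round so a run can be inspected or compared afterwards.
        conn.execute(
            "INSERT INTO rounds VALUES (?, ?, ?)", (round_no, command, output)
        )
        conn.commit()
        if is_root(output):
            return True, round_no, conn
        # Feed the transcript back so the next suggestion can build on it.
        history += f"$ {command}\n{output}\n"
    return False, max_rounds, conn


def demo() -> Tuple[bool, int, sqlite3.Connection]:
    """Drive the loop with a scripted 'LLM' and canned outputs (no real shell)."""
    script = iter(["id", "sudo -l", "sudo su -c id"])
    fake_outputs = {
        "id": "uid=1000(lowpriv)",
        "sudo -l": "(ALL) NOPASSWD: ALL",
        "sudo su -c id": "uid=0(root)",
    }
    return run_agent(
        next_command=lambda history: next(script),
        execute=lambda cmd: fake_outputs[cmd],
        is_root=lambda out: "uid=0(" in out,
        max_rounds=5,
    )
```

Passing the LLM and the shell in as plain callables keeps the loop testable without a model or a target machine; a real agent would swap the stubs for an API client and a connection object.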

The interesting bit

The “50 lines of code” pitch is the hook, but the real value is the benchmark infrastructure. The team maintains reusable Linux privilege-escalation benchmarks and publishes open-access papers comparing LLM performance, turning what could be a toy into a reproducible research platform. It also won a spot in GitHub Accelerator 2024.

Key highlights

  • Minimal agent skeleton is genuinely short; the README shows a working Linux priv-esc agent in a single Python class
  • Supports both remote SSH targets and local shell execution (with appropriate warnings about running untrusted LLM-generated commands on your own machine)
  • Includes extended variants with RAG and chain-of-thought for more sophisticated experiments
  • Web pentest and web API testing agents exist but are marked pre-alpha/WIP
  • Active academic backing: two published papers, conference presentations at ESEC/FSE and ESSAI

Caveats

  • Web and web-api use-cases are in “heavy development and pre-alpha stage” per the README
  • The framework executes live commands on real systems; the authors explicitly warn about data loss and system modification risks
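One way to act on that warning when experimenting against a local shell is to put an approval step in front of the executor. The sketch below is an illustration only and not something the framework is known to provide: the deny-list `DESTRUCTIVE` and the helpers `looks_destructive` and `guarded` are hypothetical, and a crude token match like this is no substitute for a disposable VM or container.

```python
import shlex
from typing import Callable

# Hypothetical, deliberately incomplete deny-list for illustration.
DESTRUCTIVE = {"rm", "mkfs", "dd", "shutdown", "reboot"}


def looks_destructive(command: str) -> bool:
    """Flag a command if any token names a program on the deny-list."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: treat unparseable input as risky.
        return True
    # Compare basenames so "/bin/rm" is caught as well as "rm".
    return any(tok.split("/")[-1] in DESTRUCTIVE for tok in tokens)


def guarded(
    execute: Callable[[str], str],
    approve: Callable[[str], bool],
) -> Callable[[str], str]:
    """Wrap an executor so flagged commands need explicit approval."""
    def wrapper(command: str) -> str:
        if looks_destructive(command) and not approve(command):
            return "[skipped: not approved]"
        return execute(command)
    return wrapper
```

The check is conservative on purpose: it also flags harmless commands such as `echo rm`, which costs an extra confirmation but never lets a flagged token through silently.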

Verdict

Worth a look for security researchers or red-teamers experimenting with LLM autonomy, especially if you need reproducible benchmarks. Skip it if you want polished, production-ready web testing tools; those aren’t here yet.
