Is jailbreak_llms open source?

Yes — verazuo/jailbreak_llms is open source, released under the MIT license.

What language is jailbreak_llms written in?

verazuo/jailbreak_llms is primarily written in Jupyter Notebook.

How popular is jailbreak_llms?

verazuo/jailbreak_llms has 3.7k stars on GitHub.

Where can I find jailbreak_llms?

verazuo/jailbreak_llms is on GitHub at https://github.com/verazuo/jailbreak_llms.

verazuo/jailbreak_llms

Scraping a year of real-world LLM jailbreaks from Reddit and Discord

A research dataset and framework that turns 15,140 real user prompts—1,405 of them actual jailbreaks—into a measurable benchmark for LLM safety.

★ 3.7k stars · Jupyter Notebook · Data Tooling · LLMOps · Eval

What it does

This repository hosts the dataset and code for an ACM CCS 2024 paper measuring how people actually try to bypass LLM safeguards in the wild. It collects 15,140 real prompts from Reddit, Discord, prompt-sharing websites, and open datasets between December 2022 and December 2023, manually identifying 1,405 of them as jailbreak attempts. The authors also provide a benchmark of 390 forbidden questions across 13 scenarios drawn from OpenAI’s usage policy to test whether those prompts actually succeed.

The interesting bit

Rather than inventing synthetic attacks, the team built a framework called JailbreakHub to catalog real community tactics, such as roleplay and “Do Anything Now” prompts, as they evolved in public forums. It is essentially an epidemiological study of prompt engineering: it tracks how adversarial ideas spread through r/ChatGPTJailbreak and Discord servers rather than studying them in a sterile lab.

Key highlights

15,140 prompts sourced from Reddit, Discord, FlowGPT, AIPRM, and existing datasets, spanning a full year.
1,405 labeled jailbreak prompts, which the authors call the largest in-the-wild collection of its kind.
A 390-question evaluation set targeting 13 forbidden categories from OpenAI’s policy.
Available as Hugging Face datasets with CSV originals; MIT licensed.
Findings were responsibly disclosed to LLM vendors.
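
To put the headline numbers in proportion, the short sketch below computes the share of collected prompts that were labeled as jailbreaks and the number of questions per scenario. The counts come from the description above; the even split of questions across scenarios is an assumption made here for illustration, not something this page states.

```python
# Counts quoted in the highlights above.
TOTAL_PROMPTS = 15_140
JAILBREAK_PROMPTS = 1_405
QUESTIONS = 390
SCENARIOS = 13

# Fraction of all collected prompts that were labeled as jailbreaks.
jailbreak_share = JAILBREAK_PROMPTS / TOTAL_PROMPTS

# Assumption: questions are spread evenly across scenarios.
per_scenario, remainder = divmod(QUESTIONS, SCENARIOS)

print(f"Jailbreak share: {jailbreak_share:.1%}")  # Jailbreak share: 9.3%
print(f"Questions per scenario: {per_scenario} (remainder {remainder})")
```

Fewer than one in ten collected prompts is a jailbreak, so most of the corpus is regular prompt traffic that can serve as a comparison set.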

Caveats

The dataset contains genuinely harmful language by design; the authors flag reader discretion.
The included code is minimal—an evaluator script for ChatGLM and a single visualization notebook—so expect to bring your own analysis pipeline.
If you plan to train on the data, the authors recommend deduplicating prompts first.
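
The deduplication step mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the authors' procedure: the column names `platform` and `prompt` are assumptions, the sample rows are harmless placeholders rather than dataset content, and the normalization rule (collapse whitespace, lowercase) is just one reasonable choice.

```python
import csv
import io


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies compare equal."""
    return " ".join(text.split()).lower()


def dedupe_rows(rows, column="prompt"):
    """Keep the first row seen for each normalized value of `column`.

    `column` is a hypothetical name; adjust it to the actual CSV header.
    """
    seen = set()
    unique = []
    for row in rows:
        key = normalize(row[column])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# In-memory stand-in for a CSV file, using placeholder text only.
sample_csv = io.StringIO(
    "platform,prompt\n"
    "reddit,Example prompt A\n"
    "discord,example   prompt a\n"
    "website,Example prompt B\n"
)
rows = list(csv.DictReader(sample_csv))
unique_rows = dedupe_rows(rows)
print(len(rows), "->", len(unique_rows))  # 3 -> 2
```

Exact-match deduplication after normalization only catches near-identical copies; prompts that were lightly reworded between forums would need fuzzy matching, which this sketch does not attempt.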

Verdict

Security researchers and red-teamers building LLM guardrails should dig in; if you are looking for a polished attack framework or a clean, harmless dataset, this is not your repo.
