Is Safety-Prompts open source?

Yes — thu-coai/Safety-Prompts is open source, released under the Apache-2.0 license.

How popular is Safety-Prompts?

thu-coai/Safety-Prompts has 1.2k stars on GitHub.

Where can I find Safety-Prompts?

thu-coai/Safety-Prompts is on GitHub at https://github.com/thu-coai/Safety-Prompts.

← all repositories

thu-coai/Safety-Prompts

A 100k-prompt stress test for Chinese LLM safety

A dataset of 100,000 Chinese safety prompts and ChatGPT refusals for benchmarking and fine-tuning LLM guardrails.

★1.2k stars LLMOps · Eval Data Tooling Language Models

View on GitHub ↗ Homepage ↗

Not currently ranked — collecting fresh signals.

star history

What it does The repository contains roughly 100,000 Chinese prompts spanning seven typical safety scenarios—insults, discrimination, illegal activities, physical and mental harm, privacy violations, and ethical traps—plus six instruction-attack categories such as goal hijacking and prompt leaking. Each prompt is paired with a ChatGPT refusal or safe response, making it a ready-made corpus for red-teaming or supervised fine-tuning. The authors also link to companion leaderboards and newer evaluation tools like SafetyBench if you prefer multiple-choice testing.

The interesting bit Most safety datasets skew heavily English; this one targets Chinese linguistic and cultural contexts, from face-saving insults to role-play jailbreaks. It doubles as both a benchmark and a training corpus, which is rarer than it should be.

Key highlights

70k prompts across typical safety scenarios (10k each for insults, bias, crime, physical harm, mental health, privacy, and ethics).
30k instruction-attack prompts covering tactics like goal hijacking, prompt leaking, unsafe instruction topics, and reverse exposure.
Every entry includes the adversarial prompt and a ChatGPT-generated safe or refusal response.
The authors explicitly recommend the dataset for training and fine-tuning, directing evaluators to their newer SafetyBench platform instead.
Data is available as JSON files in the repo and via HuggingFace Datasets.

Caveats

The README is clear that this collection is better suited for training than for turnkey benchmarking; evaluation-minded users are nudged toward SafetyBench.
All safe responses are generated by ChatGPT, so they reflect OpenAI’s guardrail style rather than an official Chinese regulatory standard.

Verdict Worth a look if you’re building or fine-tuning Chinese-language LLMs and need a broad safety net. Skip it if you’re only looking for an English-centric red-teaming toolkit or a plug-and-play evaluation harness.

Frequently asked

What is thu-coai/Safety-Prompts?: A dataset of 100,000 Chinese safety prompts and ChatGPT refusals for benchmarking and fine-tuning LLM guardrails.
Is Safety-Prompts open source?: Yes — thu-coai/Safety-Prompts is open source, released under the Apache-2.0 license.
How popular is Safety-Prompts?: thu-coai/Safety-Prompts has 1.2k stars on GitHub.
Where can I find Safety-Prompts?: thu-coai/Safety-Prompts is on GitHub at https://github.com/thu-coai/Safety-Prompts.