Is heretic open source?

Yes — p-e-w/heretic is open source, released under the AGPL-3.0 license.

What language is heretic written in?

p-e-w/heretic is primarily written in Python.

How popular is heretic?

p-e-w/heretic has 26.6k stars on GitHub. Its star growth is currently slowing, at about 43 new stars per day over the past 7 days.

Where can I find heretic?

p-e-w/heretic is on GitHub at https://github.com/p-e-w/heretic.

p-e-w/heretic

Uncensoring LLMs without dumbing them down

It automates the removal of safety alignment from transformer models so you don't have to hand-tune abliteration parameters or pay for expensive post-training.

★ 26.6k stars · Python · Language Models

Velocity (7d): +43 ★ / day

Trend: ↘ cooling

What it does

Heretic is a command-line tool that strips “safety alignment” (read: censorship) from transformer-based language models. It implements directional ablation (abliteration) and uses an Optuna-powered TPE optimizer to search automatically for ablation parameters that reduce refusals on “harmful” prompts while keeping the model’s output on “harmless” prompts close to the original, as measured by KL divergence. The result is a decensored model that, according to the README, retains more of the original model’s capabilities than manually abliterated alternatives. It supports dense architectures, many multimodal and MoE models, and some hybrids like Qwen3.5.
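To make "directional ablation" concrete, here is a minimal sketch of the general idea in plain Python. It is not Heretic's code: the function names, the difference-of-means estimate of the refusal direction, and the single `strength` knob are all assumptions made for illustration. The sketch estimates a direction from toy activations and then removes that direction from the output of a small weight matrix.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]

def difference_of_means(harmful, harmless):
    """Estimate a 'refusal direction' as mean(harmful) - mean(harmless).

    Each argument is a list of activation vectors (hypothetical toy data here).
    """
    dim = len(harmful[0])
    mean_bad = [sum(row[i] for row in harmful) / len(harmful) for i in range(dim)]
    mean_ok = [sum(row[i] for row in harmless) / len(harmless) for i in range(dim)]
    return normalize([b - o for b, o in zip(mean_bad, mean_ok)])

def ablate(weight, direction, strength=1.0):
    """Return W' = W - strength * r (r^T W).

    weight is a list of rows (d_out x d_in) whose outputs live in the same
    space as `direction` (length d_out). With strength=1.0 the outputs of W'
    have no component along r; `strength` is a hypothetical tuning knob.
    """
    d_out, d_in = len(weight), len(weight[0])
    # proj[j] = sum_i r[i] * W[i][j], i.e. the row vector r^T W
    proj = [sum(direction[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [weight[i][j] - strength * direction[i] * proj[j] for j in range(d_in)]
        for i in range(d_out)
    ]

# Toy usage: the estimated direction is the first axis, so ablation zeroes row 0.
r = difference_of_means([[2.0, 0.0], [4.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
w_ablated = ablate([[1.0, 2.0], [3.0, 4.0]], r)
```

In a real model the same projection would be applied to the matrices that write into the residual stream, and how strongly to apply it at each layer is the kind of parameter an optimizer can search over.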

The interesting bit

The clever part isn’t just the ablation; it’s the co-minimization strategy. Heretic treats refusal rate and KL divergence as a joint optimization target, letting the algorithm trade off between compliance and model fidelity without human babysitting. It also auto-benchmarks your hardware to pick a batch size, which is the kind of boring convenience that actually matters when you’re processing multi-billion-parameter models on a single GPU.
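The trade-off between the two objectives can be illustrated with a small Pareto-front filter. This is only a sketch of the concept, not how Heretic or Optuna implement it: the `Trial` type, the `pareto_front` helper, and the trial values are invented for illustration. A trial survives if no other trial is at least as good on both refusals and KL divergence and strictly better on one of them.

```python
from typing import List, NamedTuple

class Trial(NamedTuple):
    name: str
    refusals: int  # refusals on "harmful" prompts, lower is better
    kl: float      # KL divergence on "harmless" prompts, lower is better

def dominates(a: Trial, b: Trial) -> bool:
    """True if a is no worse than b on both objectives and better on at least one."""
    no_worse = a.refusals <= b.refusals and a.kl <= b.kl
    better = a.refusals < b.refusals or a.kl < b.kl
    return no_worse and better

def pareto_front(trials: List[Trial]) -> List[Trial]:
    """Keep only the trials that no other trial dominates."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]

# Invented toy values, not measurements from any real model.
toy_trials = [
    Trial("a", 5, 0.20),
    Trial("b", 5, 0.60),   # dominated by "a": same refusals, higher KL
    Trial("c", 1, 0.90),
    Trial("d", 20, 0.05),
    Trial("e", 20, 0.50),  # dominated by "d" (and by "a")
]
front_names = [t.name for t in pareto_front(toy_trials)]
```

Every trial left on the front represents a different balance of compliance against fidelity, which is why a two-objective search hands back a set of candidates rather than a single winner.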

Key highlights

Claims lower KL divergence than manual abliterations on Gemma-3-12b-it (0.16 versus 0.45 and 1.04 for two manually abliterated versions) at the same refusal count (3/100).
Supports 4-bit quantization via the bnb_4bit option (bitsandbytes) to squeeze large models into limited VRAM.
The community has published well over 3,000 decensored models built with the tool.
Includes optional research features for interpretability: PaCMAP residual projections and geometric analysis tables.
Runs without requiring knowledge of transformer internals; the README claims anyone who can run a CLI program can use it.
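The KL divergence figures in the highlights measure how far a modified model's output distribution drifts from the original's. As a rough sketch of that kind of measurement, the snippet below computes KL(P || Q) in nats between two next-token distributions derived from logits. It assumes the original model's distribution is the reference P. The helper names and the toy logits are hypothetical, and Heretic's actual evaluation may differ in its details.

```python
import math

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(P || Q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Toy logits over a 3-token vocabulary (invented values).
original = softmax([2.0, 1.0, 0.0])
modified = softmax([1.5, 1.2, 0.1])
drift = kl_divergence(original, modified)
```

A value of zero means the two distributions are identical, and larger values mean the modified model's predictions on that prompt have moved further from the original's.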

Caveats

Pure state-space models and certain research architectures are not supported out of the box.
Benchmark numbers shown in the README are explicitly noted as potentially platform- and hardware-dependent.
The optional residual-vector plotting relies on CPU-bound PaCMAP projections that can take more than an hour for larger models.

Verdict

Local LLM tinkerers and researchers who need uncensored models for red-teaming or domain-specific applications should look here — especially if they’ve found manual abliteration too blunt. If you’re perfectly happy with cloud APIs and their safety filters, or you work exclusively with unsupported architectures, this is just another niche tool.
