Un-safetuning LLMs with a single CLI command
Heretic automatically strips safety alignment from transformer models without retraining, using optimization to find the least destructive way to make them stop refusing.

What it does
Heretic is a command-line tool that removes “safety alignment” — the trained-in refusal behavior — from language models. You run heretic <model-name> and it outputs a decensored version, no manual tuning required. It works by finding directions in the model’s internal representations that correspond to refusal, then ablating them while minimizing how much the rest of the model’s behavior shifts.
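The ablation step can be illustrated with a small sketch. It assumes the formulation commonly used in abliteration work: estimate a refusal direction as the difference between mean activations on refused and answered prompts, then project that direction out of a hidden state. The function names, the `weight` parameter, and the toy vectors are hypothetical, and this is not Heretic's actual code.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(refused_acts, answered_acts):
    """Unit vector pointing from the 'answered' mean to the 'refused' mean."""
    diff = [a - b for a, b in zip(mean_vector(refused_acts), mean_vector(answered_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate(hidden, direction, weight=1.0):
    """Remove (a fraction of) the component of `hidden` along `direction`."""
    dot = sum(h * d for h, d in zip(hidden, direction))
    return [h - weight * dot * d for h, d in zip(hidden, direction)]

# Toy 3-dimensional activations; real residual streams have thousands of dimensions.
refused = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
answered = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = refusal_direction(refused, answered)
cleaned = ablate([3.0, 5.0, 7.0], direction)
print(direction, cleaned)
```

In this toy case the two groups differ only along the first axis, so ablation zeroes that component and leaves the others alone. A `weight` below 1.0 would remove only part of the component, which is the kind of knob an automated search could tune.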
The interesting bit
The clever part isn’t the abliteration technique itself (that’s established research); it’s the automation. Heretic uses Optuna’s TPE (Tree-structured Parzen Estimator) optimizer to search for abliteration parameters that simultaneously minimize refusals and KL divergence from the original model. This co-optimization is what lets it run unsupervised and still beat hand-tuned abliterations: in the Gemma-3-12B benchmark table, Heretic matches the manual versions’ 3/100 refusal rate with a KL divergence of 0.16, versus 0.45 or 1.04 for the hand-tuned models.
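To make the two objectives concrete, here is a minimal sketch of the KL divergence measurement and of how the quoted benchmark figures compare. The next-token distributions are invented for illustration, the "manual A/B" labels are placeholders, and ranking by refusals and then KL is a simplification; the actual Optuna search is not reproduced here.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Invented next-token distributions, for illustration only.
original = [0.70, 0.20, 0.10]
light_edit = [0.65, 0.23, 0.12]   # model barely changed
heavy_edit = [0.30, 0.30, 0.40]   # model changed a lot

kl_light = kl_divergence(original, light_edit)
kl_heavy = kl_divergence(original, heavy_edit)

# (label, refusals out of 100, KL divergence) using the figures quoted above.
# "manual A" and "manual B" are placeholder labels, not real model names.
candidates = [("manual A", 3, 0.45), ("Heretic", 3, 0.16), ("manual B", 3, 1.04)]

# Simplified ranking: fewest refusals first, then lowest KL divergence.
best = min(candidates, key=lambda c: (c[1], c[2]))
print(kl_light < kl_heavy, best[0])
```

A lower KL divergence means the edited model's output distribution stays closer to the original's, which is why equal refusal rates do not imply equal quality.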
Key highlights
- Supports dense transformers, multimodal models, MoE architectures, and hybrids like Qwen3.5
- ~20-30 minutes to decensor a 4B model on an RTX 3090; auto-detects optimal batch size
- Optional bitsandbytes 4-bit quantization for VRAM-constrained runs
- Built-in evaluation mode to reproduce benchmark numbers against original models
- Research extras include PaCMAP residual visualization and geometric analysis tables
- Community has published well over 3000 Heretic-derived models on Hugging Face
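As a rough illustration of why the 4-bit option matters on VRAM-constrained hardware, the sketch below estimates weight storage only. It deliberately ignores activations, quantization metadata, and framework overhead, so real usage will be higher; the helper name is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Back-of-envelope weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9

params_4b = 4e9  # a 4B-parameter model, as in the timing example above

fp16_gb = weight_memory_gb(params_4b, 16)
int4_gb = weight_memory_gb(params_4b, 4)
print(fp16_gb, int4_gb)
```

Under these assumptions, 16-bit weights for a 4B model need about 8 GB and 4-bit weights about 2 GB.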
Caveats
- Pure state-space models (Mamba, etc.) and some research architectures aren’t supported yet
- PaCMAP plotting is CPU-bound and can take an hour+ for larger models
- Requires PyTorch 2.2 or later, and some newer model formats need features from 2.6+
Verdict
Worth a look if you’re running local LLMs and tired of models refusing benign requests, or if you’re doing mechanistic interpretability research and want automated residual analysis. Skip it if you’re satisfied with cloud APIs or your use case doesn’t hit alignment boundaries.