heretic

CLI Tool

p-e-w/heretic

Fully automatic censorship removal for language models via abliteration.

Overview

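Heretic removes censorship (safety alignment) from transformer-based language models without expensive post-training. It combines an implementation of directional ablation (abliteration) with a TPE-based parameter optimizer powered by Optuna, so it runs fully automatically. The optimizer co-minimizes the number of refusals and the KL divergence from the original model, which yields a decensored model that retains as much of the original model's intelligence as possible.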
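To make the idea of directional ablation concrete, here is a minimal sketch of the basic operation: removing the component of a hidden-state vector that lies along a given direction. It is an illustration only and not Heretic's actual implementation. The function name `ablate_direction`, the `weight` parameter, and the example vectors are all hypothetical.

```python
import math


def ablate_direction(hidden, direction, weight=1.0):
    """Remove (a fraction of) the component of `hidden` along `direction`.

    Hypothetical helper for illustration; not part of Heretic's API.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    # Scalar projection of the hidden state onto the unit direction.
    proj = sum(h * u for h, u in zip(hidden, unit))
    # Subtract the projected component; weight=1.0 removes it entirely.
    return [h - weight * proj * u for h, u in zip(hidden, unit)]


hidden = [2.0, 1.0, -1.0]
refusal_direction = [1.0, 0.0, 0.0]  # hypothetical direction, for illustration
print(ablate_direction(hidden, refusal_direction))
```

With `weight=1.0` the result is orthogonal to the chosen direction. A smaller weight removes only part of the component, which is one kind of parameter an optimizer could tune.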

README Preview

# Heretic: Fully automatic censorship removal for language models

Heretic is a tool that removes censorship (aka "safety alignment") from
transformer-based language models without expensive post-training.
It combines an advanced implementation of directional ablation, also known
as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717),
Lai 2025 ([1](https://huggingface.co/blog/grimjim/projected-abliteration),
[2](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration))),
with a TPE-based parameter optimizer powered by [Optuna](https://optuna.org/).

This approach enables Heretic to work **completely automatically.** Heretic
finds high-quality abliteration parameters by co-minimizing the number of
refusals and the KL divergence from the original model. This results in a
decensored model that retains as much of the original model's intelligence
as possible. Using Heretic does not require an understanding of transformer
internals. In fact, anyone who knows how to run a command-line program
can use Heretic to decensor language models.

Heretic supports most dense models, including many multimodal models,
several different MoE architectures, and even some hybrid models like Qwen3.5.
Pure state-space models and certain other research architectures are not yet
supported out of the box.

Running unsupervised with the default configuration, Heretic can produce
decensored models that rival the quality of abliterations created manually
by human experts:

| Model | Refusals for "harmful" prompts | KL divergence from original model for "harmless" prompts |
| :--- | ---: | ---: |
| [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it) (original) | 97/100 | 0 *(by definition)* |
| [mlabonne/gemma-3-12b-it-abliterated-v2](h

FAQ (2)

Troubleshooting

Why does KL divergence become NaN when using google/gemma-4-12B-it with Heretic?

The issue is caused by incorrect handling of the model's output logits because google/gemma-4-12B-it uses Gemma4UnifiedForConditionalGeneration (as of transformers v5.10.1) instead of the expected Gemma4ForConditionalGeneration. This leads to invalid probability distributions and NaN KL divergence. The fix is available in PR #350, which switches to raw generation logits for KL divergence computation. Update Heretic to the latest version that includes this patch, or manually apply the changes from PR #350.
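As a rough illustration of why invalid values in the logits surface as a NaN KL divergence, the sketch below computes KL divergence from two sets of logits in plain Python. It does not reproduce the Gemma-specific bug or the change in PR #350. The helper names and example logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities (numerically stable)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(log_p, log_q):
    """KL(P || Q) computed from two lists of log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


original = log_softmax([2.0, 0.5, -1.0])
modified = log_softmax([1.5, 0.7, -0.8])
print(kl_divergence(original, modified))  # finite and non-negative

# A single NaN in the logits poisons the whole distribution,
# so the resulting KL divergence is NaN as well.
broken = log_softmax([float("nan"), 0.5, -1.0])
print(math.isnan(kl_divergence(original, broken)))
```

A quick check such as `math.isnan` on the computed value, or on the logits themselves, can help confirm whether the installed version is still affected.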

GitHub Issue #346

Troubleshooting

Why does Heretic crash with UnboundLocalError: cannot access local variable 'analyzer' on Apple Silicon MPS?

This is a known regression in Heretic v1.2.0 (issue #239). It was fixed in #301. Update to the latest master branch: pip install git+https://github.com/p-e-w/heretic.git. The fix will be included in the next PyPI release.
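The exact cause inside Heretic is not described here. As a generic illustration of this class of error, the sketch below assumes a hypothetical function in which `analyzer` is assigned on only one branch and then used unconditionally. The function `run` and its `enable_analysis` flag are invented for the example and are not Heretic code.

```python
def run(enable_analysis):
    """Hypothetical function showing how UnboundLocalError can arise."""
    if enable_analysis:
        analyzer = "analyzer-instance"  # bound only on this branch
    # When enable_analysis is False, `analyzer` was never bound,
    # so reading it here raises UnboundLocalError.
    return analyzer


try:
    run(False)
except UnboundLocalError as exc:
    print(type(exc).__name__, exc)
```

Errors of this kind typically appear only on the code path that skips the assignment, which would be consistent with a crash that shows up on one backend but not others.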

GitHub Issue #299

Similar projects

hermes-agent

firecrawl

go

markitdown