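Project Overview
Heretic removes the safety alignment of transformer language models without expensive post-training. By combining an advanced implementation of directional ablation with a TPE-based optimizer, it fully automatically produces uncensored models that retain much of the original model's intelligence.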
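The summary above refers to directional ablation ("abliteration"). The rough NumPy sketch below shows the basic difference-of-means form described by Arditi et al. 2024. It also includes the KL divergence, which is used to measure how much an edit changes the model. This is not Heretic's implementation. Heretic uses more advanced variants and lets its optimizer choose the ablation parameters. Several parts of the sketch are assumptions made for illustration: the function names `refusal_direction`, `ablate_direction`, and `kl_divergence` are hypothetical, the `(d_model, d_in)` weight layout is assumed, and the activations and logits are random or made-up stand-ins rather than real model data.

```python
import numpy as np


def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    # Hypothetical helper: difference of mean activations for "harmful" and
    # "harmless" prompts, normalized to unit length.
    # Inputs have shape (n_prompts, d_model), taken from the same layer/position.
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_direction(weight: np.ndarray, r: np.ndarray) -> np.ndarray:
    # Hypothetical helper. Assumes `weight` has shape (d_model, d_in) and writes
    # weight @ x into the residual stream. Returns (I - r r^T) @ weight, so no
    # output of the edited matrix has a component along the unit vector r.
    return weight - np.outer(r, r @ weight)


def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    # Hypothetical helper: KL(P || Q) for two next-token distributions given as
    # logits, computed in log space (log-softmax) for numerical stability.
    log_p = p_logits - np.logaddexp.reduce(p_logits)
    log_q = q_logits - np.logaddexp.reduce(q_logits)
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))


# Toy usage with random stand-in activations (not real model data).
rng = np.random.default_rng(0)
d_model, d_in = 8, 4
harmful = rng.normal(size=(16, d_model)) + 2.0  # shifted cluster plays the "harmful" role
harmless = rng.normal(size=(16, d_model))
r = refusal_direction(harmful, harmless)

W = rng.normal(size=(d_model, d_in))
W_abl = ablate_direction(W, r)
print(np.abs(r @ W_abl).max())  # close to 0, up to floating-point error

# In this sketch, P is the original model's distribution and Q the edited one.
original_logits = np.array([2.0, 0.5, -1.0])
ablated_logits = np.array([1.8, 0.7, -1.2])
print(kl_divergence(original_logits, ablated_logits))  # small and non-negative
```

In this sketch, projecting `r` out of every matrix that writes to the residual stream leaves the model unable to express the refusal direction. The KL term measures how far the edited model drifts from the original on harmless prompts. Refusals and this divergence are the two quantities that Heretic's optimizer co-minimizes.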
README Preview
# Heretic: Fully automatic censorship removal for language models

[Discord](https://discord.gg/gdXc48gSyT) [Hugging Face](https://huggingface.co/heretic-org) [Codeberg](https://codeberg.org/p-e-w/heretic)

[Trendshift](https://trendshift.io/repositories/20538)

Heretic is a tool that removes censorship (aka "safety alignment") from
transformer-based language models without expensive post-training.
It combines an advanced implementation of directional ablation, also known
as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717),
Lai 2025 ([1](https://huggingface.co/blog/grimjim/projected-abliteration),
[2](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration))),
with a TPE-based parameter optimizer powered by [Optuna](https://optuna.org/).

This approach enables Heretic to work **completely automatically.** Heretic
finds high-quality abliteration parameters by co-minimizing the number of
refusals and the KL divergence from the original model. This results in a
decensored model that retains as much of the original model's intelligence
as possible. Using Heretic does not require an understanding of transformer
internals. In fact, anyone who knows how to run a command-line program
can use Heretic to decensor language models.

Heretic supports most dense models, including many multimodal models,
several different MoE architectures, and even some hybrid models like Qwen3.5.
Pure state-space models and certain other research architectures are not yet
supported out of the box.

Running unsupervised with the default configuration, Heretic can produce
decensored models that rival the quality of abliterations created manually
by human experts:

| Model | Refusals for "harmful" prompts | KL divergence from original model for "harmless" prompts |
| :--- | ---: | ---: |
| [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it) (original) | 97/100 | 0 *(by definition)* |
| [mlabonne/gemma-3-12b-it-abliterated-v2](h