OpenSource-Hub

heretic

CLI tool

p-e-w/heretic

A tool that automatically removes censorship (safety alignment) from language models.

Overview

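Heretic removes the safety alignment of transformer language models without expensive training. It combines TPE-based optimization with an advanced implementation of directional ablation to generate, fully automatically, uncensored models that retain a high level of intelligence.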
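The README describes one of the optimized quantities as the KL divergence from the original model. As a minimal sketch of that metric only, the snippet below computes the KL divergence between two next-token distributions in plain Python. The helper names, the toy logits, and the direction KL(original || modified) are assumptions made for illustration; this is not Heretic's own code.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same support."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical next-token logits from an original and a modified model.
original_logits = [2.0, 1.0, 0.1]
modified_logits = [1.8, 1.1, 0.3]

p = softmax(original_logits)
q = softmax(modified_logits)

kl_same = kl_divergence(p, p)  # identical distributions give zero
kl_diff = kl_divergence(p, q)  # grows as the distributions drift apart

print(f"KL(original || original) = {kl_same:.6f}")
print(f"KL(original || modified) = {kl_diff:.6f}")
```

A value of zero means the two distributions are identical, which is why the README's table lists the original model's KL divergence as 0 by definition.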

README preview

# Heretic: Fully automatic censorship removal for language models

[Discord](https://discord.gg/gdXc48gSyT) [Hugging Face](https://huggingface.co/heretic-org) [Codeberg](https://codeberg.org/p-e-w/heretic)

[Trendshift](https://trendshift.io/repositories/20538)

Heretic is a tool that removes censorship (aka "safety alignment") from
transformer-based language models without expensive post-training.
It combines an advanced implementation of directional ablation, also known
as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717),
Lai 2025 ([1](https://huggingface.co/blog/grimjim/projected-abliteration),
[2](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration))),
with a TPE-based parameter optimizer powered by [Optuna](https://optuna.org/).

This approach enables Heretic to work **completely automatically.** Heretic
finds high-quality abliteration parameters by co-minimizing the number of
refusals and the KL divergence from the original model. This results in a
decensored model that retains as much of the original model's intelligence
as possible. Using Heretic does not require an understanding of transformer
internals. In fact, anyone who knows how to run a command-line program
can use Heretic to decensor language models.

Heretic supports most dense models, including many multimodal models,
several different MoE architectures, and even some hybrid models like Qwen3.5.
Pure state-space models and certain other research architectures are not yet
supported out of the box.

Running unsupervised with the default configuration, Heretic can produce
decensored models that rival the quality of abliterations created manually
by human experts:

| Model | Refusals for "harmful" prompts | KL divergence from original model for "harmless" prompts |
| :--- | ---: | ---: |
| [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it) (original) | 97/100 | 0 *(by definition)* |
| [mlabonne/gemma-3-12b-it-abliterated-v2](h