OpenSource-Hub

heretic

CLI tool

p-e-w/heretic

Automatically removes censorship (safety alignment) from language models.

Overview

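Heretic removes safety alignment from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ablation with a TPE-based parameter optimizer to produce, fully automatically, decensored models that retain as much of the original model's intelligence as possible.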
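To make the two ingredients named above more concrete, here is a minimal sketch in plain Python. It assumes the textbook formulation of directional ablation from Arditi et al. 2024 (subtracting the projection onto a unit "refusal direction" from a weight matrix) and the standard definition of KL divergence, which the README names as one of the two quantities Heretic minimizes. The function names `ablate_direction` and `kl_divergence` and the `strength` parameter are hypothetical and chosen for illustration; this is not Heretic's actual code or API.

```python
import math


def ablate_direction(weight, direction, strength=1.0):
    """Return W' = W - strength * r r^T W for a matrix given as a list of rows.

    `direction` lives in the output space of `weight` and is normalized to the
    unit vector r. With strength=1.0 the result has no component along r.
    Hypothetical helper for illustration only.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    n_rows, n_cols = len(weight), len(weight[0])
    # proj[j] = (r^T W)[j], the component of column j along r
    proj = [sum(r[k] * weight[k][j] for k in range(n_rows)) for j in range(n_cols)]
    return [
        [weight[i][j] - strength * r[i] * proj[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy 2x2 weight matrix and a direction along the first output axis.
W = [[1.0, 2.0], [3.0, 4.0]]
ablated = ablate_direction(W, [2.0, 0.0])
print(ablated)

# Identical distributions have zero divergence; differing ones do not.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
```

In this toy setting, a partial `strength` leaves part of the component along the direction in place, which is the kind of trade-off a parameter optimizer can tune: less remaining refusal behaviour against a smaller KL divergence from the original model.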

README preview

# Heretic: Fully automatic censorship removal for language models

Heretic is a tool that removes censorship (aka "safety alignment") from
transformer-based language models without expensive post-training.
It combines an advanced implementation of directional ablation, also known
as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717),
Lai 2025 ([1](https://huggingface.co/blog/grimjim/projected-abliteration),
[2](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration))),
with a TPE-based parameter optimizer powered by [Optuna](https://optuna.org/).

This approach enables Heretic to work **completely automatically.** Heretic
finds high-quality abliteration parameters by co-minimizing the number of
refusals and the KL divergence from the original model. This results in a
decensored model that retains as much of the original model's intelligence
as possible. Using Heretic does not require an understanding of transformer
internals. In fact, anyone who knows how to run a command-line program
can use Heretic to decensor language models.

Heretic supports most dense models, including many multimodal models,
several different MoE architectures, and even some hybrid models like Qwen3.5.
Pure state-space models and certain other research architectures are not yet
supported out of the box.

Running unsupervised with the default configuration, Heretic can produce
decensored models that rival the quality of abliterations created manually
by human experts:

| Model | Refusals for "harmful" prompts | KL divergence from original model for "harmless" prompts |
| :--- | ---: | ---: |
| [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it) (original) | 97/100 | 0 *(by definition)* |
| [mlabonne/gemma-3-12b-it-abliterated-v2](h