heretic

CLI tool

p-e-w/heretic

A tool that automatically removes censorship (safety alignment) from language models.

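Overview

Heretic removes safety alignment from transformer language models without expensive post-training. It combines an advanced implementation of directional ablation with a TPE-based parameter optimizer to produce, fully automatically, uncensored models that retain as much of the original model's intelligence as possible.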
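The core idea of directional ablation can be illustrated with a small, dependency-free sketch: the component of a vector along a chosen direction is projected out. This is only an illustration of the basic projection, not Heretic's implementation; the function name `ablate_direction`, the toy vectors, and the "refusal direction" values are hypothetical.

```python
import math


def ablate_direction(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`.

    Computes h' = h - (h . r) r, where r is `direction` scaled to unit length.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    projection = sum(h * u for h, u in zip(hidden, unit))
    return [h - projection * u for h, u in zip(hidden, unit)]


# Toy values, chosen only for illustration (assumption, not from Heretic).
hidden_state = [3.0, 4.0, 1.0]
refusal_direction = [0.0, 2.0, 0.0]

ablated = ablate_direction(hidden_state, refusal_direction)

# After ablation, the vector has no component left along the direction.
remaining = sum(a * d for a, d in zip(ablated, refusal_direction))
print(ablated, remaining)
```

In practice, how strongly and where such a projection is applied is exactly the kind of parameter choice that the overview says Heretic's optimizer automates.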

README preview

# Heretic: Fully automatic censorship removal for language models

Heretic is a tool that removes censorship (aka "safety alignment") from
transformer-based language models without expensive post-training.
It combines an advanced implementation of directional ablation, also known
as "abliteration" ([Arditi et al. 2024](https://arxiv.org/abs/2406.11717),
Lai 2025 ([1](https://huggingface.co/blog/grimjim/projected-abliteration),
[2](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration))),
with a TPE-based parameter optimizer powered by [Optuna](https://optuna.org/).

This approach enables Heretic to work **completely automatically.** Heretic
finds high-quality abliteration parameters by co-minimizing the number of
refusals and the KL divergence from the original model. This results in a
decensored model that retains as much of the original model's intelligence
as possible. Using Heretic does not require an understanding of transformer
internals. In fact, anyone who knows how to run a command-line program
can use Heretic to decensor language models.

Heretic supports most dense models, including many multimodal models,
several different MoE architectures, and even some hybrid models like Qwen3.5.
Pure state-space models and certain other research architectures are not yet
supported out of the box.

Running unsupervised with the default configuration, Heretic can produce
decensored models that rival the quality of abliterations created manually
by human experts:

| Model | Refusals for "harmful" prompts | KL divergence from original model for "harmless" prompts |
| :--- | ---: | ---: |
| [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it) (original) | 97/100 | 0 *(by definition)* |
| [mlabonne/gemma-3-12b-it-abliterated-v2](h

FAQ (2)

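Troubleshooting

Why does the KL divergence become NaN when using Heretic with google/gemma-4-12B-it?

The problem is caused by mishandling of the model's output logits: `google/gemma-4-12B-it` uses `Gemma4UnifiedForConditionalGeneration` instead of the expected `Gemma4ForConditionalGeneration` (as of transformers v5.10.1). This yields invalid probability distributions and, in turn, a NaN KL divergence. A fix is available in PR #350, which switches the KL divergence computation to use the raw generation logits. Update Heretic to the latest version that includes this patch, or apply the changes from PR #350 manually.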
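To see why an invalid distribution turns into a NaN KL divergence, here is a minimal, dependency-free sketch. It is a generic illustration, not Heretic's code and not the contents of PR #350; the function names and the toy logits are assumptions. It computes KL(P || Q) from raw logits via a stable log-softmax, and then shows one way a malformed input (here, a `-inf` entry standing in for a badly processed logit) can poison a naive computation, because `0 * -inf` is NaN in floating-point arithmetic.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(log_p, log_q):
    """Naive KL(P || Q) for distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Toy raw logits for an "original" and a "modified" model (hypothetical values).
original_logits = [2.0, 0.5, -1.0]
modified_logits = [1.5, 0.7, -0.5]

kl_ok = kl_divergence(log_softmax(original_logits), log_softmax(modified_logits))

# A malformed input: one entry is -inf, so its log-probability is -inf and
# the corresponding term becomes 0.0 * -inf, which evaluates to NaN.
malformed_logits = [2.0, float("-inf"), -1.0]
kl_bad = kl_divergence(log_softmax(malformed_logits), log_softmax(modified_logits))

print(kl_ok, kl_bad)
```

The first value is a finite, non-negative number, while the second is NaN. A single bad term is enough to make the whole sum NaN, which is why the inputs to the KL computation matter so much.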

Original Issue #346

Troubleshooting

Why does Heretic crash on Apple Silicon MPS with `UnboundLocalError: cannot access local variable 'analyzer'`?

This is a known regression in Heretic v1.2.0 (issue #239). It was fixed in #301. Update to the latest master branch: `pip install git+https://github.com/p-e-w/heretic.git`. The fix will be included in the next PyPI release.

Original Issue #299

Similar projects

hermes-agent

firecrawl

go

markitdown