Question 1

How to fix 'Dimension out of range' error when running VoxCPM on CPU with PyTorch 2.11?

Accepted Answer

This is a known bug in PyTorch 2.11.0 and later: on CPU, scaled_dot_product_attention fails with 'Dimension out of range (expected to be in range of [-1, 0], but got -2)'. The workaround is to downgrade PyTorch to a release earlier than 2.11, such as 2.5.1. For a CPU-only install, run pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu. For a GPU install with CUDA 12.1, run pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu121, which provides the torch==2.5.1+cu121 build. See PyTorch issue #163597 for details.
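A quick way to tell whether an environment falls in the affected range is to check the installed version before loading the model. The sketch below assumes, per the answer above, that any PyTorch release at 2.11 or later running on CPU is affected; the function names are hypothetical and are not part of VoxCPM or PyTorch.

```python
import re


def torch_major_minor(version: str) -> tuple[int, int]:
    """Parse strings such as '2.11.0', '2.11.0+cpu' or '2.5.1+cu121' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def sdpa_cpu_bug_may_apply(version: str, device: str) -> bool:
    """Hypothetical check: True for PyTorch >= 2.11 when inference runs on CPU."""
    return device == "cpu" and torch_major_minor(version) >= (2, 11)


def check_installed_torch(device: str = "cpu") -> str:
    """Check the locally installed torch; imported lazily so the helpers above work without it."""
    import torch

    if sdpa_cpu_bug_may_apply(torch.__version__, device):
        return (
            f"torch {torch.__version__} on CPU may hit the SDPA "
            "'Dimension out of range' error; consider torch==2.5.1"
        )
    return f"torch {torch.__version__} on {device} is outside the reported range"
```

Comparing (major, minor) tuples rather than raw version strings matters here: as strings, '2.9' sorts after '2.11'.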

Question 2

Why does VoxCPM2 crash with CUDA errors (e.g., 'Offset increment outside graph capture') when using multiple subprocess workers on the same GPU?

Accepted Answer

This is a known instability caused by torch.compile's CUDA graph optimization when multiple processes share a GPU memory pool. The recommended workaround is to use a single-process serving architecture such as nano-vllm-voxcpm (https://github.com/a710128/nanovllm-voxcpm) or vllm-omni (https://github.com/OpenBMB/VoxCPM#-production-serving-vllm-omni), which avoids multi-process CUDA graph conflicts. A production-ready FastAPI wrapper for nano-vllm-voxcpm is available at https://github.com/uttera/uttera-tts-vllm.
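The core idea of a single-process architecture is that exactly one process, and one owner inside it, loads the compiled model and runs every request, so CUDA graphs from different processes never compete on the same GPU. The sketch below shows that shape with the standard library only; fake_synthesize is a hypothetical stand-in for the real model call, and unlike nano-vllm-voxcpm or vllm-omni it simply serializes requests instead of batching them.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class SingleOwnerTTS:
    """Hypothetical wrapper: every synthesis call runs on one dedicated thread in this process."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize
        # max_workers=1 gives a single owner thread; other requests queue up behind it.
        self._pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="tts-owner")

    def submit(self, text: str) -> Future:
        """Safe to call from any thread (e.g. HTTP handlers); the Future resolves to audio."""
        return self._pool.submit(self._synthesize, text)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


# Demo with a fake model that records which thread executed each request.
worker_threads: set[int] = set()


def fake_synthesize(text: str) -> bytes:
    worker_threads.add(threading.get_ident())
    return f"audio<{text}>".encode()


tts = SingleOwnerTTS(fake_synthesize)
futures: list[Future] = []


def client(i: int) -> None:
    # Several client threads submit concurrently, as request handlers would.
    futures.append(tts.submit(f"hello {i}"))


clients = [threading.Thread(target=client, args=(i,)) for i in range(8)]
for c in clients:
    c.start()
for c in clients:
    c.join()
results = sorted(f.result() for f in futures)
tts.close()
```

In a real server, the model would be loaded once before the first request and submit would be called from the request handlers; every handler shares the same owner, so only one set of CUDA graphs exists.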

Question 3

Why does audio quality progressively degrade when using LoRA fine-tuning with nano-vllm on Blackwell (RTX 5090) GPUs?

Accepted Answer

This is a known issue caused by CUDA graph memory pool conflicts with LoRA, combined with an object leak in nano-vllm's scheduler, on the Blackwell (sm_120) architecture. So far, the only effective workaround is to restart the inference process periodically, every 2–3 hours; a restart frees the leaked objects and releases all of the process's GPU memory, which clears any fragmentation. Track issue #326 and nano-vllm-voxcpm issue #61 for permanent fixes.
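The periodic restart can be automated with a small supervisor that launches the inference server, lets it run for a fixed wall-clock budget, then stops it and starts a fresh process. The sketch below uses only the Python standard library; the script name serve_tts.py and the 2.5-hour budget in the usage comment are hypothetical placeholders, and an existing process manager could do the same job.

```python
import itertools
import subprocess
import sys
from typing import Optional, Sequence


def run_with_periodic_restart(
    cmd: Sequence[str],
    max_uptime_s: float,
    cycles: Optional[int] = None,
    grace_s: float = 30.0,
) -> list[int]:
    """Run cmd, replacing it with a fresh process every max_uptime_s seconds.

    cycles=None loops forever; an integer limits how many launches happen.
    Returns the exit code of every launched process.
    """
    exit_codes: list[int] = []
    launches = itertools.count() if cycles is None else range(cycles)
    for n in launches:
        print(f"[supervisor] launch #{n + 1}: {' '.join(cmd)}", file=sys.stderr)
        proc = subprocess.Popen(cmd)
        try:
            # Returns early if the server exits or crashes on its own; it is then relaunched.
            proc.wait(timeout=max_uptime_s)
        except subprocess.TimeoutExpired:
            # Uptime budget reached: ask for a clean shutdown (SIGTERM on POSIX) ...
            proc.terminate()
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                # ... and force-kill if it does not exit within the grace period.
                proc.kill()
                proc.wait()
        exit_codes.append(proc.returncode)
    return exit_codes


# Example (hypothetical script name), restarting every 2.5 hours:
# run_with_periodic_restart([sys.executable, "serve_tts.py"], max_uptime_s=2.5 * 3600)
```

The grace period gives the server a chance to finish in-flight requests before it is killed; it should be longer than the slowest expected synthesis.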

Question 4

Why does voxcpm2 voice cloning produce distorted, demon-like output with incorrect audio duration?

Accepted Answer

This is a known instability in voxcpm2 and voxcpm1.5. As a temporary workaround, switch to voxcpm0.5b, which works correctly with the same inputs. No permanent fix is available yet; monitor the GitHub issue for updates.

Question 5

How to fix 'triton is not installed' warning when using torch.compile?

Accepted Answer

Install the triton version that matches your PyTorch build. For torch==2.5.1, that is triton==3.1.0 (Linux with an NVIDIA GPU): pip install triton==3.1.0. Also check that your GPU supports triton, which requires CUDA compute capability 7.0 or higher. Windows support is limited; there, you can ignore the warning if functionality is unaffected. If you installed the wrong version (e.g., triton 2.1.0 caused errors), uninstall it first with pip uninstall triton, then install the correct one.
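Rather than memorizing version pairs, one option is to read the triton pin from the installed torch package's own metadata and compare it with the installed triton. The sketch below assumes the torch wheel lists triton in its Requires-Dist entries (Linux x86_64 wheels typically do, e.g. triton==3.1.0 for torch 2.5.1); environment markers are not evaluated, and the helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Iterable, Optional

# Matches e.g. 'triton==3.1.0; platform_system == "Linux" ...' but not 'pytorch-triton-rocm==...'.
_TRITON_PIN = re.compile(r"^triton\s*==\s*([^\s;,]+)")


def pinned_triton_version(torch_requirements: Optional[Iterable[str]]) -> Optional[str]:
    """Return the triton version pinned in torch's Requires-Dist entries, or None."""
    for requirement in torch_requirements or []:
        match = _TRITON_PIN.match(requirement.strip())
        if match:
            return match.group(1)
    return None


def gpu_supported_by_triton(capability: tuple[int, int]) -> bool:
    """capability is (major, minor), as returned by torch.cuda.get_device_capability()."""
    return tuple(capability) >= (7, 0)


def check_triton_install() -> str:
    """Compare the installed triton with the version the installed torch pins."""
    try:
        wanted = pinned_triton_version(metadata.requires("torch"))
    except metadata.PackageNotFoundError:
        return "torch is not installed"
    if wanted is None:
        return "no triton pin found in the installed torch metadata"
    try:
        installed = metadata.version("triton")
    except metadata.PackageNotFoundError:
        installed = None
    if installed == wanted:
        return f"triton {installed} matches torch"
    return (
        f"installed triton: {installed}; expected {wanted}. Fix with: "
        f"pip uninstall triton && pip install triton=={wanted}"
    )
```

Reading versions through importlib.metadata avoids importing torch or triton, so the check still works when a mismatched triton fails at import time.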

VoxCPM

Similar projects

puppeteer

PaddleOCR

crawl4ai

supervision