FunASR
Framework · modelscope/FunASR
Industrial-grade speech recognition toolkit with 170x realtime speed.
Overview
FunASR is an end-to-end speech recognition toolkit supporting over 50 languages, speaker diarization, emotion detection, streaming, and an OpenAI-compatible API. It achieves 170x realtime speed and offers pre-trained models like SenseVoice and Paraformer.
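As a rough illustration of what a "170x realtime" figure would mean in practice, the sketch below converts a speed multiple into an estimated processing time and a real-time factor (RTF). It assumes the figure describes throughput relative to audio duration; the README preview phrases the number relative to Whisper instead, so treat this reading as an assumption. The helper names `processing_seconds` and `real_time_factor` are hypothetical and are not part of FunASR.

```python
def processing_seconds(audio_seconds: float, speedup: float = 170.0) -> float:
    """Estimate wall-clock seconds needed to transcribe `audio_seconds` of audio.

    `speedup` is the realtime multiple (170.0 is the figure quoted above).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def real_time_factor(speedup: float) -> float:
    """Real-time factor: processing time divided by audio duration."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


if __name__ == "__main__":
    one_hour = 3600.0
    print(f"1 h of audio: about {processing_seconds(one_hour):.1f} s of compute")
    print(f"RTF: {real_time_factor(170.0):.4f}")
```

Under that assumption, one hour of audio would take roughly 21 seconds to process. Actual throughput depends on the model, hardware, and batch size.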
README Preview
([简体中文](./README_zh.md) | English | [日本語](./README_ja.md) | [한국어](./README_ko.md))

Industrial speech recognition. 170x faster than Whisper. 50+ languages.
Speaker diarization · Emotion detection · Streaming · One API call

Quick Start · Colab · Benchmark · Model selection · Migration guide · Use cases · Deployment matrix · Models · Agent Integration · Docs · Contribute

---

## Quick Start

[Open in Colab](https://colab.research.google.com/github/modelscope/FunASR/blob/main/examples/colab/funasr_quickstart.ipynb)

No local setup? Open the [Colab quickstart](./examples/colab/) to transcribe a public sample or upload your own audio in a browser.

```bash
pip install torch torchaudio
pip install funasr
```

```python
from funasr import AutoModel

# Load the ASR model together with the VAD and speaker models
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")
# Transcribe the audio file in a single call
result = model.generate(input="meeting.wav")
```

**Output** (structured text with speaker labels, timestamps, and punctuation):
```
[00:00.4 → 00:03.8] Speaker 0: Let's discuss the Q3 plan.
[00:04.2 → 00:07.1] Speaker 1: Sounds good. I have three points.
[00:07.5 → 00:12.3] Speaker 0: Go ahead. We have 30 minutes.
```

That's it. **One model, one call**: VAD segmentation, speech recognition, punctuation, and speaker diarization all happen automatically.

### LLM-powered ASR: Fun-ASR-Nano

For the highest accuracy across 31 languages (including Chinese dialects), use [Fun-ASR-Nano](https://github.com/FunAudioLLM/Fun-ASR), an LLM-based ASR model that combines the SenseVoice encoder with a Qwen3-0.6B decoder:

```python
from funasr import AutoModel

model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", vad_model="fsmn-vad", device="cuda")
result = model.generate(input="meeting.wav")
```

With vLLM acceleration (16x faster, batch processing):

```python
from funasr.auto.auto_model_vllm import AutoModelVLLM

model = AutoModelVLLM(model="FunAudioLLM/Fun-ASR-Na