FunASR
Industrial-grade speech recognition toolkit with up to 340x realtime speed, 50+ languages, speaker diarization, emotion detection, streaming, and an OpenAI-compatible API.
Ultra-fast speech recognition toolkit, 26x faster than Whisper, with built-in speaker diarization and emotion detection.
Core Features
- Record speed: up to 340x realtime on GPU (26x faster than Whisper) with Fun-ASR-Nano + vLLM
- 50+ languages supported: flagship Nano model covers 31 languages; SenseVoice supports zh/en/ja/ko/yue
- Built-in speaker diarization: no extra integration needed – one call returns speaker labels and timestamps
- Emotion detection: SenseVoice recognizes emotional tone (happiness, sadness, etc.) alongside transcription
- Streaming support: Paraformer enables WebSocket real-time recognition for live meetings, broadcasts, etc.
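Streaming recognition processes audio in fixed-size chunks instead of a whole file. Below is a minimal sketch of that chunking loop. It assumes 16 kHz mono audio and the `chunk_size=[0, 10, 5]` setting from the FunASR README's Paraformer streaming example, where each unit is 60 ms (960 samples), so one chunk is 600 ms. The helper `iter_stream_chunks` is hypothetical. In real use, each chunk would go to `model.generate(input=chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, ...)` on a model loaded with `AutoModel(model="paraformer-zh-streaming")`.

```python
from typing import Iterator, Sequence, Tuple

SAMPLE_RATE = 16_000      # assumed input sample rate (16 kHz mono)
SAMPLES_PER_UNIT = 960    # one 60 ms unit at 16 kHz
CHUNK_SIZE = [0, 10, 5]   # README setting: middle value = 10 units = 600 ms per chunk


def chunk_stride(chunk_size: Sequence[int] = CHUNK_SIZE) -> int:
    """Number of samples in one streaming chunk."""
    return chunk_size[1] * SAMPLES_PER_UNIT


def iter_stream_chunks(
    speech: Sequence[float], chunk_size: Sequence[int] = CHUNK_SIZE
) -> Iterator[Tuple[Sequence[float], bool]]:
    """Yield (chunk, is_final) pairs covering the whole signal.

    The last chunk may be shorter than the stride; is_final tells the
    decoder to flush its cache and emit the remaining text.
    """
    stride = chunk_stride(chunk_size)
    total = max(1, (len(speech) + stride - 1) // stride)  # ceiling division
    for i in range(total):
        yield speech[i * stride:(i + 1) * stride], i == total - 1


if __name__ == "__main__":
    audio = [0.0] * int(1.5 * SAMPLE_RATE)  # 1.5 s of silence = 24000 samples
    for chunk, is_final in iter_stream_chunks(audio):
        # Real code would call model.generate(input=chunk, cache=cache, is_final=is_final, ...)
        print(len(chunk), is_final)
```

With 1.5 s of audio this yields two full 600 ms chunks and a final 300 ms chunk, and only the last one is flagged `is_final`.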
What It Can't Do
- The flagship Fun-ASR-Nano requires an NVIDIA GPU for full speed; on CPU, use SenseVoiceSmall.
- PyTorch (GPU or CPU build) must be installed before funasr.
- When combining multiple models (VAD + ASR + speaker), monitor VRAM usage; refer to model_selection.md.
- This is a Python library, not a standalone desktop app, so basic Python skills are needed.
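The GPU/CPU guidance above can be captured in a small selection helper. This is only a sketch. `choose_model` and `ModelChoice` are hypothetical, and the model labels are the names used on this page, not exact model-hub IDs (take those from model_selection.md). In practice `has_cuda` would come from `torch.cuda.is_available()`. FunASR's `AutoModel` accepts a `device` string such as `"cuda:0"` or `"cpu"`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    model: str   # page-level model name, NOT an exact hub ID (assumption)
    device: str  # device string of the form AutoModel accepts, e.g. "cuda:0" or "cpu"


def choose_model(has_cuda: bool, gpu_index: int = 0) -> ModelChoice:
    """Prefer Fun-ASR-Nano on an NVIDIA GPU; fall back to SenseVoiceSmall on CPU."""
    if has_cuda:
        return ModelChoice(model="Fun-ASR-Nano", device=f"cuda:{gpu_index}")
    return ModelChoice(model="SenseVoiceSmall", device="cpu")


if __name__ == "__main__":
    try:
        import torch  # optional here: only used to detect CUDA
        cuda = torch.cuda.is_available()
    except ImportError:
        cuda = False
    print(choose_model(cuda))
```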
Use Cases
- Automated meeting minutes: multi-speaker transcription with emotion tags and timestamps
- Smart customer service and voice assistants: integrate via OpenAI-compatible API for low-latency streaming responses
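For meeting minutes, the diarized output has to be turned into readable speaker turns. The sketch below assumes the layout FunASR returns when an ASR model is combined with `vad_model`, `punc_model` and `spk_model="cam++"`. In that layout, `res[0]["sentence_info"]` is a list of dicts with `start` and `end` in milliseconds and a `spk` label. The text key is read from `text`, with a fallback to `sentence`, because the key name is an assumption you should check against your installed version. `format_minutes` is a hypothetical helper, and the sample data is made up.

```python
from typing import Any, Dict, List


def ms_to_clock(ms: int) -> str:
    """Format a millisecond offset as MM:SS (minutes may exceed 59)."""
    minutes, seconds = divmod(int(ms) // 1000, 60)
    return f"{minutes:02d}:{seconds:02d}"


def format_minutes(sentence_info: List[Dict[str, Any]]) -> List[str]:
    """Merge consecutive sentences per speaker into minutes-style lines.

    Assumes each entry has 'start'/'end' in ms and a 'spk' label; the text
    is read from 'text', falling back to 'sentence' (assumed key names).
    """
    turns: List[Dict[str, Any]] = []
    for s in sentence_info:
        text = str(s.get("text", s.get("sentence", ""))).strip()
        if turns and turns[-1]["spk"] == s["spk"]:
            turns[-1]["end"] = s["end"]  # same speaker: extend the current turn
            turns[-1]["texts"].append(text)
        else:
            turns.append({"spk": s["spk"], "start": s["start"], "end": s["end"], "texts": [text]})
    return [
        f"[{ms_to_clock(t['start'])}-{ms_to_clock(t['end'])}] Speaker {t['spk']}: {' '.join(t['texts'])}"
        for t in turns
    ]


if __name__ == "__main__":
    sample = [  # made-up data in the assumed sentence_info shape
        {"text": "Good morning.", "start": 0, "end": 1200, "spk": 0},
        {"text": "Let's begin.", "start": 1300, "end": 2500, "spk": 0},
        {"text": "Thanks.", "start": 62000, "end": 63000, "spk": 1},
    ]
    for line in format_minutes(sample):
        print(line)
```

Sentences are joined with a space, which suits English. For Chinese output, joining with an empty string is likely a better fit.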
Detailed Introduction
FunASR is a fundamental end-to-end speech recognition toolkit designed for production use. It achieves up to 340x realtime performance (26x faster than Whisper), supports 50+ languages, and offers integrated speaker diarization, emotion detection, and streaming. Unlike a standalone ASR model such as Whisper, FunASR is a full toolkit that lets you mix and match models (e.g., SenseVoice for CPU-friendly recognition, Paraformer for low-latency streaming) through a single Python API. It is MIT-licensed, fully self-hostable, and provides an OpenAI-compatible API server for integration with AI agents and external applications. From batch transcription to real-time streaming, FunASR delivers enterprise-grade accuracy without cloud costs, because everything runs on your own hardware.
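A realtime multiple makes the speed claims concrete: at N× realtime, audio lasting T seconds takes about T / N seconds to process. The worked example below uses the figures quoted on this page (340× realtime, 26× faster than Whisper). The Whisper rate is simply 340 / 26 derived from those same numbers, not an independent benchmark. Actual throughput depends on hardware, batch size and model.

```python
def processing_seconds(audio_seconds: float, realtime_multiple: float) -> float:
    """Wall-clock time needed to process audio at a given multiple of realtime."""
    if realtime_multiple <= 0:
        raise ValueError("realtime_multiple must be positive")
    return audio_seconds / realtime_multiple


if __name__ == "__main__":
    one_hour = 3600.0
    funasr_x = 340.0           # figure quoted on this page (GPU, Fun-ASR-Nano + vLLM)
    whisper_x = funasr_x / 26  # implied by the "26x faster" claim, about 13.1x realtime
    print(f"FunASR:  {processing_seconds(one_hour, funasr_x):.1f} s per hour of audio")
    print(f"Whisper: {processing_seconds(one_hour, whisper_x):.1f} s per hour of audio")
```

At the quoted rates, one hour of audio takes roughly 10.6 seconds, compared with roughly 275 seconds at the implied Whisper rate.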
Getting Started
- Ensure Python 3.8+ and PyTorch are installed (follow the official PyTorch installation guide for your platform).
- Run `pip install funasr` to install the library.
- Use the Python code example in the README: load a model with AutoModel, then call generate() on your audio file.
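Here is a minimal sketch of that README flow, using SenseVoiceSmall on CPU with the `fsmn-vad` voice-activity model. The `AutoModel` arguments, the `generate()` keywords and `rich_transcription_postprocess` follow the FunASR README's SenseVoice example. The wrapper name `transcribe` is hypothetical. Model weights are downloaded on first use.

```python
def transcribe(audio_path: str, device: str = "cpu") -> str:
    """Transcribe one audio file with SenseVoiceSmall and return readable text.

    Imports live inside the function so this module can be imported even
    where funasr (and PyTorch) are not installed yet.
    """
    from funasr import AutoModel
    from funasr.utils.postprocess_utils import rich_transcription_postprocess

    model = AutoModel(
        model="iic/SenseVoiceSmall",
        vad_model="fsmn-vad",                           # split long audio on silence
        vad_kwargs={"max_single_segment_time": 30000},  # max segment length in ms
        device=device,                                  # "cpu" or e.g. "cuda:0"
    )
    res = model.generate(
        input=audio_path,
        cache={},
        language="auto",  # or "zh", "en", "yue", "ja", "ko"
        use_itn=True,     # inverse text normalization
        batch_size_s=60,
        merge_vad=True,
        merge_length_s=15,
    )
    # SenseVoice output carries language/emotion/event tags; this helper
    # renders them into readable text.
    return rich_transcription_postprocess(res[0]["text"])
```

Call it as `transcribe("meeting.wav")`, where the file name is a placeholder. Pass `device="cuda:0"` to run on an NVIDIA GPU.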
SHA256 Checksum
fbc633301cc9deec54e28a4adf88ac04ab9f9a89fe82ec84cf4df90644ed5321
This checksum is taken, unmodified, from the project's official GitHub Release page, where you can verify it independently. Use it to check the integrity of the file after download.
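Verifying the checksum above takes a few lines of Python's standard `hashlib`. The helper names and the file path are hypothetical, and this page does not say which release artifact the checksum belongs to. The file is read in chunks so large downloads do not have to fit in memory.

```python
import hashlib

EXPECTED = "fbc633301cc9deec54e28a4adf88ac04ab9f9a89fe82ec84cf4df90644ed5321"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published checksum (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Usage: `verify_sha256("path/to/downloaded-file", EXPECTED)` returns `True` only if the file matches the published checksum.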
Uninstall Info
Run `pip uninstall funasr` to remove the library. If you want a complete cleanup, also uninstall PyTorch and torchaudio manually.
Dependencies
Requires Python 3.8+ and PyTorch (GPU or CPU build); funasr itself is installed with pip.
Similar Projects
ollama
Ollama lets you download, run, and manage large language models locally. One command, multiple platforms, endless possibilities.
llama.cpp
High-performance LLM inference engine in C/C++ with minimal dependencies, supporting quantized models (1.5–8 bit) and diverse hardware (Apple Silicon, CUDA, Vulkan, etc.).
opencv
OpenCV is an open-source computer vision and machine learning library with over 2500 optimized algorithms for real-time image and video analysis.