FunASR

18.8k stars · AI Productivity · SHA-256 checksum verified

Industrial-grade speech recognition toolkit achieving 340x realtime speed, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

Download version: vruntime-llamacpp-v0.1.4 · 7.2 MB

Ultra-fast speech recognition toolkit, 26x faster than Whisper, with built-in speaker diarization and emotion detection.

Core Features

Record speed: up to 340x realtime on GPU (26x faster than Whisper) with Fun-ASR-Nano + vLLM
50+ languages supported: flagship Nano model covers 31 languages; SenseVoice supports zh/en/ja/ko/yue
Built-in speaker diarization: no extra integration needed – one call returns speaker labels and timestamps
Emotion detection: SenseVoice recognizes emotional tone (happiness, sadness, etc.) alongside transcription
Streaming support: Paraformer enables WebSocket real-time recognition for live meetings, broadcasts, etc.
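To illustrate the emotion detection item above, here is a minimal sketch of separating emotion tags from the transcript. It assumes raw SenseVoice output prefixes the text with tags in the form `<|en|><|HAPPY|><|Speech|><|withitn|>`; the tag format, the `split_tags` and `emotion_of` helpers, and the `KNOWN_EMOTIONS` set are illustrative assumptions, not part of the FunASR API. The library may ship its own post-processing helper, so check the README first.

```python
import re
from typing import List, Optional, Tuple

# Assumption: tags look like <|NAME|> and appear before the transcript text.
_TAG = re.compile(r"<\|([^|]*)\|>")

# Hypothetical label set for illustration; the real set depends on the model.
KNOWN_EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL"}


def split_tags(raw: str) -> Tuple[List[str], str]:
    """Return (tags, text) with all <|...|> tags removed from the text."""
    tags = _TAG.findall(raw)
    text = _TAG.sub("", raw).strip()
    return tags, text


def emotion_of(tags: List[str]) -> Optional[str]:
    """Return the first tag that is a known emotion label, or None."""
    for tag in tags:
        if tag in KNOWN_EMOTIONS:
            return tag
    return None


if __name__ == "__main__":
    tags, text = split_tags("<|en|><|HAPPY|><|Speech|><|withitn|>Hello there.")
    print(emotion_of(tags), "|", text)
```

Keeping the parsing separate from the model call makes it easy to adjust the label set once you have inspected real output from your installed version.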

What It Can't Do

The flagship Fun-ASR-Nano requires an NVIDIA GPU for full speed; on CPU, use SenseVoiceSmall.
Install PyTorch (GPU or CPU build) before installing funasr.
When combining multiple models (VAD + ASR + speaker), monitor VRAM usage; refer to model_selection.md.
This is a Python library, not a standalone desktop app, so basic Python skills are needed.
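The GPU/CPU guidance above can be sketched as a small selection helper. This is an illustration under assumptions: `pick_device`, `pick_model`, and `vram_in_use_mb` are hypothetical helper names, and the strings returned by `pick_model` are the display names used on this page, not verified hub IDs. The PyTorch calls used (`torch.cuda.is_available`, `torch.cuda.memory_allocated`) are standard.

```python
def pick_device() -> str:
    """Return "cuda:0" when PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so this runs even without PyTorch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def pick_model(device: str) -> str:
    """Follow the page's guidance: Nano on GPU, SenseVoiceSmall on CPU.

    Assumption: these are display names only; look up the exact model ID
    in the project documentation before passing it to the library.
    """
    return "Fun-ASR-Nano" if device.startswith("cuda") else "SenseVoiceSmall"


def vram_in_use_mb() -> float:
    """Report memory held by PyTorch tensors on the current GPU, in MiB.

    This only counts allocations made by PyTorch in this process.
    """
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / (1024 * 1024)


if __name__ == "__main__":
    device = pick_device()
    print(device, pick_model(device), f"{vram_in_use_mb():.1f} MiB")
```

Calling `vram_in_use_mb()` after loading each model in a combined pipeline gives a rough view of how much headroom remains.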

Use Cases

Automated meeting minutes: multi-speaker transcription with emotion tags and timestamps
Smart customer service and voice assistants: integrate via OpenAI-compatible API for low-latency streaming responses
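As a rough sketch of the meeting-minutes use case, the code below runs a combined pipeline and formats the result as speaker-labeled lines. Several details are assumptions to verify against your installed version: the model aliases (`paraformer-zh`, `fsmn-vad`, `ct-punc`, `cam++`), the `sentence_info` key, and its per-sentence fields `spk`, `start` (milliseconds), and `text`. The `diarize` and `format_minutes` helpers are hypothetical names.

```python
from typing import Any, Dict, List


def diarize(audio_path: str) -> List[Dict[str, Any]]:
    """Run ASR with VAD, punctuation, and speaker models in one call."""
    # Lazy import so format_minutes() below works without funasr installed.
    from funasr import AutoModel

    # Assumption: these aliases follow the project README and may change.
    model = AutoModel(
        model="paraformer-zh",
        vad_model="fsmn-vad",
        punc_model="ct-punc",
        spk_model="cam++",
    )
    results = model.generate(input=audio_path)
    # Assumption: per-sentence speaker data is stored under "sentence_info".
    return results[0].get("sentence_info", [])


def format_minutes(sentences: List[Dict[str, Any]]) -> List[str]:
    """Turn sentence dicts into "[mm:ss] Speaker N: text" lines."""
    lines = []
    for s in sentences:
        minutes, seconds = divmod(int(s["start"]) // 1000, 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] Speaker {s['spk']}: {s['text']}")
    return lines


if __name__ == "__main__":
    sample = [{"spk": 0, "start": 1500, "end": 3200, "text": "Good morning."}]
    print("\n".join(format_minutes(sample)))
```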

Detailed Introduction

FunASR is an end-to-end speech recognition toolkit designed for production use. It achieves up to 340x realtime performance (26x faster than Whisper), supports 50+ languages, and offers integrated speaker diarization, emotion detection, and streaming. Unlike standalone ASR models such as Whisper, FunASR is a full toolkit that lets you mix and match models (e.g., SenseVoice for CPU-friendly recognition, Paraformer for low-latency streaming) through a single Python API. It is MIT-licensed, fully self-hostable, and provides an OpenAI-compatible API server for integration with AI agents and external applications. From batch transcription to real-time streaming, FunASR runs entirely on your own hardware, with no cloud fees.
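A client for the OpenAI-compatible server might look like the sketch below. It assumes the server mirrors OpenAI's transcription endpoint (`POST /v1/audio/transcriptions` with multipart fields `file` and `model`) and returns JSON containing a `text` field; the base URL, port, and model name are placeholders, and `build_multipart` and `transcribe_via_api` are hypothetical helpers built only on the standard library.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict


def build_multipart(fields: Dict[str, str], file_name: str,
                    file_bytes: bytes, boundary: str) -> bytes:
    """Encode text fields plus one file as a multipart/form-data body."""
    body = bytearray()
    for name, value in fields.items():
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += value.encode() + b"\r\n"
    body += f"--{boundary}\r\n".encode()
    header = f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
    body += header.encode()
    body += b"Content-Type: application/octet-stream\r\n\r\n"
    body += file_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return bytes(body)


def transcribe_via_api(base_url: str, audio_path: str, model: str) -> str:
    """Upload an audio file and return the transcript text.

    Assumption: the endpoint path and response shape match OpenAI's API.
    """
    boundary = uuid.uuid4().hex
    with open(audio_path, "rb") as f:
        payload = build_multipart(
            {"model": model}, os.path.basename(audio_path), f.read(), boundary
        )
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=payload,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8")).get("text", "")
```

For example, `transcribe_via_api("http://localhost:8000", "meeting.wav", "sensevoice")` would post the file to a locally hosted server; all three arguments here are illustrative values.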

Getting Started

Ensure Python 3.8+ and PyTorch are installed (follow official PyTorch guide for your platform)

Run `pip install funasr` to install the library

Use the Python code example in README: load a model with AutoModel, call generate() on your audio file
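The last step can be sketched as follows. This is a minimal illustration, not the README's own example: it assumes `iic/SenseVoiceSmall` is a valid model ID for your installed version, that `AutoModel` accepts `model` and `device` keyword arguments, and that `generate()` returns a list of dicts with a `text` field. The `transcribe` and `first_text` helpers are hypothetical.

```python
from typing import Any, Dict, List

# Assumption: hub ID for the CPU-friendly SenseVoiceSmall model.
DEFAULT_MODEL = "iic/SenseVoiceSmall"


def transcribe(audio_path: str, model_name: str = DEFAULT_MODEL,
               device: str = "cpu") -> List[Dict[str, Any]]:
    """Load a model with AutoModel and call generate() on one audio file."""
    # Imported lazily so this module loads even before funasr is installed.
    from funasr import AutoModel

    model = AutoModel(model=model_name, device=device)
    return model.generate(input=audio_path)


def first_text(results: List[Dict[str, Any]]) -> str:
    """Return the "text" field of the first result, or "" if there is none."""
    if not results:
        return ""
    return str(results[0].get("text", ""))


if __name__ == "__main__":
    # "example.wav" is a placeholder path; the first run downloads the model.
    print(first_text(transcribe("example.wav")))
```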

File Integrity

SHA-256 checksum (extracted from the official GitHub Release page):

fbc633301cc9deec54e28a4adf88ac04ab9f9a89fe82ec84cf4df90644ed5321

Verify file integrity after download.

All SHA-256 checksums on this platform are extracted from the project's official GitHub Release page, without any modification. You can independently verify them on the GitHub Releases page.
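One way to verify a download against the published checksum is to hash the file locally and compare the hex digests. The sketch below uses only Python's standard `hashlib`; the helper names and the file path in the usage line are placeholders.

```python
import hashlib
from typing import Iterable, Iterator

# Checksum as published on this page.
EXPECTED = "fbc633301cc9deec54e28a4adf88ac04ab9f9a89fe82ec84cf4df90644ed5321"


def file_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file's contents in chunks so large files are not loaded at once."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Return the SHA-256 hex digest of the concatenated chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches(actual: str, expected: str = EXPECTED) -> bool:
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return actual.strip().lower() == expected.strip().lower()
```

Usage: `matches(sha256_hex(file_chunks("path/to/downloaded-file")))` returns `True` only when the file is byte-for-byte identical to the one the checksum was computed from.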

Environment Guide

Uninstall Info

Run `pip uninstall funasr` to remove the library. If you want a complete cleanup, also uninstall PyTorch and torchaudio manually.

Dependencies

Requires Python 3.8+ and PyTorch. No additional runtime is required beyond these.

Project Info

License: MIT

Last Updated: 2026-07-03T17:32:14Z

Similar Projects

ollama

Ollama lets you download, run, and manage large language models locally. One command, multiple platforms, endless possibilities.

llama.cpp

High-performance LLM inference engine in C/C++ with minimal dependencies, supporting quantized models (1.5–8 bit) and diverse hardware (Apple Silicon, CUDA, Vulkan, etc.).

opencv

OpenCV is an open-source computer vision and machine learning library with over 2500 optimized algorithms for real-time image and video analysis.