OpenSource-Hub
FunASR

18.8k stars · AI Productivity · SHA-256 checksum verified

Industrial-grade speech recognition toolkit achieving 340x realtime speed, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

vruntime-llamacpp-v0.1.4 · 7.2 MB

Ultra-fast speech recognition toolkit, 26x faster than Whisper, with built-in speaker diarization and emotion detection.

Core Features

  • Record speed: up to 340x realtime on GPU (26x faster than Whisper) with Fun-ASR-Nano + vLLM
  • 50+ languages supported: flagship Nano model covers 31 languages; SenseVoice supports zh/en/ja/ko/yue
  • Built-in speaker diarization: no extra integration needed – one call returns speaker labels and timestamps
  • Emotion detection: SenseVoice recognizes emotional tone (happiness, sadness, etc.) alongside transcription
  • Streaming support: Paraformer enables WebSocket real-time recognition for live meetings, broadcasts, etc.
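To put the realtime figures in perspective: a speed of N x realtime means roughly N seconds of audio are processed per second of compute. The sketch below is only arithmetic on the numbers quoted above; the function name is illustrative, and real throughput will depend on hardware, model choice, and batch settings.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock time to transcribe audio at a given realtime factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# One hour of audio at the best-case 340x figure quoted in the feature list.
one_hour = 3600.0
best_case = processing_seconds(one_hour, 340)
print(f"1 hour of audio at 340x realtime: about {best_case:.1f} s")
```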

What It Can't Do

  • The flagship Fun-ASR-Nano needs an NVIDIA GPU to run at full speed; on CPU, use SenseVoiceSmall instead.
  • PyTorch (GPU or CPU version) must be installed before funasr.
  • When combining multiple models (VAD + ASR + speaker), monitor VRAM usage; refer to model_selection.md.
  • This is a Python library, not a standalone desktop app, so basic Python skills are needed.
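The GPU/CPU guidance above could be turned into a small selection helper along these lines. This is a sketch, not part of FunASR: `cuda_available` and `choose_setup` are hypothetical helpers, and the model names are copied from the text as labels rather than verified model hub identifiers.

```python
def cuda_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA GPU."""
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def choose_setup(has_gpu: bool) -> dict:
    """Pick a model label and device string following the guidance above."""
    # Assumption: these names are labels from the text, not confirmed hub ids.
    if has_gpu:
        return {"model": "Fun-ASR-Nano", "device": "cuda:0"}
    return {"model": "SenseVoiceSmall", "device": "cpu"}


setup = choose_setup(cuda_available())
print(setup)
```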

Use Cases

  • Automated meeting minutes: multi-speaker transcription with emotion tags and timestamps
  • Smart customer service and voice assistants: integrate via OpenAI-compatible API for low-latency streaming responses
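For the meeting-minutes use case, the post-processing step might look like the sketch below. It assumes each diarized segment is a dict with `spk`, `start` (milliseconds), and `text` keys, plus an optional `emotion` key; these field names are assumptions for illustration, so check them against the actual output of the models you load. The sample data is made up.

```python
from typing import Any, Dict, List


def format_timestamp(ms: int) -> str:
    """Convert a millisecond offset to HH:MM:SS."""
    total_seconds = ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def to_minutes(segments: List[Dict[str, Any]]) -> str:
    """Render speaker-labeled segments as one line per utterance."""
    lines = []
    for seg in segments:
        # The emotion tag is optional; omit it when the key is missing or empty.
        emotion = f" ({seg['emotion']})" if seg.get("emotion") else ""
        stamp = format_timestamp(seg["start"])
        lines.append(f"[{stamp}] Speaker {seg['spk']}{emotion}: {seg['text']}")
    return "\n".join(lines)


# Made-up segments standing in for real recognition output.
sample = [
    {"spk": 0, "start": 0, "text": "Let's begin.", "emotion": "neutral"},
    {"spk": 1, "start": 65_400, "text": "The release is on track."},
]
print(to_minutes(sample))
```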

Detailed Introduction

FunASR is a fundamental end-to-end speech recognition toolkit designed for production use. It reaches up to 340x realtime performance (26x faster than Whisper), supports 50+ languages, and integrates speaker diarization, emotion detection, and streaming. Unlike standalone ASR models such as Whisper, FunASR is a full toolkit that lets you mix and match models (e.g., SenseVoice for CPU-friendly recognition, Paraformer for low-latency streaming) through a single Python API. It is MIT-licensed, fully self-hostable, and provides an OpenAI-compatible API server for integration with AI agents and external applications. From batch transcription to real-time streaming, FunASR delivers enterprise-grade accuracy with no cloud service costs.

Tags

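Speech recognition · ASR · Deep learning · Multilingual · Real-time · Speaker diarization · Emotion detection · Open-source tools · Python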

Getting Started

  1. Download installer: click the button above to download the installer for your system
  2. Install the software: open the downloaded dmg file, then drag the app to Applications
  3. Ensure Python 3.8+ and PyTorch are installed (follow official PyTorch guide for your platform)
  4. Run `pip install funasr` to install the library
  5. Use the Python code example in README: load a model with AutoModel, call generate() on your audio file
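The last step can be sketched as follows. `AutoModel` and `generate()` are the names given above; the model id `iic/SenseVoiceSmall`, the `device` argument, and the assumption that results come back as a list of dicts with a `text` field reflect the upstream README as best recalled, so confirm them against the README for your installed version. `meeting.wav` is a placeholder file name.

```python
from typing import Any, Dict, List


def transcribe(audio_path: str,
               model_id: str = "iic/SenseVoiceSmall",
               device: str = "cpu") -> List[Dict[str, Any]]:
    """Load a model with AutoModel and run generate() on one audio file."""
    # Imported lazily so this module can be loaded before funasr is installed.
    from funasr import AutoModel

    model = AutoModel(model=model_id, device=device)
    return model.generate(input=audio_path)


def join_text(results: List[Dict[str, Any]]) -> str:
    """Concatenate the 'text' field of each result entry (assumed shape)."""
    return " ".join(item.get("text", "") for item in results).strip()


# Usage (requires funasr, PyTorch, and a real audio file):
#   results = transcribe("meeting.wav")
#   print(join_text(results))
```

The first call downloads model weights, so expect it to take noticeably longer than later runs.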

File Integrity

SHA-256 checksum verified

Checksum extracted from GitHub official Release page

SHA256 Checksum

fbc633301cc9deec54e28a4adf88ac04ab9f9a89fe82ec84cf4df90644ed5321

This checksum is extracted from the GitHub Release page. Verify file integrity after download.

All SHA-256 checksums on this platform are extracted from the project's official GitHub Release page, without any modification. You can independently verify them on the GitHub Releases page.
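One way to verify the download independently is to hash the file locally and compare it with the published value. The sketch below uses only the Python standard library; the function names are illustrative and the file path in the usage comment is a placeholder for wherever you saved the download.

```python
import hashlib
import hmac

# Checksum published on this page for the downloadable file.
EXPECTED_SHA256 = "fbc633301cc9deec54e28a4adf88ac04ab9f9a89fe82ec84cf4df90644ed5321"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA-256 digest equals the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Usage (placeholder path):
#   print(matches_checksum("path/to/downloaded-file"))
```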

Environment Guide

Uninstall Info

Run `pip uninstall funasr` to remove the library. If you want a complete cleanup, also uninstall PyTorch and torchaudio manually.

No Extra Dependencies

Ready to use once installed. Beyond Python 3.8+ and PyTorch, no additional runtime is required.

Project Info
License: MIT
Last Updated: 2026-07-03T17:32:14Z
GitHub Repository · Official Website
