Is FunASR open source?

Yes — modelscope/FunASR is open source, released under the MIT license.

What language is FunASR written in?

modelscope/FunASR is primarily written in Python.

How popular is FunASR?

modelscope/FunASR has 19.4k stars on GitHub. Star growth is currently slowing, at about +21 stars per day over the past 7 days.

Where can I find FunASR?

modelscope/FunASR is on GitHub at https://github.com/modelscope/FunASR.


modelscope/FunASR

Whisper at 170× speed, with speaker labels and feelings

A Chinese speech toolkit that bundles ASR, diarization, emotion detection, and streaming into one MIT-licensed package.

★ 19.4k stars · Python · Image · Video · Audio · Inference · Serving · Agents


Velocity (7d): +21 ★ / day · Trend: ↘ cooling

What it does

FunASR is a speech recognition toolkit that transcribes audio, tags speakers, detects emotions (happy/sad/angry), and inserts punctuation, all from a single Python call. It runs fully offline, streams over WebSocket, and exposes an OpenAI-compatible API server. The project is developed by Alibaba’s ModelScope team and targets production use, particularly for Chinese and multilingual scenarios.
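As a rough illustration of the single-call workflow described above, the sketch below combines VAD, ASR, punctuation, and speaker models in one `AutoModel` and calls `generate()` once. The model aliases (`paraformer-zh`, `fsmn-vad`, `ct-punc`, `cam++`), the `batch_size_s` value, and the helper name `transcribe` are assumptions that do not come from this page; check the project README for current names before relying on them.

```python
# Minimal sketch of the "one call" pipeline. All model aliases are assumptions.
PIPELINE = {
    "model": "paraformer-zh",  # ASR model (assumed alias)
    "vad_model": "fsmn-vad",   # voice activity detection (assumed alias)
    "punc_model": "ct-punc",   # punctuation restoration (assumed alias)
    "spk_model": "cam++",      # speaker model used for diarization (assumed alias)
}


def transcribe(audio_path, pipeline=PIPELINE):
    """Run VAD, ASR, punctuation, and speaker labelling in one generate() call."""
    # Imported lazily so this module loads even where funasr is not installed.
    from funasr import AutoModel

    model = AutoModel(**pipeline)
    # generate() returns a list of result dicts, one per input.
    # batch_size_s=300 is an assumed batching budget, not a recommended value.
    return model.generate(input=audio_path, batch_size_s=300)
```

Calling `transcribe("meeting.wav")` on a hypothetical recording should return a list of result dictionaries. Under these assumptions, the transcript is expected in each result's `text` field.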

The interesting bit

The speed claims are aggressive: SenseVoice-Small hits 170× realtime on GPU and 17× on CPU, which the benchmark table says is 13× faster than Whisper-large-v3. The newer Fun-ASR-Nano trades raw speed for accuracy by bolting a Qwen3-0.6B LLM decoder onto a SenseVoice encoder, and it can optionally be accelerated with vLLM. It’s a rare case where “faster than Whisper on CPU” is a documented, reproducible claim rather than marketing vapor.
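To make the speed multipliers concrete, this small sketch converts an "N× realtime" figure into wall-clock processing time. The function name and the one-hour example are illustrative; the multipliers are the ones quoted above for SenseVoice-Small.

```python
def processing_seconds(audio_seconds: float, realtime_multiple: float) -> float:
    """Wall-clock time needed to transcribe audio at a given 'N x realtime' speed."""
    if realtime_multiple <= 0:
        raise ValueError("realtime_multiple must be positive")
    return audio_seconds / realtime_multiple


ONE_HOUR = 3600.0  # seconds of audio

# Multipliers quoted in the text for SenseVoice-Small.
gpu_time = processing_seconds(ONE_HOUR, 170)  # roughly 21 seconds
cpu_time = processing_seconds(ONE_HOUR, 17)   # roughly 212 seconds, about 3.5 minutes

print(f"GPU: {gpu_time:.1f} s, CPU: {cpu_time:.1f} s")
```

By this arithmetic, an hour of audio at the quoted rates takes on the order of 20 seconds on GPU and a few minutes on CPU. Real throughput will depend on hardware, batching, and audio content.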

Key highlights

One AutoModel.generate() call handles VAD segmentation, transcription, punctuation, and speaker diarization
50+ languages supported; dedicated models for Chinese (Paraformer), 31-language LLM ASR (Fun-ASR-Nano), and 52-language Qwen3-ASR
Built-in emotion recognition and audio event detection via SenseVoice
OpenAI-compatible API server (funasr-server) plus MCP server integration for Claude/Cursor
vLLM backend for 2-3× faster LLM-decoder inference on batched workloads
Docker images and Kubernetes templates provided for deployment
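
Because the API server in the list above is described as OpenAI-compatible, a client can in principle talk to it the way it would talk to OpenAI's transcription endpoint. The sketch below builds a multipart request for `/v1/audio/transcriptions` using only the standard library. Whether `funasr-server` serves this exact path, which base URL it listens on, and which model names it accepts are assumptions here; the helper names are hypothetical, so confirm the details against the server's own documentation.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes, filename, model):
    """Build a multipart/form-data POST shaped like OpenAI's transcription API."""
    boundary = uuid.uuid4().hex
    head = (
        # Plain form field naming the model to use.
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        # File field carrying the raw audio bytes.
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",  # assumed path
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe_via_server(base_url, audio_path, model):
    """Send one audio file to the server and return the transcript text."""
    with open(audio_path, "rb") as f:
        request = build_transcription_request(
            base_url, f.read(), os.path.basename(audio_path), model
        )
    with urllib.request.urlopen(request) as response:
        # OpenAI-style JSON responses carry the transcript in a "text" field.
        return json.load(response)["text"]
```

A call such as `transcribe_via_server("http://localhost:8000", "meeting.wav", "sensevoice-small")` uses a placeholder URL, file, and model name. Keeping the request builder separate from the network call makes the payload easy to inspect without a running server.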

Caveats

The 170× figure is for SenseVoice-Small, not the more accurate LLM-based models; Fun-ASR-Nano drops to 17× realtime on GPU
CPU performance is viable but model-dependent: Whisper-large-v3 is explicitly marked “too slow” on CPU in their comparison table
Documentation and model hub are split between ModelScope (China) and HuggingFace; some links and examples assume ModelScope access

Verdict

Worth evaluating if you’re self-hosting ASR at scale, need speaker diarization without chaining pyannote, or want a drop-in Whisper alternative with OpenAI API compatibility. Probably overkill if you’re already happy with cloud transcription or only need occasional English transcription.
