FunAudioLLM/Fun-ASR
An end-to-end speech recognition model supporting 31 languages with real-time transcription, speaker diarization, and multi-accent recognition.

Velocity (7d): +7.0 ★/day · Trend: steady
Fun-ASR is an automatic speech recognition (ASR) system developed by Tongyi Lab that converts speech audio to text across 31 languages and dialects. It is trained on tens of millions of hours of real speech data and supports real-time, low-latency transcription, speaker diarization, hotword enhancement, and timestamp output. The model is available on ModelScope and Hugging Face as Fun-ASR-Nano.