FunAudioLLM/SenseVoice
A multilingual speech understanding foundation model supporting ASR, emotion recognition, and audio event detection across 50+ languages.

Velocity (7d): +12 ★/day · Trend: steady
SenseVoice is a speech foundation model that provides automatic speech recognition, spoken language identification, speech emotion recognition, and audio event detection. Its non-autoregressive end-to-end architecture keeps inference latency low, and the model is trained on over 400,000 hours of data and supports 50+ languages. It is implemented in PyTorch and available through ModelScope and Hugging Face.
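
Because one model covers recognition, language identification, emotion, and audio events, a single inference result has to carry all four pieces of information. The sketch below shows one way downstream code might separate them, assuming the model returns a transcript prefixed with `<|...|>` tags in the order language, emotion, event. That tag layout, the `SpeechResult` type, and the `parse_tagged_output` helper are illustrative assumptions, not the repository's documented API; check the actual output format before relying on it.

```python
import re
from dataclasses import dataclass

# Assumed tag layout, for illustration only:
#   "<|lang|><|EMOTION|><|Event|><|extra|>transcript text"
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


@dataclass
class SpeechResult:
    language: str  # spoken language identification
    emotion: str   # speech emotion recognition
    event: str     # audio event detection
    text: str      # speech recognition transcript


def parse_tagged_output(raw: str) -> SpeechResult:
    """Split a tagged transcript into language, emotion, event, and text."""
    tags = TAG_PATTERN.findall(raw)
    # Remove every tag, leaving only the transcript text.
    text = TAG_PATTERN.sub("", raw).strip()
    # Pad with "unknown" so a missing tag does not raise; ignore extra tags.
    language, emotion, event = (tags + ["unknown"] * 3)[:3]
    return SpeechResult(language, emotion, event, text)


example = "<|en|><|NEUTRAL|><|Speech|><|woitn|>hello world"
result = parse_tagged_output(example)
print(result)
```

Padding with "unknown" keeps the parser tolerant of untagged or partially tagged strings, which is useful if the real output differs from the layout assumed here.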