
YuanGongND/whisper-at

A joint audio tagging and speech recognition model that extends OpenAI Whisper with audio event detection capability at minimal additional computational cost.

Whisper-AT provides pretrained models that perform both speech recognition (with performance identical to the original Whisper) and general audio event tagging across the 527 AudioSet classes. Alongside the transcription, the model can output audio event labels at a range of temporal resolutions. The project offers a Python package, a Hugging Face Space demo, and a Google Colab notebook for experimentation.
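To make "labels at various temporal resolutions" concrete, here is a minimal sketch of one way per-frame tag scores could be pooled into coarser time windows. It does not use or reproduce the whisper-at package API: the function `tag_segments`, the three-class label list, and the scores are all hypothetical and chosen only for illustration (the real model covers 527 AudioSet classes).

```python
from typing import Dict, List


def tag_segments(
    frame_scores: List[List[float]],
    frame_sec: float,
    resolution_sec: float,
    labels: List[str],
    top_k: int = 2,
) -> List[Dict]:
    """Average per-frame class scores over fixed windows and keep the top-k labels.

    Hypothetical helper for illustration; not part of whisper-at.
    """
    frames_per_seg = max(1, round(resolution_sec / frame_sec))
    n_classes = len(labels)
    segments = []
    for start in range(0, len(frame_scores), frames_per_seg):
        chunk = frame_scores[start:start + frames_per_seg]
        # Mean score of each class within this window.
        mean = [sum(f[c] for f in chunk) / len(chunk) for c in range(n_classes)]
        ranked = sorted(range(n_classes), key=lambda c: mean[c], reverse=True)[:top_k]
        segments.append({
            "start": start * frame_sec,
            "end": min(len(frame_scores), start + frames_per_seg) * frame_sec,
            "tags": [(labels[c], mean[c]) for c in ranked],
        })
    return segments


# Made-up scores: 4 frames of 1 second each, 3 illustrative classes.
labels = ["Speech", "Music", "Dog"]
frames = [
    [0.9, 0.1, 0.0],
    [0.7, 0.3, 0.0],
    [0.1, 0.8, 0.1],
    [0.1, 0.9, 0.0],
]

fine = tag_segments(frames, frame_sec=1.0, resolution_sec=2.0, labels=labels, top_k=1)
coarse = tag_segments(frames, frame_sec=1.0, resolution_sec=4.0, labels=labels, top_k=1)

for seg in fine:
    print(seg["start"], seg["end"], seg["tags"][0][0])
```

A finer resolution yields more segments, each with its own top labels, while a coarser one summarizes the whole clip with a single set of tags.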
