microsoft/SpeechT5

Microsoft's unified-modal speech-text pre-training framework implementing multiple speech processing models including ASR, TTS, speech translation, and speech language models.

Velocity (7d): +0.9 ★ / day
Trend: steady

SpeechT5 provides pre-training approaches for spoken language processing, including SpeechT5 (encoder-decoder pre-training), Speech2C (ASR with unpaired speech), YiTrans (speech translation), SpeechUT (speech-text bridging), and VALL-E X (cross-lingual neural codec language modeling). The repository contains model implementations, evaluation results, and inference instructions for these systems.
