MoonInTheRiver/DiffSinger
PyTorch implementation of a diffusion-based neural network for generating singing voice and speech from text.

Velocity (7d): +2.9 ★/day · Trend: steady
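DiffSinger is an acoustic model for singing voice synthesis and text-to-speech built around a shallow diffusion mechanism. It takes lyrics and a music score (for example, MIDI notes) as input and predicts a mel-spectrogram, which a separate vocoder then converts into an audio waveform. Generation is framed as a parameterized Markov chain: a denoising network predicts the noise at each step and iteratively refines a noisy mel-spectrogram, conditioned on the score. With shallow diffusion, the reverse process starts at an intermediate step from a noised version of a simple decoder's coarse prediction rather than from pure Gaussian noise, which reduces the number of denoising steps needed.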
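The shallow diffusion idea can be illustrated with a minimal sketch. This is not the repository's code: it uses only the Python standard library and standard DDPM-style update equations, and treats a mel-spectrogram frame as a flat list of floats. The names `make_schedule`, `q_sample`, `shallow_reverse`, and `zero_denoiser`, and the values for the schedule and for the starting step `k`, are all hypothetical and chosen for illustration.

```python
import math
import random


def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.06):
    """Linear beta schedule (illustrative values); returns (betas, alphas, alpha_bars)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def q_sample(x0, k, alpha_bars, rng):
    """Forward-diffuse x0 directly to step k (1-indexed)."""
    abar = alpha_bars[k - 1]
    return [math.sqrt(abar) * v + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0)
            for v in x0]


def shallow_reverse(coarse_mel, k, denoiser, schedule, rng):
    """Noise a coarse prediction to step k, then denoise from k down to 0.

    A full reverse process would instead start from pure noise at the last step.
    """
    betas, alphas, alpha_bars = schedule
    x = q_sample(coarse_mel, k, alpha_bars, rng)
    for t in range(k, 0, -1):
        i = t - 1
        eps_hat = denoiser(x, t)  # predicted noise at step t
        coef = betas[i] / math.sqrt(1.0 - alpha_bars[i])
        mean = [(xv - coef * ev) / math.sqrt(alphas[i])
                for xv, ev in zip(x, eps_hat)]
        if t > 1:
            # Posterior standard deviation for the sampling step.
            sigma = math.sqrt((1.0 - alpha_bars[i - 1]) / (1.0 - alpha_bars[i]) * betas[i])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def zero_denoiser(x, t):
    """Placeholder for a trained noise-prediction network (always predicts zero noise)."""
    return [0.0] * len(x)


schedule = make_schedule()
coarse = [0.0] * 80  # stand-in for one 80-bin mel frame from a simple decoder
refined = shallow_reverse(coarse, k=50, denoiser=zero_denoiser,
                          schedule=schedule, rng=random.Random(0))
```

In a real system the placeholder denoiser would be a trained network conditioned on the music score, and the starting step would be chosen so that the noised coarse prediction and the noised ground truth are hard to tell apart.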