
MoonInTheRiver/DiffSinger

PyTorch implementation of a diffusion-based neural network for generating singing voice and speech from text.

4.8k stars · Python · Image · Video · Audio
Velocity (7d): +2.9 ★ / day · Trend: steady

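DiffSinger generates singing voice and text-to-speech output using a shallow diffusion mechanism. The model takes text (lyrics) and musical notation (MIDI) as input and produces audio of sung vocals. Its parameterized denoising network predicts the noise present at each diffusion step and iteratively refines a noisy mel-spectrogram, conditioned on the score, until a natural-sounding result emerges; a vocoder then converts that spectrogram to a waveform. The shallow mechanism starts this refinement from a partially noised coarse prediction rather than from pure noise, which shortens sampling.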
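The refinement loop described above can be illustrated with a minimal sketch in plain Python. This is not code from the repository: the function names (`make_schedule`, `diffuse`, `shallow_sample`, `oracle_noise`), the schedule values, the step counts, and the four-value "mel frame" are all assumptions chosen for illustration, and the oracle stands in for a trained denoising network.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.06):
    """Linear noise schedule (illustrative values): betas and cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars


def diffuse(x0, k, alpha_bars, rng):
    """Forward process: noise x0 directly to step k (steps are 1-indexed)."""
    a_bar = alpha_bars[k - 1]
    return [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
            for v in x0]


def shallow_sample(coarse, k, betas, alpha_bars, predict_noise, rng):
    """Reverse process started at a shallow step k instead of the final step."""
    x = diffuse(coarse, k, alpha_bars, rng)
    for t in range(k, 0, -1):
        beta, a_bar = betas[t - 1], alpha_bars[t - 1]
        eps = predict_noise(x, t)
        # Posterior mean of the standard DDPM reverse step.
        mean = [(xi - beta / math.sqrt(1.0 - a_bar) * ei) / math.sqrt(1.0 - beta)
                for xi, ei in zip(x, eps)]
        if t > 1:
            # Add posterior noise on every step except the last.
            sigma = math.sqrt(beta * (1.0 - alpha_bars[t - 2]) / (1.0 - a_bar))
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


# Toy data: a hypothetical 4-bin "mel frame" and a blurred coarse guess at it.
target = [0.5, -1.0, 0.25, 0.0]
coarse = [0.3, -0.7, 0.1, 0.1]
betas, alpha_bars = make_schedule(num_steps=100)


def oracle_noise(x, t):
    """Stand-in for a trained network: the exact noise relative to `target`."""
    a_bar = alpha_bars[t - 1]
    return [(xi - math.sqrt(a_bar) * ti) / math.sqrt(1.0 - a_bar)
            for xi, ti in zip(x, target)]


rng = random.Random(0)
result = shallow_sample(coarse, 30, betas, alpha_bars, oracle_noise, rng)
```

Because the oracle knows the target exactly, the final step lands on `target` up to floating-point error, even though sampling began from the noised coarse guess at step 30 of 100. A trained network would only approximate this noise estimate, so its output would be correspondingly approximate.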
