MoonInTheRiver/DiffSinger

Singing synthesis that skips the deep diffusion

It generates singing and speech with a shallow diffusion mechanism: denoising starts from a partially noised draft spectrogram rather than from pure noise, so far fewer refinement steps are needed.

★ 4.8k stars · Python · Image · Video · Audio

What it does

DiffSinger is the official PyTorch release of an AAAI-2022 paper that synthesizes singing and speech through a shallow diffusion mechanism. It converts lyrics paired with MIDI or pitch contours into mel spectrograms, then hands them off to vocoders like NSF-HiFiGAN for audio output. The repo also ships DiffSpeech, a sibling pipeline for plain text-to-speech.

The interesting bit

The shallow diffusion trick is essentially a speed hack: instead of denoising from pure Gaussian noise through the full chain, the model diffuses a rough mel spectrogram from a simple auxiliary decoder to an intermediate (shallow) step and runs only the remaining reverse steps from there. A later plug-in added PNDM sampling to go even faster. The authors released their own PopCS dataset for singing research, though they candidly note that NATSpeech is now their improved framework.
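To make the shallow diffusion idea concrete, here is a minimal toy sketch in plain Python. It is not the repository's code: the function names, the linear beta schedule, the four-value "mel frame", and the oracle denoiser (which stands in for the trained network by reading the target directly) are all assumptions made for illustration. The reverse update is a deterministic DDIM-style step, chosen for simplicity rather than taken from the repo's sampler. The only point it demonstrates is that starting the reverse process at a shallow step k costs k denoiser calls instead of the full T.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.06):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, alpha_bar, rng):
    """Forward diffusion: jump a clean frame straight to the step with this alpha_bar."""
    keep, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [keep * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def make_oracle(target, alpha_bars):
    """Hypothetical stand-in for a trained denoiser: it knows the target frame."""
    def predict_noise(x, t):
        a = alpha_bars[t]
        return [(xi - math.sqrt(a) * ti) / math.sqrt(1.0 - a)
                for xi, ti in zip(x, target)]
    return predict_noise


def reverse_process(x, start_step, alpha_bars, predict_noise):
    """Deterministic DDIM-style reverse steps from start_step down to step 0.

    Returns the final sample and the number of denoiser calls made.
    """
    calls = 0
    for t in range(start_step, -1, -1):
        eps = predict_noise(x, t)
        calls += 1
        a_t = alpha_bars[t]
        # Estimate the clean frame implied by the predicted noise.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t)
                  for xi, ei in zip(x, eps)]
        if t == 0:
            x = x0_hat
        else:
            a_prev = alpha_bars[t - 1]
            x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
                 for x0i, ei in zip(x0_hat, eps)]
    return x, calls


def demo(total_steps=100, shallow_k=30, seed=0):
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(total_steps)
    target_mel = [0.2, -0.5, 0.9, 0.1]        # toy "ground-truth" frame
    aux_mel = [v + 0.05 for v in target_mel]  # rough auxiliary-decoder draft (assumed)
    oracle = make_oracle(target_mel, alpha_bars)

    # Shallow: noise the draft to step k, then run only k reverse steps.
    noisy_draft = q_sample(aux_mel, alpha_bars[shallow_k - 1], rng)
    shallow_out, shallow_calls = reverse_process(
        noisy_draft, shallow_k - 1, alpha_bars, oracle)

    # Full chain: start from pure Gaussian noise at the last step.
    pure_noise = [rng.gauss(0.0, 1.0) for _ in target_mel]
    full_out, full_calls = reverse_process(
        pure_noise, total_steps - 1, alpha_bars, oracle)
    return shallow_calls, full_calls, shallow_out, full_out
```

Calling `demo()` returns the two call counts alongside the reconstructed frames. Because the oracle denoiser is perfect, both paths land on the target frame, so the only visible difference is the cost. With a real network, the quality of the auxiliary draft and the choice of k are what decide whether the shorter chain is good enough.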

Key highlights

Official AAAI-2022 implementation covering both singing-voice synthesis (SVS) and text-to-speech (TTS).
Multiple pipeline flavors: ground-truth F0, MIDI-based pitch, and end-to-end lyric-to-mel without explicit pitch prediction.
PNDM acceleration plug-in (ICLR 2022) layered on top of the base shallow diffusion.
PopCS dataset released by the authors; interactive demos hosted on Hugging Face for both TTS and SVS.
Publicly acknowledges NATSpeech as the successor framework for this line of work.

Caveats

The authors steer users toward NATSpeech for ongoing work, implying this repo is largely a frozen reference implementation.
Environment setup is split across GPU-specific requirement files (requirements_2080.txt, requirements_3090.txt), which feels brittle.
The README is heavy on changelog and pipeline tables but light on architecture or training guidance.

Verdict

Audio researchers hunting for an official SVS/TTS diffusion baseline or the PopCS dataset will find this useful. Developers looking for actively maintained, production-polished code should follow the authors’ own advice and head to NATSpeech instead.

Frequently asked

What is MoonInTheRiver/DiffSinger? It generates singing and speech with a shallow diffusion mechanism: denoising starts from a partially noised draft spectrogram rather than from pure noise, so far fewer refinement steps are needed.
Is DiffSinger open source? Yes. MoonInTheRiver/DiffSinger is open source, released under the MIT license.
What language is DiffSinger written in? MoonInTheRiver/DiffSinger is primarily written in Python.
How popular is DiffSinger? MoonInTheRiver/DiffSinger has 4.8k stars on GitHub.
Where can I find DiffSinger? MoonInTheRiver/DiffSinger is on GitHub at https://github.com/MoonInTheRiver/DiffSinger.