ali-vilab/dreamtalk

Diffusion models animate portraits with audio and attitude

DreamTalk uses diffusion models to animate a still portrait with any audio clip, controlling the resulting expression and head pose via 3DMM parameters.

★ 1.8k stars · Python · Image · Video · Audio

What it does

DreamTalk takes a single source portrait and an audio file (songs, speech, or noisy recordings) and generates a lip-synced talking-head video. Speaking style and head pose are controlled through 3DMM parameters, which must be extracted from reference videos using external tools such as PIRenderer. The native output is a 256×256 crop; higher resolutions require optional super-resolution post-processing through separate modules.

The interesting bit

Instead of deterministic regression, it treats talking head generation as a diffusion process, which the authors argue produces more vivid expressions across diverse styles. The classifier-free guidance scale lets you dial the intensity of the speaking style up or down. It is also notably cautious: the authors gated the pretrained checkpoints behind an academic email request, citing social impact.
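
To make the guidance scale concrete, here is a minimal sketch of the standard classifier-free guidance blend. It is not taken from the DreamTalk code: the function name `apply_cfg` and the toy two-element "predictions" are assumptions chosen for illustration. A real model would blend full denoiser outputs at every diffusion step.

```python
from typing import List, Sequence


def apply_cfg(uncond: Sequence[float], cond: Sequence[float], scale: float) -> List[float]:
    """Blend an unconditional and a style-conditioned prediction.

    scale = 0 ignores the style, scale = 1 returns the conditioned
    prediction unchanged, and scale > 1 extrapolates past it.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy values standing in for denoiser outputs (hypothetical).
uncond = [0.0, 1.0]
cond = [1.0, 3.0]
for scale in (0.0, 1.0, 2.0):
    print(scale, apply_cfg(uncond, cond, scale))
```

With a scale above 1 the result moves further from the unconditional prediction than the conditioned one does, which is the sense in which a higher scale exaggerates the speaking style.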

Key highlights

Handles songs, multilingual speech, noisy audio, and out-of-domain portraits.
Speaking style and head pose are decoupled and driven by reference 3DMM sequences.
Runs on CPU as well as GPU.
Optional upsampling to 512×512 or 1024×1024 via MetaPortrait or CodeFormer, though both trade emotion intensity for pixels.
Released for research and non-commercial use only; checkpoints require an academic email request.

Caveats

Native resolution is 256×256; high-resolution output relies on slow, ad-hoc super-resolution modules that can dampen facial emotions and introduce temporal inconsistency.
Checkpoints are no longer publicly downloadable; you must email the authors and agree to academic-only use.
Style and pose references require preprocessing with external repositories (PIRenderer, FOMM) to extract 3DMM parameters at exactly 25 FPS.
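
The 25 FPS requirement is easiest to see as frame arithmetic. The sketch below picks, for each 25 FPS output frame, the nearest earlier frame of a reference clip recorded at a different rate. The helper `resample_indices` is hypothetical and is not part of DreamTalk, PIRenderer, or FOMM. In practice, re-encoding the reference video to 25 FPS before extracting 3DMM parameters is probably the safer route; this only illustrates how the frame counts line up.

```python
from typing import List


def resample_indices(n_frames: int, src_fps: float, dst_fps: float = 25.0) -> List[int]:
    """Map each output frame at dst_fps to a source frame index.

    Uses nearest-earlier-frame selection; no interpolation of
    parameters is attempted.
    """
    if n_frames <= 0 or src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame count and frame rates must be positive")
    # Number of output frames covering the same duration.
    n_out = max(1, round(n_frames * dst_fps / src_fps))
    return [min(n_frames - 1, int(i * src_fps / dst_fps)) for i in range(n_out)]


# A 6-frame clip at 50 FPS keeps every second frame at 25 FPS.
print(resample_indices(6, 50.0))
# A one-second clip at 30 FPS yields 25 output frames.
print(len(resample_indices(30, 30.0)))
```

A clip that is already at 25 FPS maps each frame to itself, so the helper is a no-op in that case.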

Verdict

Researchers working on audio-driven animation or diffusion-based face generation should take a look. If you need a turnkey, high-resolution commercial avatar pipeline, this is explicitly not it.

Frequently asked

What is ali-vilab/dreamtalk?: DreamTalk uses diffusion models to animate a still portrait with any audio clip, controlling the resulting expression and head pose via 3DMM parameters.
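Is dreamtalk open source?: Yes. ali-vilab/dreamtalk is released under the MIT license. Note that the authors describe the release as for research and non-commercial use only, and the pretrained checkpoints require an academic email request.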
What language is dreamtalk written in?: ali-vilab/dreamtalk is primarily written in Python.
How popular is dreamtalk?: ali-vilab/dreamtalk has 1.8k stars on GitHub.
Where can I find dreamtalk?: ali-vilab/dreamtalk is on GitHub at https://github.com/ali-vilab/dreamtalk.