Tencent-Hunyuan/HunyuanVideo-Foley

Foley generation that actually watches the video

End-to-end diffusion model that generates 48kHz sound effects locked to your video by aligning what it sees, reads, and hears.

★ 1k stars · Python · Image · Video · Audio

What it does

HunyuanVideo-Foley generates 48kHz sound effects matched to video input. Feed it frames plus an optional text prompt, and it runs a diffusion process to produce temporally aligned audio (footsteps, impacts, ambient texture) without manual sound design. The system targets film, game, and short-form video workflows where audio must follow the picture.
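To make "temporally aligned" concrete, here is a small arithmetic sketch of how 48kHz audio samples line up with video frames. It is not code from the repository; the function names and the example frame rates are assumptions chosen for illustration.

```python
from typing import Tuple

SAMPLE_RATE_HZ = 48_000  # output sample rate stated in the description


def samples_for_clip(duration_s: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples needed to cover a clip of the given duration."""
    return round(duration_s * sample_rate)


def frame_sample_range(
    frame_idx: int, fps: float, sample_rate: int = SAMPLE_RATE_HZ
) -> Tuple[int, int]:
    """Half-open [start, end) range of audio samples spanned by one video frame."""
    start = round(frame_idx * sample_rate / fps)
    end = round((frame_idx + 1) * sample_rate / fps)
    return start, end


if __name__ == "__main__":
    # An 8-second clip (hypothetical length) needs 384,000 samples at 48kHz.
    print(samples_for_clip(8.0))
    # At 24 fps (hypothetical frame rate), frame 10 spans samples 20000..22000.
    print(frame_sample_range(10, fps=24))
```

At 24 fps each frame covers 2,000 samples, so a footstep that lands on frame 10 should start somewhere inside that 2,000-sample window for the sound to read as synchronized.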

The interesting bit

The architecture alternates between multimodal transformer blocks that cross-reference video and audio streams, and unimodal blocks that refine the audio in isolation. A Synchformer-based temporal alignment module with gated modulation keeps the sound locked to frame-level events, while the model attempts to balance visual and textual cues rather than letting one override the other.
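The "gated modulation" idea can be illustrated with a generic gated residual update. This is a toy sketch of the general technique, not the project's implementation: the function names, the use of a sigmoid gate, and the plain-list feature vectors are all assumptions made for the example.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Squash a logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_residual(audio: List[float], update: List[float], gate_logit: float) -> List[float]:
    """Add `update` (e.g. a sync-derived feature) to `audio`, scaled by a gate.

    A gate near 0 leaves the audio features almost untouched; a gate near 1
    applies the full update. In a trained model the gate would be learned;
    here it is passed in by hand (hypothetical).
    """
    gate = sigmoid(gate_logit)
    return [a + gate * u for a, u in zip(audio, update)]


if __name__ == "__main__":
    audio_features = [1.0, 2.0]
    sync_update = [2.0, -2.0]
    # gate_logit = 0.0 gives a gate of 0.5, so half of the update is applied.
    print(gated_residual(audio_features, sync_update, gate_logit=0.0))
```

The point of the gate is that the conditioning signal can be scaled up where it helps and damped where it does not, instead of being added at full strength everywhere.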

Key highlights

Hybrid transformer design: multimodal blocks handle cross-attention between video and audio, while unimodal blocks focus on audio-only refinement
Self-developed 48kHz audio VAE for high-fidelity reconstruction of effects, music, and vocals
Two model tiers: XXL needs roughly 20GB VRAM (12GB with offload), while the newer XL model needs roughly 16GB (8GB with offload)
Benchmark tables show leading scores on MovieGen-Audio-Bench and Kling-Audio-Eval against open-source alternatives including MMAudio and Frieren
Community ComfyUI integrations are already available for node-based workflows
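
The VRAM figures in the highlights can be turned into a quick pre-flight check. The sketch below is a hypothetical helper, not part of the repository; the numbers come from the list above, and the preference order (avoid offload when possible, then prefer the larger tier) is an assumption.

```python
from typing import Optional, Tuple

# (tier, uses_offload) -> approximate VRAM in GB, taken from the highlights above.
VRAM_REQUIREMENTS_GB = {
    ("XXL", False): 20,
    ("XL", False): 16,
    ("XXL", True): 12,
    ("XL", True): 8,
}

# Assumed preference: run without offload if possible, larger tier first.
PREFERENCE = [("XXL", False), ("XL", False), ("XXL", True), ("XL", True)]


def pick_configuration(available_gb: float) -> Optional[Tuple[str, bool]]:
    """Return (tier, uses_offload) for the first preferred option that fits, else None."""
    for option in PREFERENCE:
        if available_gb >= VRAM_REQUIREMENTS_GB[option]:
            return option
    return None


if __name__ == "__main__":
    for gb in (24, 16, 12, 8, 6):
        print(gb, pick_configuration(gb))
```

Because the source figures are stated as rough requirements, treat a borderline result as a hint to leave headroom rather than as a guarantee that the model will fit.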

Verdict

Worth a look for video editors, game developers, or generative-media tinkerers who need synchronized sound effects but lack a foley studio. Approach with caution if you are on Windows or macOS, since Linux is the primary supported platform. On any platform, even the smaller XL model demands significant VRAM.

Frequently asked

What is Tencent-Hunyuan/HunyuanVideo-Foley?
An end-to-end diffusion model that generates 48kHz sound effects locked to your video by aligning what it sees, reads, and hears.

Is HunyuanVideo-Foley open source?
Yes. Tencent-Hunyuan/HunyuanVideo-Foley is an open-source project tracked on heatdrop.

What language is HunyuanVideo-Foley written in?
Tencent-Hunyuan/HunyuanVideo-Foley is primarily written in Python.

How popular is HunyuanVideo-Foley?
Tencent-Hunyuan/HunyuanVideo-Foley has 1k stars on GitHub.

Where can I find HunyuanVideo-Foley?
Tencent-Hunyuan/HunyuanVideo-Foley is on GitHub at https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley.