FunAudioLLM/FunCineForge
A unified dataset pipeline and MLLM-based model for zero-shot movie dubbing across diverse cinematic scenes.

Fun-CineForge provides an end-to-end pipeline for constructing large-scale dubbing datasets, together with an MLLM-based model for video-to-speech dubbing of movie content. The model generates high-quality dubbed audio that stays lip-synced with the video, preserves timbre, and follows instructions across scene types including monologue, narration, dialogue, and multi-speaker scenes. The project also includes CineDub-CN, a large-scale Chinese television dubbing dataset with rich annotations.