PKU-YuanGroup/ConsisID
ConsisID is a diffusion-based text-to-video generation system that preserves facial identity consistency across generated video sequences using frequency decomposition.

ConsisID generates videos from text prompts while keeping a subject's facial identity consistent from the first frame to the last. A diffusion model serves as the core generator. On top of it, identity-related features are decomposed by frequency: coarse, low-frequency information (overall facial structure) is separated from fine, high-frequency detail, and both are used to guide generation. This targets a central difficulty in video generation: keeping the subject's identity stable across temporal frames instead of letting it drift over the course of the clip.
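To make "frequency decomposition" concrete, here is a minimal, generic sketch of splitting a 2-D feature map into low- and high-frequency components with a radial low-pass mask in the Fourier domain. This is not ConsisID's implementation, and the project may separate the components differently (for example with learned feature extractors). The function name `frequency_split` and its `cutoff` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np


def frequency_split(feature_map: np.ndarray, cutoff: float = 0.1):
    """Split a 2-D array into low- and high-frequency parts (illustrative only).

    `cutoff` is a hypothetical radius in normalized frequency units
    (cycles per sample, range 0 to ~0.707); everything inside it counts as "low".
    """
    h, w = feature_map.shape
    # Centered 2-D spectrum of the input.
    spectrum = np.fft.fftshift(np.fft.fft2(feature_map))
    # Centered frequency coordinates matching the shifted spectrum.
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    # Keep only frequencies within the cutoff radius: coarse, global structure.
    low_mask = radius <= cutoff
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    # The residual holds fine detail; low + high reconstructs the input exactly.
    high = feature_map - low
    return low, high


# Usage with a random stand-in for a facial feature map.
rng = np.random.default_rng(0)
face_features = rng.standard_normal((64, 64))
low, high = frequency_split(face_features, cutoff=0.1)
assert np.allclose(low + high, face_features)
```

The residual form (`high = input - low`) guarantees that the two components always sum back to the original, so no information is lost by the split. Only the choice of cutoff decides what counts as "structure" and what counts as "detail".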