williamyang1991/Rerender_A_Video
A zero-shot text-guided video-to-video translation system using adapted diffusion models with hierarchical cross-frame constraints.

Rerender A Video adapts large text-to-image diffusion models to video-to-video translation while maintaining temporal consistency. The framework works in two stages: it first translates key frames using an adapted diffusion model with cross-frame constraints that enforce shape, texture, and color coherence, then propagates the translated key frames to the full video via temporal-aware patch matching and frame blending. Implemented in PyTorch, it enables style transfer and content manipulation on videos using natural language prompts without training on target videos.
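
The two-stage structure described above (translate a sparse set of key frames, then fill in the frames between them) can be illustrated with a small standard-library sketch. This is not the repository's code. The function names `select_key_frames` and `propagate` are hypothetical, frames are represented as flat lists of pixel values, uniform sampling of key frames is an assumption, and plain linear interpolation between neighbouring key frames stands in for the temporal-aware patch matching and frame blending the project actually uses.

```python
from bisect import bisect_left


def select_key_frames(num_frames, interval):
    """Pick every `interval`-th frame as a key frame, always including the last one.

    Uniform sampling is an assumption made for this sketch.
    """
    if num_frames < 1 or interval < 1:
        raise ValueError("num_frames and interval must be positive")
    keys = list(range(0, num_frames, interval))
    if keys[-1] != num_frames - 1:
        keys.append(num_frames - 1)
    return keys


def propagate(key_indices, key_frames, num_frames):
    """Fill in all frames from translated key frames.

    Stand-in technique: each non-key frame is a linear blend of the two
    surrounding key frames (NOT patch matching). A frame is a flat list
    of pixel values.
    """
    video = []
    for t in range(num_frames):
        # Index of the first key frame at or after time t.
        right = bisect_left(key_indices, t)
        if key_indices[right] == t:
            # t is itself a key frame: copy the translated result.
            video.append(list(key_frames[right]))
            continue
        left = right - 1
        k0, k1 = key_indices[left], key_indices[right]
        w = (t - k0) / (k1 - k0)  # blend weight towards the later key frame
        video.append(
            [(1 - w) * a + w * b for a, b in zip(key_frames[left], key_frames[right])]
        )
    return video


# Usage: 10 frames, a key frame every 4 frames, two "pixels" per frame.
keys = select_key_frames(10, 4)
translated = [[0.0, 0.0], [40.0, 40.0], [80.0, 80.0], [90.0, 90.0]]
video = propagate(keys, translated, 10)
```

In a real pipeline the `translated` list would hold the diffusion model's outputs for the key frames. Because only those frames go through the diffusion model, the cost of the expensive stage scales with the number of key frames rather than the full video length.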