nv-tlabs/ChronoEdit
A 14-billion-parameter diffusion-based image editing system that reframes editing as video generation, treating the input and edited images as start and end frames and adding temporal reasoning tokens.

ChronoEdit recasts image editing as a video generation task: the input image and the edited image are treated as the start and end frames of a short clip, so a pretrained video diffusion model can supply temporal consistency between them. A temporal reasoning stage adds reasoning tokens that help keep edits physically plausible and make it possible to visualize the editing trajectory. The system is built on a 14B-parameter diffusion model and uses the Diffusers framework.
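
The start/end frame framing described above can be pictured as a short sequence of frame slots. The following is a minimal sketch of that layout only, not the repository's API: `FrameSlot`, `build_edit_sequence`, and `frames_to_generate` are hypothetical names, and placing the reasoning frames between the input and the edited frame is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameSlot:
    """One position in the clip (hypothetical type, for illustration only)."""
    index: int
    role: str    # "input", "reasoning", or "edited"
    known: bool  # True if the content is given, False if the model must produce it


def build_edit_sequence(num_reasoning_frames: int) -> List[FrameSlot]:
    """Lay out an edit as a clip: input frame, reasoning frames, edited frame.

    Assumption: reasoning frames sit between the start and end frames.
    """
    if num_reasoning_frames < 0:
        raise ValueError("num_reasoning_frames must be non-negative")
    # The input image is the only frame whose content is known up front.
    slots = [FrameSlot(index=0, role="input", known=True)]
    for i in range(num_reasoning_frames):
        slots.append(FrameSlot(index=i + 1, role="reasoning", known=False))
    # The edited image is the final frame of the clip.
    slots.append(
        FrameSlot(index=num_reasoning_frames + 1, role="edited", known=False)
    )
    return slots


def frames_to_generate(slots: List[FrameSlot]) -> List[int]:
    """Return the indices of frames that a video model would have to fill in."""
    return [slot.index for slot in slots if not slot.known]


if __name__ == "__main__":
    for slot in build_edit_sequence(num_reasoning_frames=3):
        print(slot.index, slot.role, "given" if slot.known else "to generate")
```

In this picture the edit is simply the last frame of the clip, and the intermediate slots are what would be inspected to visualize the editing trajectory.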