Picsart-AI-Research/StreamingT2V
An autoregressive video generation model that creates long, temporally consistent videos from text or image prompts by extending Stable Video Diffusion.

This repository implements StreamingSVD, an enhanced autoregressive technique for text-to-video and image-to-video generation. It extends Stable Video Diffusion (SVD) into a high-quality long-video generator that can produce videos of up to 200 frames (8 seconds, which works out to 25 frames per second) with rich motion dynamics. The method maintains temporal consistency throughout the video while staying closely aligned with the input text or image prompt. StreamingSVD is part of the StreamingT2V research family, published at CVPR 2025.
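
To make the autoregressive idea concrete, here is a minimal sketch of how a long video could be split into chunks, each one generated after the previous chunk and overlapping it by a few frames. It is not the repository's implementation: the function names `plan_chunks` and `frames_per_second`, the chunk length of 25 frames, and the overlap of 8 frames are illustrative assumptions. Only the 200-frame, 8-second target comes from the description above.

```python
def frames_per_second(total_frames: int, duration_s: float) -> float:
    """Frame rate implied by a frame count and a duration in seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return total_frames / duration_s


def plan_chunks(total_frames: int, chunk_len: int, overlap: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) frame ranges for an autoregressive rollout.

    Each chunk after the first begins `overlap` frames before the previous
    chunk ended, so those shared frames can serve as conditioning context.
    All three parameters are hypothetical knobs for this sketch.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")

    chunks = []
    start = 0
    while True:
        end = min(start + chunk_len, total_frames)
        chunks.append((start, end))
        if end == total_frames:
            break
        # Step back by `overlap` frames so the next chunk sees the tail
        # of the one just generated.
        start = end - overlap
    return chunks


if __name__ == "__main__":
    # Target taken from the description: 200 frames over 8 seconds.
    print(frames_per_second(200, 8))
    # Assumed values: 25-frame chunks with an 8-frame overlap.
    for start, end in plan_chunks(200, 25, 8):
        print(f"generate frames [{start}, {end})")
```

With these assumed values each chunk adds 17 new frames, so the final chunk is shorter than the others. A real pipeline would pick the chunk length and overlap to match what the underlying model was trained on.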