Stability AI's video-to-4D pipeline, now with fewer moving parts
A monorepo of diffusion models that turns flat images into orbiting 3D videos, and videos into 4D assets you can walk around.

What it does
This is Stability AI’s research release hub for generative video and 3D models. The current star is SV4D 2.0, which takes a short input video of a moving object and generates novel-view videos from multiple camera angles, effectively reconstructing 4D space (3D plus time). The repo also houses SV3D for image-to-3D-orbit synthesis, Stable Video Diffusion for image-to-video, and SDXL-Turbo for fast text-to-image.
The interesting bit
SV4D 2.0 drops a dependency that hampered its predecessor: it no longer needs SV3D to generate reference multi-views of the first frame. That makes it more robust to self-occlusions and better at handling real-world videos with messy backgrounds. The trade-off is heavier GPU demand: 576×576 resolution, 50 sampling steps by default, and autoregressive generation for longer clips.
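To put the resolution trade-off in numbers, here is a rough back-of-the-envelope sketch. It counts output pixels for one generation pass using the 12 frames × 4 views figure from the highlights, and compares 576×576 against the 512×512 fallback. Pixel count is only a loose proxy for memory use; the helper name `pixels_per_pass` is made up for this illustration and is not part of the repo.

```python
def pixels_per_pass(frames: int, views: int, side: int) -> int:
    """Total output pixels for one pass of frames x views square images."""
    return frames * views * side * side


# Figures taken from the article: 12 video frames x 4 views per pass.
full = pixels_per_pass(frames=12, views=4, side=576)
reduced = pixels_per_pass(frames=12, views=4, side=512)

print(full)                      # 15925248
print(reduced)                   # 12582912
print(round(reduced / full, 3))  # 0.79
```

By this crude measure, dropping to 512×512 trims the per-pass pixel count by about a fifth. Actual VRAM savings depend on the model internals and may differ.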
Key highlights
- SV4D 2.0 generates 48 frames (12 video frames × 4 views) at 576×576; an 8-view variant exists for different use cases
- Input can be GIF, MP4, or frame sequences; background removal via rembg, Clipdrop, or SAM2 is recommended for clean results
- Low-VRAM fallback: set --encoding_t=1 --decoding_t=1 or drop to 512×512 resolution
- SV3D_p variant accepts custom camera paths via elevation/azimuth degree sequences
- Includes Streamlit and Gradio demos for local inference
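As an illustration of what an elevation/azimuth camera path for SV3D_p might look like, the sketch below builds one orbit as two equal-length lists of degrees. The function name `orbit_path`, its parameters, and the choice of evenly spaced azimuths starting at 0 are assumptions made for this example. The frame count, value ranges, and ordering the actual script expects should be checked against the repo's README.

```python
import math


def orbit_path(n_frames: int, base_elevation_deg: float = 10.0,
               amplitude_deg: float = 0.0):
    """Return (elevations, azimuths) in degrees for one full orbit.

    Hypothetical helper: azimuths are evenly spaced over [0, 360), and the
    elevation optionally oscillates sinusoidally around a base value.
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    azimuths = [360.0 * i / n_frames for i in range(n_frames)]
    elevations = [
        base_elevation_deg + amplitude_deg * math.sin(2 * math.pi * i / n_frames)
        for i in range(n_frames)
    ]
    # Elevation is an angle above or below the horizon, so keep it physical.
    if any(abs(e) > 90.0 for e in elevations):
        raise ValueError("elevation must stay within [-90, 90] degrees")
    return elevations, azimuths


elevations, azimuths = orbit_path(n_frames=4)
print(azimuths)    # [0.0, 90.0, 180.0, 270.0]
print(elevations)  # [10.0, 10.0, 10.0, 10.0]
```

A non-zero `amplitude_deg` makes the camera bob up and down during the orbit, which is one simple way to get a path that differs from a flat turntable.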
Caveats
- All models are tagged “for research purposes” with no commercial license mentioned in the README
- Setup requires Python 3.10, CUDA 11.8, and a dependency pulled from a separate datapipelines repo
- The 21-frame default input length for SV4D/SV4D 2.0 is arbitrary: scripts autoregressively extend from smaller native chunk sizes
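To make the last caveat concrete, here is a minimal sketch of how a longer clip can be covered by smaller chunks. It uses a generic sliding-window scheme in which each window reuses the tail of the previous one as context. This is a swapped-in illustration, not necessarily how the repo's scripts schedule their autoregressive passes. The name `chunk_windows` and the one-frame overlap are assumptions.

```python
def chunk_windows(n_frames: int, chunk: int, overlap: int = 1):
    """Split frame indices [0, n_frames) into overlapping (start, end) windows.

    Each window after the first begins `overlap` frames before the previous
    window ended, so earlier output can condition the next pass.
    The final window may be shorter than `chunk`.
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    if chunk <= overlap:
        raise ValueError("chunk must be larger than overlap")
    windows = []
    start = 0
    step = chunk - overlap
    while True:
        end = min(start + chunk, n_frames)
        windows.append((start, end))
        if end == n_frames:
            break
        start += step
    return windows


# 21 input frames covered by 12-frame passes (figures from the article).
print(chunk_windows(21, 12))  # [(0, 12), (11, 21)]
```

With these numbers, two passes cover the clip, which is why the input length can be chosen fairly freely relative to the native chunk size.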
Verdict
Grab this if you’re doing novel-view synthesis, 4D reconstruction research, or need a reference implementation of diffusion-based video generation. Skip if you want production-ready APIs or lack VRAM headroom.