Stability-AI/generative-models

Stability AI's video-to-4D pipeline, now with fewer moving parts

A monorepo of diffusion models that turns flat images into orbiting 3D videos, and videos into 4D assets you can walk around.

Velocity (7d): +25 ★ / day · Trend: steady

What it does

This is Stability AI’s research release hub for generative video and 3D models. The current star is SV4D 2.0, which takes a short input video of a moving object and generates novel-view videos from multiple camera angles, effectively reconstructing 4D space (3D plus time). The repo also houses SV3D for image-to-3D-orbit synthesis, Stable Video Diffusion for image-to-video, and SDXL-Turbo for fast text-to-image.

The interesting bit

SV4D 2.0 drops a dependency that hampered its predecessor: it no longer needs SV3D to generate reference multi-views of the first frame. That makes it more robust to self-occlusions and better at handling real-world videos with messy backgrounds. The trade-off is a hungrier GPU: 576×576 resolution, 50 default sampling steps, and autoregressive generation for longer clips.
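To put the resolution figure in perspective, the arithmetic below compares the pixels produced per pass at 576×576 with the 512×512 fallback, using the 12 frames × 4 views per pass listed under Key highlights. This is a rough sketch only: pixel count is a crude proxy for memory use, not a measurement, and the helper name `pixels_per_pass` is made up for illustration.

```python
def pixels_per_pass(side: int, frames: int = 12, views: int = 4) -> int:
    """Total pixels in one pass: side x side per image, frames x views images."""
    return side * side * frames * views


full = pixels_per_pass(576)      # default resolution from the summary
reduced = pixels_per_pass(512)   # low-VRAM fallback resolution
ratio = reduced / full           # fraction of pixels kept after the drop

print(f"576x576: {full:,} pixels per pass")
print(f"512x512: {reduced:,} pixels per pass ({ratio:.0%} of full)")
```

The drop to 512×512 removes roughly a fifth of the pixels per pass, which is why it is offered as an alternative to encoding and decoding fewer frames at a time.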

Key highlights

  • SV4D 2.0 generates 48 frames (12 video frames × 4 views) at 576×576; an 8-view variant exists for different use cases
  • Input can be GIF, MP4, or frame sequences; background removal via rembg, Clipdrop, or SAM2 is recommended for clean results
  • Low-VRAM fallback: set --encoding_t=1 --decoding_t=1 (the number of frames encoded and decoded at a time) or drop to 512×512 resolution
  • SV3D_p variant accepts custom camera paths via elevation/azimuth degree sequences
  • Includes Streamlit and Gradio demos for local inference
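The custom camera paths mentioned for SV3D_p can be pictured as two equal-length lists of degrees, one elevation and one azimuth per output frame. The sketch below builds such a path in plain Python. It is an illustration under assumptions, not the repo's API: the function name, the 21-frame default, and the valid ranges (elevation in [-90, 90], azimuth ascending in [0, 360)) are assumed here, so check the repo's README for the exact format its scripts expect.

```python
import math


def orbit_camera_path(n_frames=21, base_elevation_deg=10.0, elevation_swing_deg=0.0):
    """Return (elevations, azimuths) in degrees for one full orbit.

    Hypothetical helper. n_frames=21 and the angle ranges are assumptions
    made for illustration, not values read from the repo's scripts.
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    if elevation_swing_deg < 0:
        raise ValueError("elevation_swing_deg must be non-negative")
    lowest = base_elevation_deg - elevation_swing_deg
    highest = base_elevation_deg + elevation_swing_deg
    if lowest < -90.0 or highest > 90.0:
        raise ValueError("elevation would leave the [-90, 90] degree range")

    # Azimuths are evenly spaced over one turn, ascending, and stop short of 360.
    azimuths = [360.0 * i / n_frames for i in range(n_frames)]
    # Elevation bobs sinusoidally around the base value once per orbit.
    elevations = [
        base_elevation_deg + elevation_swing_deg * math.sin(math.radians(a))
        for a in azimuths
    ]
    return elevations, azimuths


static_elev, static_azim = orbit_camera_path()                            # fixed elevation
wavy_elev, wavy_azim = orbit_camera_path(elevation_swing_deg=15.0)        # dynamic orbit
print(", ".join(f"{e:.1f}" for e in wavy_elev))
```

A swing of zero gives a static orbit at constant elevation; a non-zero swing gives a dynamic orbit that rises and falls once per revolution.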

Caveats

  • All models are tagged “for research purposes” with no commercial license mentioned in the README
  • Setup requires Python 3.10, CUDA 11.8, and a dependency pulled from a separate datapipelines repo
  • The 21-frame default input length for SV4D/SV4D 2.0 is arbitrary: it is a script default, and the scripts reach it by autoregressively extending from smaller native chunk sizes
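One way to picture autoregressive extension is as a sliding window: generate a chunk at the model's native size, then start the next chunk on the last frame already produced so it can serve as conditioning. The sketch below only computes the frame windows. It is a generic illustration, not the repo's actual schedule: the function name is hypothetical, the chunk size of 12 borrows the "12 video frames" figure from the highlights, and the one-frame overlap is an assumption.

```python
def chunk_windows(total_frames, chunk_size=12, overlap=1):
    """Split total_frames into half-open (start, end) windows of at most chunk_size.

    Consecutive windows share `overlap` frames, standing in for the frames a
    later chunk would be conditioned on. Illustrative values only.
    """
    if total_frames < 1:
        raise ValueError("total_frames must be at least 1")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    windows = []
    start = 0
    while True:
        end = min(start + chunk_size, total_frames)
        windows.append((start, end))
        if end == total_frames:
            break
        # Step back so the next chunk begins on already-generated frames.
        start = end - overlap
    return windows


for start, end in chunk_windows(21):
    print(f"generate frames {start}..{end - 1}")
```

Under these assumptions a 21-frame input needs two passes, and a longer clip simply adds more windows, which is where the extra GPU time for long videos comes from.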

Verdict

Grab this if you’re doing novel-view synthesis or 4D reconstruction research, or if you need a reference implementation of diffusion-based video generation. Skip it if you want production-ready APIs or lack VRAM headroom.
