PKU-YuanGroup/Helios

A 14B video model that outruns 1.3B rivals by ignoring the playbook

Helios generates minute-scale videos at 19.5 FPS on one H100 by deliberately skipping every standard acceleration and anti-drift trick in the book.

1.9k stars Python Image · Video · Audio
Velocity (7d): +19 ★/day · Trend: steady

What it does

Helios is a 14B-parameter diffusion model for text-to-video, image-to-video, video-to-video, and interactive generation. It synthesizes minute-long videos at 19.5 FPS end-to-end on a single H100 (about 10 FPS on Ascend NPU), and the authors claim it outperforms smaller 1.3B models in quality while doing so.
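One rough way to read the throughput figure: at a sustained end-to-end rate, wall-clock time is frames divided by generation FPS, and generation keeps pace with playback only when generation FPS is at least the clip's playback frame rate. The sketch below uses a hypothetical 24 fps playback rate purely for illustration. The output frame rate Helios actually uses is not stated here, and real throughput will vary with hardware and settings.

```python
"""Back-of-the-envelope throughput arithmetic (illustrative only)."""


def generation_seconds(num_frames: int, gen_fps: float) -> float:
    """Wall-clock seconds to generate num_frames at a sustained gen_fps."""
    if gen_fps <= 0:
        raise ValueError("gen_fps must be positive")
    return num_frames / gen_fps


def real_time_factor(gen_fps: float, playback_fps: float) -> float:
    """Ratio of generation speed to playback speed; >= 1.0 keeps pace."""
    return gen_fps / playback_fps


if __name__ == "__main__":
    PLAYBACK_FPS = 24            # hypothetical playback rate, not from Helios
    frames = 60 * PLAYBACK_FPS   # one minute of video at that rate
    for gen_fps in (19.5, 20.89):  # reported single-H100 figures
        secs = generation_seconds(frames, gen_fps)
        rtf = real_time_factor(gen_fps, PLAYBACK_FPS)
        print(f"{gen_fps} FPS: {secs:.1f} s for {frames} frames, "
              f"real-time factor {rtf:.2f}")
```

If the actual output frame rate is at or below the generation rate, the real-time factor reaches 1.0 and the clip is produced as fast as it plays.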

The interesting bit

The model reaches this speed without a KV cache, causal masking, sparse attention, TinyVAE, quantization, or conventional anti-drifting strategies such as keyframe sampling or error banks. The authors frame this as a feature, not a bug: they report optimizations that raise throughput and cut memory use enough to fit four 14B models in 80 GB of GPU memory and to train at batch sizes comparable to image-diffusion training.
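To see why fitting four 14B models in 80 GB is notable, here is the naive weight-only arithmetic. It assumes 16-bit (bf16/fp16) weights at 2 bytes per parameter and ignores activations, gradients, optimizer state, sharding, and offloading. It says nothing about how Helios actually achieves the fit; it only shows that unoptimized 16-bit weights alone would not fit.

```python
# Weight-only memory arithmetic. Assumes every parameter is stored at the
# given precision; activations, gradients, optimizer state, and any sharding
# or offloading are deliberately ignored.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_gb(num_params: float, dtype: str = "bf16") -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    per_model = weight_gb(14e9)   # one 14B-parameter model in bf16
    four_models = 4 * per_model   # the four-model setup described above
    budget = 80.0                 # a single 80 GB accelerator
    print(f"one model: {per_model:.0f} GB, four models: {four_models:.0f} GB, "
          f"budget: {budget:.0f} GB, naive fit: {four_models <= budget}")
```

Four bf16 copies come to about 112 GB of weights alone, so the 80 GB figure implies memory savings beyond plain 16-bit storage; this summary does not say which techniques provide them.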

Key highlights

  • Three model variants: Helios-Base (best quality, v-prediction), Helios-Mid (intermediate checkpoint with CFG-Zero*), and Helios-Distilled (best efficiency, x0-prediction with custom DMD scheduler)
  • Day-0 inference support across Diffusers, SGLang-Diffusion, vLLM-Omni, and Ascend NPU
  • VRAM usage can be reduced to ~6 GB with Group Offloading; multi-GPU inference is supported via Ulysses/Ring/Unified Attention context parallelism
  • Community-tested up to 20.89 FPS on tuned H100 hardware
  • Gradio demo and AOTI-compiled HuggingFace Spaces available
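The Base and Distilled variants differ in what the network predicts: a velocity (v) versus the clean sample (x0). Under a rectified-flow / flow-matching parameterization, which is an assumption here and not something this summary confirms for Helios, the two targets are interchangeable given the noisy input and timestep: with x_t = (1 − t)·x0 + t·ε and v = ε − x0, it follows that x_t = x0 + t·v, so x0 = x_t − t·v. A minimal sketch on plain Python lists:

```python
from typing import List

Vector = List[float]


def noisy_sample(x0: Vector, eps: Vector, t: float) -> Vector:
    """Rectified-flow interpolation (assumed): x_t = (1 - t) * x0 + t * eps."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, eps)]


def velocity_target(x0: Vector, eps: Vector) -> Vector:
    """Velocity target under the same assumption: v = eps - x0."""
    return [b - a for a, b in zip(x0, eps)]


def x0_from_velocity(x_t: Vector, v: Vector, t: float) -> Vector:
    """Convert a velocity prediction into an x0 prediction: x0 = x_t - t * v."""
    return [xt - t * vi for xt, vi in zip(x_t, v)]


def velocity_from_x0(x_t: Vector, x0: Vector, t: float) -> Vector:
    """Inverse conversion, valid only for t > 0: v = (x_t - x0) / t."""
    if t <= 0:
        raise ValueError("t must be > 0 to recover velocity from x0")
    return [(xt - a) / t for xt, a in zip(x_t, x0)]


if __name__ == "__main__":
    x0 = [0.5, -1.0, 2.0]
    eps = [0.1, 0.3, -0.7]
    t = 0.6
    x_t = noisy_sample(x0, eps, t)
    v = velocity_target(x0, eps)
    print("recovered x0:", x0_from_velocity(x_t, v, t))  # matches x0 up to rounding
    print("recovered v: ", velocity_from_x0(x_t, x0, t))  # matches v up to rounding
```

The inverse conversion divides by t, which is why an x0 parameterization is sometimes preferred near the clean end of the schedule; whether that motivated the Distilled variant's choice is not stated in the source.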

Caveats

  • Image-to-Video and Video-to-Video are noted as “slightly inferior” to Text-to-Video because training was T2V-first; the README suggests workarounds like is_skip_first_chunk and noise-sigma tuning
  • Helios-Mid is explicitly flagged as an intermediate distillation checkpoint that “may not meet expected quality”
  • Real-time performance depends heavily on the CPU, system memory, and CUDA driver version, not just the GPU

Verdict

Worth a look if you’re building video generation pipelines and skeptical that bigger always means slower. Skip it if you need polished I2V/V2V out of the box without parameter tweaking.
