Soul-AILab/SoulX-LiveAct

Real-time talking heads that don't drown in KV cache

An inference framework that streams hour-long, multimodal human animation in real time by keeping diffusion steps consistent and KV memory constant.

★1.1k stars Python Image · Video · Audio

What it does

SoulX-LiveAct is an inference engine for real-time, multimodal human animation. Feed it audio (processed through chinese-wav2vec2-base) plus action or emotion prompts, and it streams lifelike talking-head video at 20 FPS on dual H100/H200 GPUs, or 6 FPS on a single RTX 5090 with FP8 KV caching and CPU offloading. It is built for long-form streams: podcasts, FaceTime-style calls, and talk shows that run for hours rather than seconds.
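To put those frame rates in perspective, the arithmetic below converts a target FPS into a per-frame time budget and a frame count for a stream of a given length. It is a back-of-the-envelope sketch: the function names are illustrative and are not part of SoulX-LiveAct.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def frames_in_stream(fps: float, seconds: float) -> int:
    """Number of frames needed to cover a stream of the given length."""
    return round(fps * seconds)


if __name__ == "__main__":
    # Frame rates quoted above; 3600 seconds is one hour of streaming.
    for label, fps in [("dual H100/H200", 20), ("single RTX 5090", 6)]:
        print(f"{label}: {frame_budget_ms(fps):.1f} ms per frame, "
              f"{frames_in_stream(fps, 3600)} frames per hour")
```

At 20 FPS the budget is 50 ms per frame, and an hour of video is 72,000 frames, which is why a cache that grows with every frame becomes a problem for long streams.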

The interesting bit

The authors attack the memory cliff of autoregressive video diffusion with two moves. Neighbor Forcing treats diffusion-step-aligned neighbor latents as an inductive bias, giving the AR model a theoretically grounded way to keep steps consistent. ConvKV Memory is a lightweight plug-in that compresses the key-value cache so generation stays in constant memory regardless of stream length, with what the authors call negligible overhead.
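The general idea of a constant-memory cache can be illustrated with a toy. The sketch below is not the authors' ConvKV Memory: the class name, parameters, and pooling rule are all assumptions made for illustration. It keeps the most recent entries exact and folds older ones into a fixed number of averaged summaries, so the total size stays bounded however long the stream runs.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache (hypothetical): recent entries stay exact, older ones are pooled."""

    def __init__(self, recent_size=8, summary_size=4, stride=4):
        if summary_size < 1 or not 1 <= stride <= recent_size + 1:
            raise ValueError("need summary_size >= 1 and 1 <= stride <= recent_size + 1")
        self.recent = deque()      # most recent entries, kept uncompressed
        self.summaries = deque()   # compressed history, oldest first
        self.recent_size = recent_size
        self.summary_size = summary_size
        self.stride = stride

    @staticmethod
    def _mean(vectors):
        # Element-wise mean over time, i.e. average pooling: a uniform-kernel
        # 1D convolution whose stride equals its kernel size.
        n = len(vectors)
        return [sum(column) / n for column in zip(*vectors)]

    def append(self, vector):
        self.recent.append(list(vector))
        if len(self.recent) > self.recent_size:
            # Pool the oldest `stride` recent entries into a single summary.
            block = [self.recent.popleft() for _ in range(self.stride)]
            self.summaries.append(self._mean(block))
        if len(self.summaries) > self.summary_size:
            # Merge the two oldest summaries so the history never grows.
            # (Unweighted merge: a simplification acceptable for a toy.)
            a, b = self.summaries.popleft(), self.summaries.popleft()
            self.summaries.appendleft(self._mean([a, b]))

    def __len__(self):
        return len(self.recent) + len(self.summaries)


if __name__ == "__main__":
    cache = BoundedKVCache()
    for t in range(10_000):
        cache.append([float(t), float(-t)])
    # The size never exceeds recent_size + summary_size.
    print(len(cache), "entries held after 10,000 appends")
```

The trade-off in any scheme of this shape is that old context survives only in compressed form; how much quality that costs depends on the compression used, which this toy does not attempt to model.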

Key highlights

20 FPS at 720×416 or 512×512 on two H100/H200s, using end-to-end adaptive FP8, sequence parallelism, and fused operators
Consumer GPU fallback: single RTX 4090 or 5090 via FP8 KV cache, block offloading to CPU, and running the T5 text encoder on the CPU
Multimodal control via JSON condition files: audio-driven lipsync plus optional action and emotion editing
GUI demo included for local latency benchmarking
Newly added FP4 GEMM support for NVIDIA Blackwell-generation GPUs (RTX 5090, B100, B200)
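To see why an FP8 KV cache matters on a consumer card, the standard size formula for an uncompressed attention cache can be worked through directly. The model shape below is hypothetical and chosen only for illustration; it is not taken from SoulX-LiveAct.

```python
def kv_cache_bytes(layers, tokens, heads, head_dim, bytes_per_element):
    """Size of a plain KV cache: one key and one value vector per token, head, and layer."""
    return 2 * layers * tokens * heads * head_dim * bytes_per_element


# Hypothetical model shape (an assumption, not the project's real configuration).
LAYERS, HEADS, HEAD_DIM = 30, 12, 128

if __name__ == "__main__":
    for tokens in (10_000, 100_000):
        fp16 = kv_cache_bytes(LAYERS, tokens, HEADS, HEAD_DIM, 2)  # 2 bytes per value
        fp8 = kv_cache_bytes(LAYERS, tokens, HEADS, HEAD_DIM, 1)   # 1 byte per value
        print(f"{tokens:>7} tokens: FP16 {fp16 / 2**30:.2f} GiB, FP8 {fp8 / 2**30:.2f} GiB")
```

Two things fall out of the formula: storing values in one byte instead of two halves the cache, and without compression the cache still grows linearly with the number of cached tokens.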

Caveats

Training code is still unchecked on the open-source roadmap; only inference weights and generation scripts are available
The README notes that FP8 KV cache can “slightly affect generation quality,” and the GUI needs a warm-up run before hitting its advertised steady-state FPS
A CLI argument is documented as --steam_audio, which looks like a typo for "stream"; check the script's argument parser for the spelling it actually accepts
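The warm-up caveat matters when benchmarking: a slow first run drags down a naive average. A minimal sketch of measuring steady-state FPS from per-frame timings follows. The function and the sample timings are illustrative assumptions, not output from SoulX-LiveAct.

```python
def steady_state_fps(frame_seconds, warmup_frames=1):
    """Average FPS after discarding the first `warmup_frames` timings."""
    steady = list(frame_seconds)[warmup_frames:]
    if not steady:
        raise ValueError("need at least one frame after warm-up")
    total = sum(steady)
    if total <= 0:
        raise ValueError("frame times must be positive")
    return len(steady) / total


if __name__ == "__main__":
    # Made-up timings in seconds: one slow warm-up frame, then a steady pace.
    timings = [2.0, 0.05, 0.05, 0.05, 0.05]
    print(f"including warm-up: {steady_state_fps(timings, warmup_frames=0):.1f} FPS")
    print(f"steady state:      {steady_state_fps(timings, warmup_frames=1):.1f} FPS")
```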

Verdict

Grab it if you are building real-time avatars, virtual hosts, or streaming digital humans and need hour-long coherence. Skip it if you want to train your own variant: the weights are live, but the training recipe is not.

Frequently asked

What is Soul-AILab/SoulX-LiveAct?: An inference framework that streams hour-long, multimodal human animation in real time by keeping diffusion steps consistent and KV memory constant.
Is SoulX-LiveAct open source?: Yes — Soul-AILab/SoulX-LiveAct is an open-source project tracked on heatdrop.
What language is SoulX-LiveAct written in?: Soul-AILab/SoulX-LiveAct is primarily written in Python.
How popular is SoulX-LiveAct?: Soul-AILab/SoulX-LiveAct has 1.1k stars on GitHub.
Where can I find SoulX-LiveAct?: Soul-AILab/SoulX-LiveAct is on GitHub at https://github.com/Soul-AILab/SoulX-LiveAct.