MeiGen-AI/InfiniteTalk

Dub any video forever, or make a person talk from a single photo

InfiniteTalk generates unlimited-length lip-synced talking video from either an existing video or a single image, using audio to drive head pose, expression, and body movement.

6.8k stars · Python · Image, Video, Audio · 7-day velocity: +23 ★/day (trend: steady)

What it does

InfiniteTalk is an audio-driven video generation system built on top of Wan2.1-I2V-14B. Feed it a video plus an audio track and it re-synthesizes the subject with matched lip sync, head movement, body posture, and facial expressions. Feed it a single image plus audio and it generates a talking video from scratch. The “infinite-length” claim means streaming generation that isn’t hard-capped at a few seconds.
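One common way for a generator to avoid a hard length cap is to work through the audio in fixed-size chunks that overlap by a few frames, so each chunk can be conditioned on the tail of the previous one. The sketch below only plans those chunk boundaries. It is an illustration of the general idea, not InfiniteTalk's actual scheduler: the function name `plan_chunks` and the values `chunk_len=81` and `overlap=9` are hypothetical.

```python
def plan_chunks(total_frames, chunk_len=81, overlap=9):
    """Split a frame range into overlapping [start, end) windows.

    chunk_len and overlap are illustrative defaults, not values
    taken from the InfiniteTalk repository.
    """
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be in [0, chunk_len)")
    if total_frames <= 0:
        return []

    chunks = []
    start = 0
    while True:
        end = min(start + chunk_len, total_frames)
        chunks.append((start, end))
        if end >= total_frames:
            break
        # Step back by `overlap` frames so the next chunk re-uses
        # the tail of this one as context.
        start = end - overlap
    return chunks


if __name__ == "__main__":
    for start, end in plan_chunks(200):
        print(f"generate frames {start}..{end - 1}")
```

Because each window is planned independently of the total length, the same loop works for a 10-second clip or a 10-minute one; what grows with length is the accumulated drift, not the memory footprint.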

The interesting bit

Most dubbing tools fixate on lips and call it done. InfiniteTalk attempts to sync the whole body to speech rhythm, which is the harder and more noticeable problem. The trade-off is familiar to long-video generation: color drift and identity degradation accumulate over time, and the authors openly note that camera movement matching is approximate unless you accept more drift.

Key highlights

  • Video-to-video and image-to-video modes; 480P and 720P output
  • Built on Wan2.1-I2V-14B with custom audio conditioning weights
  • TeaCache and int8 quantization supported for lower VRAM; multi-GPU inference available
  • Gradio demo and ComfyUI branch provided
  • Community integrations: Wan2GP (low-VRAM optimization) and kijai’s ComfyUI wrapper

Caveats

  • Color shifts worsen after roughly 1 minute in image-to-video mode; the repo suggests a workaround (translate/zoom the static image into a short video) rather than fixing it
  • Camera movement in video-to-video mode is mimicked, not reproduced; SDEdit improves accuracy but introduces its own color shift
  • FusionX LoRA speeds things up but also degrades identity preservation over long clips
  • Inference acceleration (LCM distillation, sparse attention) is still on the todo list
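The translate/zoom workaround from the first caveat can be sketched as a list of crop rectangles, one per frame, that slowly zoom in and pan across the still image. This is a minimal sketch under stated assumptions: the function name `pan_zoom_boxes` and the parameters `max_zoom` and `max_shift` are hypothetical, and the actual cropping, resizing, and video encoding are left to whatever image library is at hand.

```python
def pan_zoom_boxes(width, height, n_frames=25, max_zoom=1.1, max_shift=8):
    """Return one (left, top, crop_w, crop_h) crop box per frame.

    Cropping each box from the still image and resizing it back to
    (width, height) yields a short clip with a gentle zoom and pan.
    All defaults are illustrative, not taken from the repository.
    """
    boxes = []
    for i in range(n_frames):
        t = i / max(n_frames - 1, 1)            # progress from 0.0 to 1.0
        zoom = 1.0 + (max_zoom - 1.0) * t       # linear zoom-in
        crop_w = int(round(width / zoom))
        crop_h = int(round(height / zoom))
        # Start from a centered crop, then pan right by up to max_shift pixels,
        # clamped so the box never leaves the image.
        left = (width - crop_w) // 2 + int(round(max_shift * t))
        left = min(max(left, 0), width - crop_w)
        top = (height - crop_h) // 2
        boxes.append((left, top, crop_w, crop_h))
    return boxes


if __name__ == "__main__":
    for box in pan_zoom_boxes(640, 480, n_frames=5):
        print(box)
```

The first box covers the whole image, so the clip starts on the original photo and moves away from it gradually.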

Verdict

Worth a look if you need long-form talking-head generation and can tolerate some manual tuning of CFG scales and workarounds for drift. If you need broadcast-perfect lip sync with locked camera motion out of the box, this isn’t there yet, though the authors are admirably upfront about where the seams show.
