Is VideoChat open source?

Yes — Henry-23/VideoChat is open source, released under the MIT license.

What language is VideoChat written in?

Henry-23/VideoChat is primarily written in Python.

How popular is VideoChat?

Henry-23/VideoChat has 1.3k stars on GitHub.

Where can I find VideoChat?

Henry-23/VideoChat is on GitHub at https://github.com/Henry-23/VideoChat.


Henry-23/VideoChat

A talking-head rig built from China’s open-source model buffet

VideoChat is plumbing that wires FunASR, Qwen, and MuseTalk into a real-time, voice-cloned digital human you can run on one GPU.

★ 1.3k stars | Python | Image · Video · Audio | Language Models | Agents


What it does

This Gradio demo chains together several existing Chinese open-source models to produce a conversational avatar. It listens through FunASR, reasons with Qwen, speaks through GPT-SoVITS or CosyVoice, and lip-syncs via MuseTalk. Users can upload custom avatar videos and clone new voices from a few seconds of reference audio.

The interesting bit

The README benchmarks both pipelines on a single A100: the cascade route needs about 8 GB of VRAM with a roughly 3-second initial delay, while the end-to-end multimodal path demands around 20 GB and has an initial delay of about 7 seconds. The cascade_only branch exists for a reason: most mortals will want the lighter option.
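
To make that trade-off concrete, here is a minimal sketch of how one might choose a mode from the free VRAM on a machine. It is not code from the repository: the `PIPELINE_BUDGETS` table and `pick_pipeline` helper are hypothetical names, and the numbers are simply the approximate A100 figures quoted above, so treat them as rough guides rather than guarantees on other hardware.

```python
from typing import Optional

# Approximate figures quoted from the README benchmark on a single A100.
# Hypothetical table for illustration; not part of VideoChat itself.
PIPELINE_BUDGETS = {
    "cascade": {"vram_gb": 8, "first_delay_s": 3},
    "end_to_end": {"vram_gb": 20, "first_delay_s": 7},
}


def pick_pipeline(free_vram_gb: float, prefer: str = "cascade") -> Optional[str]:
    """Return the preferred mode if it fits, else any mode that fits, else None."""
    if prefer not in PIPELINE_BUDGETS:
        raise ValueError(f"unknown pipeline mode: {prefer!r}")
    # Try the preferred mode first, then the remaining modes from lightest to heaviest.
    others = sorted(
        (name for name in PIPELINE_BUDGETS if name != prefer),
        key=lambda name: PIPELINE_BUDGETS[name]["vram_gb"],
    )
    for name in [prefer, *others]:
        if free_vram_gb >= PIPELINE_BUDGETS[name]["vram_gb"]:
            return name
    # Nothing fits locally; the caller could fall back to the hosted API route.
    return None


if __name__ == "__main__":
    for vram in (4, 12, 24):
        print(vram, "GB ->", pick_pipeline(vram, prefer="end_to_end"))
```

A `None` result is the point at which the optional API route becomes the practical choice.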

Key highlights

Dual pipeline modes: cascade (ASR → LLM → TTS → talking head) and end-to-end (MLLM → talking head)
Voice cloning through GPT-SoVITS with 3–10 second reference clips
Custom avatar support by adding a video file and editing a Python list
Optional API fallback to Aliyun DashScope for LLM and TTS when local hardware is thin
Fully offline capable if you download weights and avoid the API routes
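
The cascade mode in the first highlight can be pictured as four stages called in order. The sketch below is an illustration under stated assumptions, not the repository's actual classes: `CascadePipeline` and its field names are hypothetical, and the stages are stand-in lambdas rather than real FunASR, Qwen, GPT-SoVITS, or MuseTalk calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CascadePipeline:
    """Hypothetical wrapper: ASR -> LLM -> TTS -> talking head."""

    asr: Callable[[bytes], str]                  # user audio -> transcript
    llm: Callable[[str], str]                    # transcript -> reply text
    tts: Callable[[str], bytes]                  # reply text -> synthesized speech
    talking_head: Callable[[bytes], List[str]]   # speech -> lip-synced frames

    def respond(self, user_audio: bytes) -> List[str]:
        transcript = self.asr(user_audio)
        reply = self.llm(transcript)
        speech = self.tts(reply)
        return self.talking_head(speech)


# Stand-in stages so the sketch runs without any model weights.
demo = CascadePipeline(
    asr=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode("utf-8"),
    talking_head=lambda speech: [f"frame:{speech.decode('utf-8')}"],
)

frames = demo.respond(b"hello")
print(frames)
```

Keeping each stage behind a plain callable is one way to make the local-versus-API swap a matter of passing a different function; the end-to-end mode would collapse the first three stages into a single multimodal model call.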

Caveats

End-to-end mode requires roughly 20 GB VRAM on an A100 and still incurs a 7-second initial delay, so “real-time” is relative
Gradio’s video stream is known to lag on the right-hand panel, so smooth playback is not guaranteed
Switching between local and API inference requires manual code edits and scattered weight downloads

Verdict

Worth a look if you need a Chinese-language talking-head prototype and would rather not write the integration glue yourself. Look elsewhere if you need a lightweight, polished product or lack the GPU headroom to host both a large language model and a lip-sync model such as MuseTalk at once.
