Deepfake yourself at home, no cloud required
An offline toolkit that clones your face and voice into a digital puppet you drive with text or speech.

What it does
Duix.Avatar is a fully offline, Dockerized pipeline for Windows and Ubuntu that turns a video of you into a controllable digital double. Feed it text or audio, and it synthesizes a lip-synced video of your avatar speaking. The stack runs entirely locally: no API keys, no data leaving your machine.
The interesting bit
The project ships as three containerized services (ASR, voice cloning via Fish Speech, and video generation) orchestrated behind a desktop client. That architecture is what makes the “fully offline” claim practical rather than theoretical. The README also documents raw HTTP APIs served on localhost, so the GUI is optional and batch jobs can be scripted.
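Because the services answer plain HTTP on localhost, a batch job can be a short script. The sketch below is not the project's documented client. The base URL, the endpoint path, the payload fields, and the assumption that replies are JSON are all hypothetical placeholders; take the real values from the README's API section.

```python
import json
import urllib.request


def build_request(base_url, path, payload):
    """Build a JSON POST request; nothing is sent until it is opened."""
    # base_url and path are placeholders supplied by the caller, not real endpoints.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url, path, payload, timeout=30.0):
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_request(base_url, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def batch_payloads(lines, make_payload):
    """Turn each non-empty script line into one request payload."""
    return [make_payload(line.strip()) for line in lines if line.strip()]


def run_batch(base_url, path, lines, make_payload):
    """Submit every payload in order and collect the decoded replies."""
    return [post_json(base_url, path, p) for p in batch_payloads(lines, make_payload)]
```

With a text file holding one line of dialogue per clip, `run_batch` would submit one job per line. The `make_payload` callback keeps the request shape out of the helper, so it can be swapped once the actual parameter names are known.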
Key highlights
- Requires an NVIDIA GPU; the recommended baseline is an RTX 4070, 32 GB of system RAM, and ~130 GB of free disk space
- Supports eight languages: English, Japanese, Korean, Chinese, French, German, Arabic, Spanish
- Voice cloning uses reference audio + text; video synthesis matches lip movement to generated speech
- “Lite” docker-compose variant available that runs only the video-generation service
- Pre-built client installers for Windows (.exe) and Ubuntu (.AppImage)
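The hardware bar in the first highlight can be checked before committing to the install. This is a minimal preflight sketch, not something the project ships: it assumes that finding `docker` and `nvidia-smi` on the PATH is a fair proxy for a working Docker install and NVIDIA driver, and it uses the ~130 GB figure quoted above as its default threshold (decimal gigabytes).

```python
import shutil


def free_gb(path):
    """Free space on the filesystem holding `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def preflight(data_path, required_gb=130.0):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    # Presence on PATH is only a proxy; it does not prove the tools actually work.
    if shutil.which("docker") is None:
        problems.append("docker not found on PATH")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH (no NVIDIA driver?)")
    available = free_gb(data_path)
    if available < required_gb:
        problems.append(
            "only %.0f GB free at %s, need about %.0f GB"
            % (available, data_path, required_gb)
        )
    return problems
```

Running `preflight` against the drive that will hold the data and printing each returned line gives a quick go/no-go. It does not check GPU model or RAM, which would need platform-specific calls.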
Caveats
- Windows install demands a very specific disk layout: a D: drive for data, plus 100+ GB of Docker images on C: unless you manually redirect them
- Initial Docker pull burns ~70 GB of traffic and the README warns it can take half an hour
- NVIDIA 50-series cards need a separate CUDA 12.8 preview path; docs are sparse on whether older cards work without it
- Several API documentation sections are truncated or incomplete in the README (parameter tables cut off mid-example)
Verdict
Worth a look if you need private, local avatar generation for content creation or prototyping. Skip it if you were hoping for a lightweight CLI tool: this is a heavy, GUI-first Docker suite with workstation-grade hardware requirements.