met4citizen/TalkingHead

Browser avatars that actually move their lips on time

A JavaScript class for real-time lip-sync with full-body 3D avatars, built on Three.js and used in everything from AI dating profiles to Twitch adventures.

Star velocity (7d): +1.3 ★ / day · Trend: steady

What it does

TalkingHead is a browser-based JavaScript class that drops a 3D avatar into a webpage and makes it speak with real-time lip-sync. It handles full-body GLB avatars and Mixamo FBX animations, and it can translate emojis into facial expressions. The rendering is plain Three.js/WebGL: no magic, just geometry moving on cue.

The interesting bit

The project has accumulated a genuinely weird and impressive portfolio of real-world uses: MIT/Harvard dating-profile digital twins, a Cannes-featured Twitch game, quantum physics lectures, and cancer clinical trial recruitment. The author seems mildly surprised by this themselves. The lip-sync engine is modular. It ships with five built-in languages (English, German, French, Finnish, Lithuanian), and you can plug in Microsoft Azure for 100+ languages or bypass text entirely with the companion HeadAudio module, which drives visemes straight from audio.

Key highlights

  • Real-time lip-sync from TTS word-level timestamps or direct viseme/blend-shape data
  • Uses Google Cloud TTS by default; can also integrate with ElevenLabs, Microsoft Azure, or in-browser Kokoro via the HeadTTS add-on
  • Companion modules: HeadTTS (free neural TTS with WebGPU), HeadAudio (audio-driven lip-sync without transcription), MotionEngine (LLM-driven gestures)
  • Dynamic bones and built-in physics for avatars with rigged hair or clothing
  • Minimal hobbyist example: single HTML file, add your Google API key, done
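
To make the first highlight concrete, here is a rough sketch of how word-level timestamps could be turned into a viseme timeline. It is not TalkingHead's actual algorithm: the `scheduleVisemes` function, the `LETTER_TO_VISEME` table, and the input shape are all hypothetical, and real lip-sync modules use language-specific rules rather than a letter lookup. Only the viseme names (`PP`, `aa`, `O`, and so on) follow the Oculus convention the avatars are expected to carry.

```javascript
// Toy letter-to-viseme table using Oculus viseme names.
// The mapping is a hypothetical simplification, not the library's rule set.
const LETTER_TO_VISEME = {
  a: "aa", e: "E", i: "I", o: "O", u: "U",
  p: "PP", b: "PP", m: "PP",
  f: "FF", v: "FF",
  t: "DD", d: "DD",
  k: "kk", g: "kk",
  s: "SS", z: "SS",
  n: "nn", l: "nn",
  r: "RR",
};

/**
 * Spread each word's visemes evenly across that word's time window.
 * Letters without an entry in the table are skipped.
 * @param {{word: string, startMs: number, endMs: number}[]} words
 * @returns {{viseme: string, startMs: number, durationMs: number}[]}
 */
function scheduleVisemes(words) {
  const schedule = [];
  for (const { word, startMs, endMs } of words) {
    const visemes = [...word.toLowerCase()]
      .map((ch) => LETTER_TO_VISEME[ch])
      .filter(Boolean);
    if (visemes.length === 0) continue;
    // Each viseme gets an equal slice of the word's duration.
    const slot = (endMs - startMs) / visemes.length;
    visemes.forEach((viseme, i) => {
      schedule.push({ viseme, startMs: startMs + i * slot, durationMs: slot });
    });
  }
  return schedule;
}

// Example input shaped like TTS word timings (values made up for illustration).
const timeline = scheduleVisemes([
  { word: "Hi", startMs: 0, endMs: 200 },
  { word: "Bob", startMs: 250, endMs: 550 },
]);
console.log(timeline);
```

A renderer would then step through the timeline each frame and ease the matching blend shape in and out. The even split is the crudest possible choice; vowels usually deserve longer slots than consonants.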

Caveats

  • Avatars need a Mixamo-compatible rig plus ARKit and Oculus viseme blend shapes — not a drop-in-any-model situation
  • The README warns against putting Google TTS API keys in client-side code, then immediately offers a minimal example that does exactly that; production use requires JWT/proxy setup
  • Default language is Finnish ("fi-FI"), which is charming but may confuse first-time users
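
The last two caveats can both be handled in configuration. The sketch below builds an options object that points TTS at your own proxy instead of shipping an API key, and that swaps the Finnish default for another locale. The helper `buildHeadOptions` and its parameters are hypothetical. The option names (`ttsEndpoint`, `jwtGet`, `ttsLang`, `ttsVoice`, `lipsyncLang`, `lipsyncModules`) are written as recalled from the project's README, so treat them as assumptions and check them against the current docs before relying on them.

```javascript
/**
 * Build a TalkingHead-style options object without a client-side API key.
 * Option names are assumptions based on the project's README; verify before use.
 * @param {{proxyUrl: string, fetchToken: () => Promise<string>, locale?: string, voice?: string}} cfg
 */
function buildHeadOptions({ proxyUrl, fetchToken, locale = "en-GB", voice = "en-GB-Standard-A" }) {
  if (typeof fetchToken !== "function") {
    throw new TypeError("fetchToken must be a function that resolves to a JWT");
  }
  // Derive the lip-sync language code ("en") from the TTS locale ("en-GB").
  const lang = locale.split("-")[0];
  return {
    ttsEndpoint: proxyUrl,   // your server-side proxy, which holds the real key
    jwtGet: fetchToken,      // supplies a short-lived token for the proxy
    ttsLang: locale,         // overrides the "fi-FI" default
    ttsVoice: voice,
    lipsyncLang: lang,
    lipsyncModules: [lang],  // load only the lip-sync module that is needed
  };
}

// Hypothetical values: "/gtts/" and the token stub stand in for a real backend.
const options = buildHeadOptions({
  proxyUrl: "/gtts/",
  fetchToken: async () => "example-token",
});
console.log(options.ttsLang, options.lipsyncLang);
```

The resulting object would be passed as the second argument when constructing the avatar class. Note that the locale split only works for languages that have a built-in lip-sync module; anything else needs Azure or HeadAudio, as described above.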

Verdict

Grab this if you’re building browser-based AI interfaces, virtual presenters, or interactive characters and need lip-sync without Unity/Unreal overhead. Skip it if you want plug-and-play with arbitrary 3D models or need native mobile performance.
