Unity SDK turns VRM models into voice-chat AI companions
A C# toolkit that wires lip-sync, facial expressions, and speech recognition to multiple LLMs so your 3D character can actually hold a conversation.

What it does
ChatdollKit is a Unity SDK for building voice-enabled 3D chatbots from VRM models. It handles the full conversational loop: speech-to-text, LLM inference, text-to-speech, and synchronized animations including lip-sync, blinking, and facial expressions. It targets desktop, mobile, VR/AR, and WebGL.
The interesting bit
The project treats conversation as a real-time performance problem, not just an API call. Recent releases added WebSocket streaming STT to shave “several hundred milliseconds” off latency, barge-in support so users can interrupt the character mid-sentence, and multi-VAD (voice activity detection) noise resistance for event venues. It also runs entirely in WebGL with JavaScript interop.
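To make the barge-in idea concrete, here is a minimal, language-agnostic sketch in Python. It is not ChatdollKit's C# API: the names `speak` and `run_turn`, the `interrupt_after` parameter, and the fixed playback delay are all hypothetical stand-ins. It only illustrates the general pattern of running speech playback as a cancellable task and stopping it when the user starts talking.

```python
import asyncio


async def speak(sentences, spoken):
    """Pretend to play TTS audio one sentence at a time (hypothetical stand-in)."""
    for sentence in sentences:
        await asyncio.sleep(0.2)  # stand-in for the time audio playback takes
        spoken.append(sentence)   # recorded only once the sentence has finished


async def run_turn(sentences, interrupt_after=None):
    """Play a reply and cancel it if the simulated user starts talking.

    interrupt_after is a hypothetical knob: seconds until a fake
    "speech started" event arrives. None means the user stays quiet.
    """
    spoken = []
    playback = asyncio.create_task(speak(sentences, spoken))
    if interrupt_after is not None:
        await asyncio.sleep(interrupt_after)  # stand-in for a VAD event
        playback.cancel()                     # barge-in: stop talking now
    try:
        await playback
    except asyncio.CancelledError:
        pass  # playback was interrupted; keep what was already spoken
    return spoken


if __name__ == "__main__":
    reply = ["Hello.", "How are you?", "Nice weather."]
    print(asyncio.run(run_turn(reply)))
    print(asyncio.run(run_turn(reply, interrupt_after=0.05)))
```

The design point is that the reply is never treated as one blocking call: because playback is a separate task, the listener can cut it off between or during sentences and hand the turn back to the user.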
Key highlights
- Pluggable LLMs: OpenAI, Anthropic Claude, Google Gemini, Grok, Dify, plus function calling and multimodal inputs
- Broad TTS/STT support: Azure, Google, OpenAI, VOICEVOX, AivisSpeech, Style-Bert-VITS2, NijiVoice, with TTS preprocessing for pronunciation tuning
- 3D expression system: autonomous animation, face control, idle behaviors, runtime VRM model switching
- Conversation management: wake words, intent routing, context state, long-term memory via ChatMemory/mem0/Zep, dynamic multilingual switching
- External control: socket commands, JavaScript control in WebGL, remote client support for VTuber-style setups
Caveats
- Setup is multi-step: import dependencies, configure scene objects, and enter API keys in three separate Inspector components just to run the demo
- The README notes that legacy components were removed in 0.8.4 and points 0.7.x users to a separate migration guide
- Some features (AIAvatarKit backend, AutoGen integration) are mentioned but not deeply documented in the visible README sections
Verdict
Worth a look if you’re building interactive 3D characters, AI VTubers, or kiosk-style virtual agents in Unity. Skip it if you need a hosted, no-code solution or aren’t already in the Unity ecosystem.