uezo/aiavatarkit
A modular speech-to-speech framework for building AI-powered conversational avatars with support for multiple LLMs and voice synthesis engines.

AIAvatarKit provides a unified backend for real-time conversational AI systems with multimodal input and output. It combines speech recognition, large language models, and text-to-speech in a modular architecture that supports VOICEVOX, OpenAI, Google, Azure, and other services. The framework includes built-in voice activity detection, supports dynamic tool calls for agentic workflows, and can either run standalone over WebSocket/HTTP or integrate with metaverse platforms such as VRChat.
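The flow described above can be pictured as a chain of swappable stages: voice activity detection gates incoming audio, speech recognition turns it into text, an LLM produces a reply (optionally calling a registered tool), and text-to-speech renders the answer. The following is a minimal Python sketch of that idea under stated assumptions. Every name in it (`AvatarPipeline`, `is_speech`, the protocol classes, and the stub components) is hypothetical and is not AIAvatarKit's actual API, and the toy RMS-threshold VAD stands in for whatever detector the framework really uses.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Optional, Protocol

# NOTE: hypothetical illustration only; these are not AIAvatarKit classes.


class SpeechRecognizer(Protocol):
    def transcribe(self, samples: list[float]) -> str: ...


class ChatModel(Protocol):
    def reply(self, text: str, tools: dict[str, Callable[[], str]]) -> str: ...


class SpeechSynthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


def is_speech(samples: list[float], threshold: float = 0.02) -> bool:
    """Toy energy-based VAD: treat a frame as speech if its RMS exceeds threshold."""
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > threshold


@dataclass
class StubRecognizer:
    """Stand-in for a real ASR backend: always returns a fixed transcript."""
    transcript: str

    def transcribe(self, samples: list[float]) -> str:
        return self.transcript


class KeywordModel:
    """Stand-in for an LLM: calls a tool whose name appears in the user's text."""

    def reply(self, text: str, tools: dict[str, Callable[[], str]]) -> str:
        for name, tool in tools.items():
            if name in text.lower():
                return f"Tool {name} says: {tool()}"
        return f"You said: {text}"


class TextBytesSynthesizer:
    """Stand-in for a TTS backend: 'audio' is just the UTF-8 encoded text."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


@dataclass
class AvatarPipeline:
    stt: SpeechRecognizer
    llm: ChatModel
    tts: SpeechSynthesizer
    tools: dict[str, Callable[[], str]] = field(default_factory=dict)

    def register_tool(self, name: str, fn: Callable[[], str]) -> None:
        self.tools[name] = fn

    def handle(self, samples: list[float]) -> Optional[bytes]:
        # VAD gate: skip recognition entirely for silent frames
        if not is_speech(samples):
            return None
        text = self.stt.transcribe(samples)
        answer = self.llm.reply(text, self.tools)
        return self.tts.synthesize(answer)


pipeline = AvatarPipeline(
    StubRecognizer("What is the weather?"), KeywordModel(), TextBytesSynthesizer()
)
pipeline.register_tool("weather", lambda: "sunny")

silence = [0.0] * 160
speech = [0.1, -0.1] * 80
print(pipeline.handle(silence))  # None
print(pipeline.handle(speech))   # b'Tool weather says: sunny'
```

Replacing `StubRecognizer` or `TextBytesSynthesizer` with a real backend only requires another object exposing the same method. This is one way to get the kind of per-service pluggability the description refers to, though the framework's own interfaces may differ.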