Is xiaozhi-esp32 open source?

Yes — 78/xiaozhi-esp32 is open source, released under the MIT license.

What language is xiaozhi-esp32 written in?

78/xiaozhi-esp32 is primarily written in C++.

How popular is xiaozhi-esp32?

78/xiaozhi-esp32 has 28.3k stars on GitHub. Its star growth is currently slowing, at about 29 new stars per day over the past 7 days.

Where can I find xiaozhi-esp32?

78/xiaozhi-esp32 is on GitHub at https://github.com/78/xiaozhi-esp32.


78/xiaozhi-esp32

ESP32 voice assistant hands your LLM the light switch

XiaoZhi is open-source ESP32 firmware that streams voice to LLMs like Qwen and DeepSeek, then uses the MCP protocol to let those models control your LEDs, servos, and smart home gear.

★28.3k stars C++ Agents Chat Assistants


Velocity (7d): +29 ★ / day · Trend: ↘ cooling

What it does

XiaoZhi is an ESP32 firmware project that turns cheap microcontroller dev boards into voice-interaction endpoints. It runs a streaming pipeline of automatic speech recognition, large language model inference, and text-to-speech, so you can talk to models like Qwen or DeepSeek through a small gadget. The firmware connects to the official xiaozhi.me server by default—personal users get free access to the Qwen real-time model—but the project also supports self-hosted backends and a growing ecosystem of alternative clients and servers.
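The three stages described above can be pictured as a chain of functions. The following is a minimal sketch, not the project's code: `VoicePipeline`, `AudioFrame`, and the stage names are hypothetical, each stage would be a network service rather than a local call, and the sketch handles one whole turn at a time instead of streaming partial results.

```cpp
#include <functional>
#include <string>
#include <vector>

// One encoded audio packet (hypothetical type for this sketch).
using AudioFrame = std::vector<unsigned char>;

// Hypothetical stage signatures; in a real deployment each stage runs on a
// server and the device only captures and plays audio.
struct VoicePipeline {
    std::function<std::string(const std::vector<AudioFrame>&)> recognize;   // ASR
    std::function<std::string(const std::string&)> respond;                 // LLM
    std::function<std::vector<AudioFrame>(const std::string&)> synthesize;  // TTS

    // Runs one conversational turn: captured audio in, reply audio out.
    std::vector<AudioFrame> RunTurn(const std::vector<AudioFrame>& captured) const {
        const std::string transcript = recognize(captured);
        const std::string reply = respond(transcript);
        return synthesize(reply);
    }
};
```

Keeping each stage behind its own function object mirrors the point that the backend is swappable: the official server and a self-hosted one would fill the same three slots.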

The interesting bit

The unusual angle is the use of MCP on both sides of the wire: device-side MCP exposes local hardware (speakers, LEDs, servos, GPIO) to the LLM, while cloud-side MCP extends the model’s reach to smart home devices, PC desktops, email, and knowledge search. That means the same voice conversation can end with the AI dimming your lights or checking your calendar, not just chatting. Because the device is the side that exposes tools, the firmware plays the MCP server role, which is rarely attempted on a resource-constrained microcontroller.
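To make the device-side half concrete, here is a minimal sketch of a tool registry that a model could enumerate and invoke. It is an illustration under stated assumptions, not the firmware's actual API: `ToolRegistry`, `FakeLed`, `MakeLedRegistry`, and the tool name `led.set_brightness` are all hypothetical. Real MCP traffic is JSON-RPC 2.0 with `tools/list` and `tools/call` methods; JSON handling is left out so the sketch has no dependencies.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified registry; not the project's actual classes.
class ToolRegistry {
public:
    // Each tool takes one integer argument and returns a text result.
    using Handler = std::function<std::string(int)>;

    void AddTool(const std::string& name, const std::string& description, Handler handler) {
        tools_[name] = Tool{description, std::move(handler)};
    }

    // Roughly what a "tools/list" response would enumerate.
    std::vector<std::string> ListTools() const {
        std::vector<std::string> names;
        for (const auto& entry : tools_) names.push_back(entry.first);
        return names;
    }

    // Roughly what a "tools/call" request would trigger.
    std::optional<std::string> CallTool(const std::string& name, int argument) const {
        auto it = tools_.find(name);
        if (it == tools_.end()) return std::nullopt;  // unknown tool
        return it->second.handler(argument);
    }

private:
    struct Tool {
        std::string description;
        Handler handler;
    };
    std::map<std::string, Tool> tools_;
};

// Stand-in for a peripheral driver; a real board would drive PWM or GPIO here.
struct FakeLed {
    int brightness = 0;
};

// Registers a single hypothetical tool that clamps and stores a brightness value.
ToolRegistry MakeLedRegistry(FakeLed& led) {
    ToolRegistry registry;
    registry.AddTool("led.set_brightness", "Set LED brightness (0-100)",
        [&led](int value) {
            if (value < 0) value = 0;
            if (value > 100) value = 100;
            led.brightness = value;
            return "brightness=" + std::to_string(value);
        });
    return registry;
}
```

Clamping inside the handler is a deliberate choice in this sketch: the caller is a language model, so the device validates arguments itself instead of trusting them.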

Key highlights

Supports 70+ off-the-shelf ESP32 boards, from the M5Stack CoreS3 to breadboard DIY setups
Offline wake-word detection via ESP-SR, with OPUS-compressed audio streaming over WebSocket or MQTT+UDP
Speaker recognition through 3D Speaker to identify who is talking
Device-side MCP controls local peripherals; cloud-side MCP hooks into smart home and desktop automation
Free tier on the official xiaozhi.me server; MIT licensed, including for commercial use
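For the OPUS streaming highlight, a useful piece of arithmetic is how many samples go into each packet. The sketch below assumes the sample rates Opus supports (8, 12, 16, 24, 48 kHz) and the classic frame durations from 2.5 ms to 60 ms. The function names are hypothetical, and the rate and duration the firmware actually uses are not stated here.

```cpp
#include <cstddef>
#include <optional>

// Returns samples per channel in one Opus frame, or nullopt when the sample
// rate or frame duration is outside the sets assumed in this sketch.
// Duration is in microseconds so that 2.5 ms can be expressed exactly.
std::optional<std::size_t> OpusFrameSamples(int sample_rate_hz, int frame_duration_us) {
    switch (sample_rate_hz) {
        case 8000: case 12000: case 16000: case 24000: case 48000:
            break;
        default:
            return std::nullopt;
    }
    switch (frame_duration_us) {
        case 2500: case 5000: case 10000: case 20000: case 40000: case 60000:
            break;
        default:
            return std::nullopt;
    }
    return static_cast<std::size_t>(sample_rate_hz) * frame_duration_us / 1000000;
}

// Size of the uncompressed 16-bit PCM that feeds one encoded frame.
std::optional<std::size_t> PcmFrameBytes(int sample_rate_hz, int frame_duration_us, int channels) {
    if (channels < 1) return std::nullopt;
    const auto samples = OpusFrameSamples(sample_rate_hz, frame_duration_us);
    if (!samples) return std::nullopt;
    return *samples * static_cast<std::size_t>(channels) * sizeof(short);
}
```

As a worked case, a 60 ms mono frame at 16 kHz holds 960 samples, or 1920 bytes of 16-bit PCM before compression. Longer frames mean fewer packets per second at the cost of added latency.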

Caveats

v2 firmware uses a new partition table that is incompatible with v1, so over-the-air upgrades from v1 are impossible; you must manually reflash
The maintainers note that Linux builds compile faster and with fewer driver issues than Windows
Heavy LLM inference and cloud integrations rely on the official server or a self-hosted backend; the ESP32 itself acts as the voice terminal, not the brain

Verdict

Hardware hackers and embedded developers who want a ready-made voice AI stack for ESP32 boards should grab this. If you need a fully offline LLM running inside the microcontroller with no network dependency, look elsewhere.
