kvcache-ai/Mooncake
A KVCache-centric disaggregated serving platform for large language model inference, developed by Moonshot AI for their Kimi LLM service.

Velocity (7d): +7.7 ★/day · Trend: steady
Mooncake uses a distributed architecture that separates KVCache management from model computation to make LLM serving more efficient. It uses RDMA for high-speed KVCache transfer between nodes and integrates with the vLLM and SGLang serving frameworks. Prefill and decode run as disaggregated phases on separate node pools, so each can be scheduled independently, improving overall throughput and per-token generation speed.
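The prefill/decode split described above can be illustrated with a toy, single-process Python sketch. It is not Mooncake's API. `KVCacheStore`, `prefill`, `decode`, `block_keys`, `fake_kv`, and `BLOCK_SIZE` are hypothetical names. An in-memory dict stands in for the RDMA-backed distributed KVCache pool, and chained SHA-256 hashes of fixed-size token blocks are an assumed way of keying reusable prompt-prefix blocks. The sketch shows only the data flow: a prefill node publishes KVCache blocks, and a decode node fetches them by key instead of recomputing the prompt.

```python
from dataclasses import dataclass, field
from hashlib import sha256

BLOCK_SIZE = 4  # hypothetical number of tokens per KVCache block


def block_keys(tokens):
    """Chain-hash each full block so a key identifies the entire prefix up to it.

    A trailing partial block is not keyed (and therefore not cached).
    """
    keys, prev = [], b""
    full_len = len(tokens) - len(tokens) % BLOCK_SIZE
    for i in range(0, full_len, BLOCK_SIZE):
        chunk = ",".join(map(str, tokens[i:i + BLOCK_SIZE])).encode()
        prev = sha256(prev + chunk).digest()
        keys.append(prev.hex())
    return keys


@dataclass
class KVCacheStore:
    """Hypothetical stand-in for a distributed KVCache pool (a dict replaces RDMA)."""
    blocks: dict = field(default_factory=dict)

    def put(self, key, kv):
        self.blocks[key] = kv

    def get(self, key):
        return self.blocks.get(key)


def fake_kv(block_tokens):
    # Placeholder for the real attention key/value tensors of one block.
    return tuple(block_tokens)


def prefill(tokens, store):
    """Prefill node: reuse cached prefix blocks, compute missing ones, publish them."""
    keys = block_keys(tokens)
    reused = 0
    for idx, key in enumerate(keys):
        if store.get(key) is not None:
            reused += 1  # prefix block already computed by an earlier request
            continue
        block = tokens[idx * BLOCK_SIZE:(idx + 1) * BLOCK_SIZE]
        store.put(key, fake_kv(block))
    return keys, reused


def decode(keys, store):
    """Decode node: fetch every KVCache block by key rather than recomputing it."""
    kv_blocks = [store.get(k) for k in keys]
    if any(b is None for b in kv_blocks):
        raise KeyError("missing KVCache block")
    # Flatten the placeholder blocks back into the token sequence they cover.
    return [t for block in kv_blocks for t in block]


if __name__ == "__main__":
    store = KVCacheStore()
    keys1, reused1 = prefill([1, 2, 3, 4, 5, 6, 7, 8], store)
    keys2, reused2 = prefill(list(range(1, 13)), store)
    print(reused1, reused2, decode(keys2, store))
```

In this sketch the second request shares its first eight tokens with the first, so its first two blocks are served from the store and only the third block is "computed". Chaining each block's hash with the previous one means a key match implies the whole preceding prefix matches, which is why reuse is safe to decide block by block.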