
kvcache-ai/Mooncake

A KVCache-centric disaggregated serving platform for large language model inference, developed by Moonshot AI for their Kimi LLM service.

Velocity (7d): +7.7 ★/day · Trend: steady

Mooncake implements a distributed architecture that separates KVCache management from model computation to improve LLM serving efficiency. It uses RDMA for high-speed KVCache transfer and integrates with the vLLM and SGLang inference frameworks. By disaggregating the prefill and decode phases, the system aims to improve throughput and token generation speed for LLM inference workloads.
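To make the prefill/decode split concrete, here is a minimal single-process sketch. It assumes that a plain in-memory key-value store stands in for the RDMA-backed KVCache pool. `KVCacheStore`, `prefix_key`, `prefill` and `decode` are hypothetical names used for illustration only. They are not Mooncake's API. The "KV" values and the next-token rule are toy placeholders, not real attention computations.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KVCacheStore:
    """Hypothetical stand-in for a shared KVCache pool (not Mooncake's API)."""
    blocks: dict = field(default_factory=dict)

    def put(self, key: str, kv: list) -> None:
        self.blocks[key] = kv

    def get(self, key: str) -> Optional[list]:
        return self.blocks.get(key)


def prefix_key(tokens: list) -> str:
    """Hash the prompt tokens so identical prefixes map to the same cache entry."""
    return hashlib.sha256(",".join(map(str, tokens)).encode()).hexdigest()


def prefill(tokens: list, store: KVCacheStore) -> tuple:
    """Prefill node: build the prompt's KVCache and publish it to the store.

    Returns (key, computed). computed is False when the cache entry was reused.
    """
    key = prefix_key(tokens)
    if store.get(key) is not None:
        return key, False  # cache hit: skip recomputation
    kv = [t * 2 for t in tokens]  # toy placeholder for per-token keys/values
    store.put(key, kv)
    return key, True


def decode(key: str, store: KVCacheStore, steps: int) -> list:
    """Decode node: fetch the KVCache by key and extend it one token per step."""
    kv = list(store.get(key))  # copy, so the shared entry is not mutated
    out = []
    for _ in range(steps):
        nxt = sum(kv) % 100  # toy next-token rule
        out.append(nxt)
        kv.append(nxt * 2)
    return out


if __name__ == "__main__":
    store = KVCacheStore()
    key, computed = prefill([1, 2, 3], store)
    print(computed)                 # True
    print(prefill([1, 2, 3], store)[1])  # False (prefix reused)
    print(decode(key, store, 2))    # [12, 36]
```

In this design the two phases share only a cache key. This mirrors the idea of moving KVCache between separate prefill and decode workers instead of recomputing it. A second request with the same prompt prefix skips prefill entirely.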
