
ovg-project/kvcached

Virtualized elastic KV cache system enabling dynamic GPU sharing across multiple LLM inference workloads.

Velocity (7d): +2.8 ★/day · Trend: steady

kvcached implements a virtualized, elastic KV cache manager that enables flexible GPU sharing for LLM inference workloads. It addresses the GPU memory bottleneck of transformer-based models by multiplexing KV cache memory across concurrent requests and serving jobs, so that memory released by one workload can be reused by another. The system integrates with the SGLang and vLLM inference engines and provides APIs for dynamic cache allocation, cache sharing, and co-serving strategies, with the goal of improving GPU utilization in LLM serving environments.
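The description above stays at a high level. Below is a minimal, hypothetical sketch of the general idea behind an elastic, page-granular KV cache shared by several models. Each model keeps its own virtual page table. Physical pages come from one shared pool, are mapped only as a request's token count grows, and return to the pool when the request finishes. All names (`PhysicalPagePool`, `VirtualKVCache`, `grow`, `finish`) are illustrative assumptions, not kvcached's API. The simulation hands out integer page ids rather than real GPU memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class PhysicalPagePool:
    """Hypothetical pool of fixed-size physical pages shared by all models."""

    def __init__(self, num_pages: int):
        self.free_pages = list(range(num_pages))

    def available(self) -> int:
        return len(self.free_pages)

    def alloc(self) -> int:
        if not self.free_pages:
            raise MemoryError("physical page pool exhausted")
        return self.free_pages.pop()

    def release(self, page: int) -> None:
        self.free_pages.append(page)


@dataclass
class VirtualKVCache:
    """Hypothetical per-model KV space: virtual pages are backed only when used."""

    name: str
    pool: PhysicalPagePool
    tokens_per_page: int
    # virtual page index -> physical page id
    page_table: dict[int, int] = field(default_factory=dict)
    # request id -> (virtual pages owned, tokens stored)
    requests: dict[str, tuple[list[int], int]] = field(default_factory=dict)
    next_vpage: int = 0

    def grow(self, req_id: str, num_tokens: int) -> None:
        """Extend a request's KV cache by num_tokens, mapping pages on demand."""
        pages, used = self.requests.get(req_id, ([], 0))
        total = used + num_tokens
        needed = -(-total // self.tokens_per_page)  # ceiling division
        extra = needed - len(pages)
        # Check capacity first so a failed grow leaves no half-mapped pages.
        if extra > self.pool.available():
            raise MemoryError(f"{self.name}: not enough free pages for {req_id}")
        for _ in range(extra):
            vpage = self.next_vpage
            self.next_vpage += 1
            self.page_table[vpage] = self.pool.alloc()
            pages.append(vpage)
        self.requests[req_id] = (pages, total)

    def finish(self, req_id: str) -> None:
        """Unmap a finished request's pages and return them to the shared pool."""
        pages, _ = self.requests.pop(req_id)
        for vpage in pages:
            self.pool.release(self.page_table.pop(vpage))

    @property
    def mapped_pages(self) -> int:
        return len(self.page_table)


if __name__ == "__main__":
    pool = PhysicalPagePool(num_pages=8)
    model_a = VirtualKVCache("model-a", pool, tokens_per_page=16)
    model_b = VirtualKVCache("model-b", pool, tokens_per_page=16)

    model_a.grow("r1", 100)  # ceil(100 / 16) = 7 pages
    print(model_a.mapped_pages, pool.available())  # 7 1
    model_a.finish("r1")     # pages go back to the shared pool
    model_b.grow("r2", 120)  # ceil(120 / 16) = 8 pages, now fits
    print(model_b.mapped_pages, pool.available())  # 8 0
```

The point of the shared pool is that neither model holds a fixed, pre-reserved KV region. `model-b` cannot fit its request while `model-a` holds 7 of the 8 pages, but it can once those pages are released. A real system would perform the mapping with GPU virtual-memory primitives rather than Python dictionaries.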
