ovg-project/kvcached
Virtualized elastic KV cache system enabling dynamic GPU sharing across multiple LLM inference workloads.

kvcached implements a virtualized, elastic KV cache manager that lets multiple LLM inference workloads share a GPU. It addresses the GPU memory bottleneck in transformer-based models by multiplexing KV cache memory across concurrent requests and serving jobs. The system integrates with the SGLang and vLLM inference engines and provides APIs for dynamic cache allocation, sharing, and co-serving, with the goal of improving GPU utilization in LLM serving environments.
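
To make the idea of an elastic, shared KV cache concrete, here is a minimal toy sketch in Python. It is not kvcached's API: the class `SharedPagePool`, its methods, and the job names are all hypothetical, and it only models page accounting rather than real GPU memory. It shows two serving jobs drawing KV cache pages from one pool on demand instead of each holding a fixed partition.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SharedPagePool:
    """Hypothetical toy model: one pool of KV cache pages shared by several jobs."""
    total_pages: int
    owned: Dict[str, int] = field(default_factory=dict)  # job name -> pages held

    def free_pages(self) -> int:
        # Pages not currently held by any job.
        return self.total_pages - sum(self.owned.values())

    def alloc(self, job: str, n: int) -> bool:
        # Grant n more pages to `job` only if the pool can cover them.
        if n < 0 or n > self.free_pages():
            return False
        self.owned[job] = self.owned.get(job, 0) + n
        return True

    def release(self, job: str, n: int) -> None:
        # Return up to n pages from `job` to the pool for other jobs to use.
        held = self.owned.get(job, 0)
        self.owned[job] = held - min(n, held)


pool = SharedPagePool(total_pages=100)
pool.alloc("model_a", 80)               # model_a bursts while model_b is idle
ok_before = pool.alloc("model_b", 40)   # only 20 pages free, so this is refused
pool.release("model_a", 50)             # model_a's load drops; pages go back
ok_after = pool.alloc("model_b", 40)    # the freed pages can now serve model_b
```

In this sketch, capacity follows demand: the same 100 pages serve model_a's burst first and model_b's request later, which a static 50/50 split could not do. A real system would also need to map and unmap physical GPU memory and coordinate with the inference engine's allocator, which this accounting model leaves out.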