
Zefan-Cai/KVCache-Factory

A unified framework implementing multiple KV cache compression methods to accelerate inference for autoregressive language models.

Velocity (7d): +1.8 ★ / day · Trend: steady

KVCache-Factory provides implementations of several KV cache compression techniques, including PyramidKV, SnapKV, H2O, and StreamingLLM. The project targets inference optimization for large language models by reducing the memory footprint and computational overhead of autoregressive generation. It supports popular models such as LLaMA and Mistral, offers multi-GPU inference, and integrates with Flash Attention v2 for faster attention computation.
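
To make the idea of KV cache compression concrete, here is a minimal sketch of two eviction policies in the spirit of StreamingLLM (keep a few initial "sink" tokens plus a recent window) and H2O (keep recent tokens plus the earlier tokens with the highest accumulated attention). This is not KVCache-Factory's code or API: the function names, parameters, and default values are hypothetical, and real implementations operate on per-layer, per-head key/value tensors rather than flat Python lists.

```python
from typing import List, Sequence


def streaming_keep_indices(seq_len: int, num_sink: int = 4, window: int = 8) -> List[int]:
    """StreamingLLM-style policy (simplified): keep the first `num_sink`
    positions and the last `window` positions of the cache."""
    if seq_len <= num_sink + window:
        return list(range(seq_len))  # cache still fits, keep everything
    return list(range(num_sink)) + list(range(seq_len - window, seq_len))


def heavy_hitter_keep_indices(acc_scores: Sequence[float], budget: int, recent: int) -> List[int]:
    """H2O-style policy (simplified): keep the `recent` newest positions and
    fill the rest of `budget` with the older positions that have the largest
    accumulated attention scores (`acc_scores` is assumed to be given)."""
    n = len(acc_scores)
    if n <= budget:
        return list(range(n))
    recent = min(recent, budget)
    recent_idx = list(range(n - recent, n))
    older = range(n - recent)
    top_older = sorted(older, key=lambda i: acc_scores[i], reverse=True)[: budget - recent]
    return sorted(top_older) + recent_idx


def compress(cache: Sequence, keep: Sequence[int]) -> list:
    """Drop every cache entry whose position is not listed in `keep`."""
    return [cache[i] for i in keep]


if __name__ == "__main__":
    print(streaming_keep_indices(20, num_sink=2, window=3))
    # → [0, 1, 17, 18, 19]
    print(heavy_hitter_keep_indices([0.1, 0.9, 0.2, 0.8, 0.3, 0.4], budget=4, recent=2))
    # → [1, 3, 4, 5]
```

In both cases the cache shrinks to a fixed budget regardless of sequence length, which is where the memory and compute savings during generation come from.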
