
hhhuang/CAG

Cache-Augmented Generation (CAG) uses preloaded KV-caches to eliminate real-time retrieval in LLM inference.

Star velocity (7d): +2.7 ★ / day. Trend: steady.

This repository implements CAG, a retrieval-free alternative to RAG that relies on extended LLM context windows and a precomputed KV cache. Instead of querying a vector database at inference time, all relevant knowledge is preloaded into the model's context and its KV cache is computed once, so generation proceeds directly without retrieval latency. The approach was published as a short paper at the ACM Web Conference 2025; the paper also investigates the relationship between model performance and context length.
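The preload-once, reuse-per-query idea can be illustrated with a toy single-layer attention model in plain Python. This is a minimal sketch under stated assumptions, not the repository's code: `embed`, `key_value`, `preload`, `answer_with_cache`, and `answer_from_scratch` are hypothetical names, the "projections" are made-up toy functions, and copying the cache per query stands in for whatever reset mechanism a real implementation uses.

```python
import math
import random

DIM = 4  # toy embedding width (assumption for illustration)

def embed(token_id):
    """Deterministic toy embedding for an integer token id."""
    rng = random.Random(token_id)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def key_value(token_id):
    """Toy key and value vectors for one token (hypothetical projections)."""
    e = embed(token_id)
    return e, [2.0 * x for x in e]

def attend(query, keys, values):
    """Softmax attention of one query vector over all keys seen so far."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(DIM)]

def preload(knowledge_tokens):
    """Compute the KV cache for the knowledge once, before any query arrives."""
    pairs = [key_value(t) for t in knowledge_tokens]
    return [k for k, _ in pairs], [v for _, v in pairs]

def answer_with_cache(cache, query_tokens):
    """Process only the query tokens, attending over the preloaded cache."""
    keys, values = list(cache[0]), list(cache[1])  # copy: preloaded cache stays intact
    output = None
    for t in query_tokens:
        k, v = key_value(t)
        keys.append(k)
        values.append(v)
        output = attend(embed(t), keys, values)
    return output

def answer_from_scratch(knowledge_tokens, query_tokens):
    """Baseline that re-encodes the knowledge for every query."""
    return answer_with_cache(preload(knowledge_tokens), query_tokens)

knowledge = [1, 2, 3]
cache = preload(knowledge)                      # done once
first = answer_with_cache(cache, [10, 11])      # query 1 reuses the cache
second = answer_with_cache(cache, [20])         # query 2 reuses the same cache
```

In this sketch the knowledge tokens are encoded once in `preload`, and each query only pays for its own tokens. The cached path returns the same result as `answer_from_scratch`, which re-encodes the knowledge every time.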
