zilliztech/GPTCache
Semantic cache layer for LLM query responses that reduces API costs and improves latency.

Velocity (7d): +6.9 ★/day · Trend: steady
GPTCache provides a semantic caching solution for LLM applications, storing and retrieving responses based on meaning rather than exact string matches. It integrates with popular LLM frameworks like LangChain and llama_index, using vector similarity search to determine cache hits. The system supports multiple vector stores, including Milvus and Redis, for scalable deployment, and it can run as a Docker-based server so that applications written in other languages can use the cache.
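
To make the idea of a similarity-based cache hit concrete, here is a minimal sketch in plain Python. It does not use GPTCache's API: the `SemanticCache` class, the `embed` function, and the 0.8 threshold are all hypothetical, and the bag-of-words "embedding" is a toy stand-in for a real embedding model and vector store.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): bag-of-words token counts.

    A real deployment would call an embedding model here instead.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by query similarity, not exact text."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # list of (embedding, cached response) pairs

    def put(self, query, response):
        self.entries.append((embed(query), response))

    def get(self, query):
        """Return the response of the most similar stored query, or None on a miss."""
        q = embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.8)
cache.put("what is the capital of france", "Paris")

# A slightly different wording still clears the threshold and is served from cache.
hit = cache.get("what is the capital of france ?")
# An unrelated question shares no tokens, so it is a miss and would go to the LLM.
miss = cache.get("how do I bake bread")
```

The linear scan over `entries` is only for illustration; a vector store such as Milvus or Redis would replace it with an indexed nearest-neighbor search. The threshold trades off cost savings against the risk of returning a cached answer to a question that only looks similar.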