
microsoft/LLMLingua

Microsoft's prompt and KV-Cache compression library for accelerating LLM inference.

Velocity (7d): +5.9 ★/day · Trend: steady

LLMLingua is a prompt compression framework that reduces LLM input size by up to 20x while preserving key semantic information. It includes LongLLMLingua for long-context scenarios and LLMLingua-2 for improved compression, along with complementary tools like MInference for fast long-context inference and RetrievalAttention for KV-cache offloading. The project is integrated into Microsoft’s prompt flow ecosystem.
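To make the "up to 20x" figure concrete, here is a minimal sketch of what a compression ratio means in practice. It is not LLMLingua's algorithm or API: `toy_compress` and `compression_ratio` are hypothetical names, whitespace-separated words stand in for model tokens, and the "rarer words carry more information" heuristic is an assumption chosen only to keep the example self-contained.

```python
from collections import Counter


def toy_compress(prompt: str, target_ratio: float) -> str:
    """Keep roughly len(words) / target_ratio words, preferring rarer ones.

    Hypothetical illustration only; a real compressor would score tokens
    with a model rather than with raw word frequency.
    """
    words = prompt.split()
    if not words:
        return ""
    # Word budget implied by the requested ratio (always keep at least one).
    keep = max(1, round(len(words) / target_ratio))
    freq = Counter(w.lower() for w in words)
    # Rank positions so rarer words come first; ties fall back to position.
    ranked = sorted(range(len(words)), key=lambda i: (freq[words[i].lower()], i))
    # Restore the original order for the positions that survive.
    kept = sorted(ranked[:keep])
    return " ".join(words[i] for i in kept)


def compression_ratio(original: str, compressed: str) -> float:
    """Original word count divided by compressed word count."""
    return len(original.split()) / max(1, len(compressed.split()))
```

With a 40-word prompt and `target_ratio=20`, the sketch keeps two words, so `compression_ratio` reports 20. The hard part that a real compressor has to solve, and that this toy ignores, is choosing which tokens to keep so the key semantic information survives.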
