microsoft/LLMLingua
Microsoft's prompt and KV-Cache compression library for accelerating LLM inference.

Velocity (7d): +5.9 ★/day · Trend: steady
LLMLingua is a prompt compression framework that shrinks LLM input by up to 20x while preserving key semantic information. The repository covers three methods: the original LLMLingua, LongLLMLingua for long-context scenarios, and LLMLingua-2, a faster, task-agnostic compressor trained via data distillation. Complementary projects include MInference for fast long-context inference and RetrievalAttention for KV-cache offloading. LLMLingua is also integrated into Microsoft’s Prompt flow.