vllm-project/llm-compressor
LLM Compressor is a Python library that applies quantization and compression algorithms to large language models for optimized vLLM deployment.

Velocity (7d): +4.7 ★/day · Trend: steady
The library provides weight, activation, KV cache, and attention quantization algorithms for compressing LLMs. It integrates with Hugging Face models and saves compressed models in the compressed-tensors format, which vLLM can load. It also supports DDP and disk offloading for handling very large models during compression.
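To make "weight quantization" concrete, here is a minimal pure-Python sketch of symmetric round-to-nearest int8 quantization with one scale per weight row. It is a conceptual illustration only, not LLM Compressor's API or its algorithms; the names `quantize_row` and `dequantize_row` are hypothetical and chosen for this example.

```python
from typing import List, Tuple


def quantize_row(row: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one weight row.

    Hypothetical helper for illustration; not part of any library.
    Returns the integer codes and the scale needed to map them back.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max((abs(w) for w in row), default=0.0)
    # An all-zero (or empty) row gets a dummy scale to avoid division by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Toy 2x3 weight matrix; each row gets its own scale (per-channel quantization).
weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
quantized = [quantize_row(row) for row in weights]
restored = [dequantize_row(codes, scale) for codes, scale in quantized]
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost paid for storing 8-bit integers instead of floats. Production methods generally go further, for example by using calibration data to choose scales or to compensate for rounding error.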