
vllm-project/llm-compressor

LLM Compressor is a Python library that applies quantization and compression algorithms to large language models for optimized vLLM deployment.

Velocity (7d): +4.7 ★ / day
Trend: steady
[Chart: star history]

The library provides weight, activation, KV cache, and attention quantization algorithms for compressing LLMs. It integrates with Hugging Face models and saves compressed models in the compressed-tensors format, which vLLM can load. For very large models, it supports distributed data parallel (DDP) execution and disk offloading during compression.
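To make "weight quantization" concrete, here is a minimal sketch of symmetric round-to-nearest quantization of a single weight row to 8-bit integers. It is a generic illustration in plain Python, not LLM Compressor's API or its actual algorithms; the function names `quantize_row` and `dequantize_row` are hypothetical and chosen only for this example.

```python
# Illustrative only: generic symmetric round-to-nearest quantization.
# These helpers are hypothetical and are NOT part of llm-compressor.

def quantize_row(weights, num_bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate floats from the integers and the stored scale."""
    return [v * scale for v in q]


row = [0.12, -0.5, 0.33, 0.0]
q, scale = quantize_row(row)
restored = dequantize_row(q, scale)

# Worst-case reconstruction error is about half a quantization step.
max_err = max(abs(a - b) for a, b in zip(row, restored))
```

Storing the integers plus one scale per row is what shrinks the model; the trade-off is the small reconstruction error measured by `max_err`.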
