casper-hansen/AutoAWQ
AutoAWQ implements Activation-aware Weight Quantization (AWQ) for 4-bit model compression, with roughly a 2x inference speedup.

Velocity (7d): +2.3 ★/day · Trend: → steady
AutoAWQ is a quantization tool that implements the AWQ algorithm to compress large language models to 4-bit precision while maintaining accuracy. It reduces memory footprint and speeds up inference by approximately 2x. The project integrates with the Hugging Face model hub and has been adopted by the vLLM project as part of its llm-compressor library.
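To make "activation-aware" concrete, here is a minimal pure-Python sketch of the idea, assuming a tiny dense layer y = W x. Input channels with large average activation magnitude are treated as salient. Their weight columns are scaled up by s = mean|x| ** alpha before asymmetric 4-bit group quantization, and 1/s is folded back afterwards. A grid search over alpha then picks the value that minimizes output error. This is an illustration under those assumptions, not AutoAWQ's code or API. All function and parameter names (`quantize_group`, `awq_search`, `group_size`, `w_bit`) are hypothetical.

```python
import random

def quantize_group(vals, w_bit=4):
    """Asymmetric round-to-nearest quantization of one group; returns dequantized floats."""
    lo, hi = min(vals), max(vals)
    if hi == lo:
        return list(vals)  # constant group: exactly representable
    qmax = 2 ** w_bit - 1
    scale = (hi - lo) / qmax
    zero = round(-lo / scale)
    # Map to integers in [0, qmax], then dequantize back to floats.
    return [(min(max(round(v / scale) + zero, 0), qmax) - zero) * scale for v in vals]

def quantize_weights(W, group_size, w_bit=4):
    """Quantize each row of W in contiguous groups of `group_size` input channels."""
    Wq = []
    for row in W:
        qrow = []
        for start in range(0, len(row), group_size):
            qrow.extend(quantize_group(row[start:start + group_size], w_bit))
        Wq.append(qrow)
    return Wq

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def output_error(W, X, Wq):
    """Sum of squared differences between full-precision and quantized layer outputs."""
    return sum((a - b) ** 2 for x in X for a, b in zip(matvec(W, x), matvec(Wq, x)))

def awq_search(W, X, group_size, w_bit=4, grid=20):
    """Grid-search alpha in [0, 1]; scale input channel j by s_j = mean|x_j| ** alpha."""
    n_in = len(W[0])
    act_mag = [sum(abs(x[j]) for x in X) / len(X) for j in range(n_in)]
    best = (float("inf"), 0.0, None)
    for k in range(grid + 1):
        alpha = k / grid  # alpha = 0 gives s = 1, i.e. plain round-to-nearest
        s = [max(m, 1e-8) ** alpha for m in act_mag]
        W_scaled = [[w * sj for w, sj in zip(row, s)] for row in W]
        # Dividing quantized columns by s is equivalent to feeding x / s to them.
        Wq = [[w / sj for w, sj in zip(row, s)]
              for row in quantize_weights(W_scaled, group_size, w_bit)]
        err = output_error(W, X, Wq)
        if err < best[0]:
            best = (err, alpha, Wq)
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    n_in, n_out, group = 16, 4, 8
    W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
    # Channel 3 is made "salient": its activations are ~20x larger than the rest.
    X = [[rng.gauss(0, 1) * (20.0 if j == 3 else 1.0) for j in range(n_in)]
         for _ in range(32)]
    rtn_err = output_error(W, X, quantize_weights(W, group))
    awq_err, alpha, _ = awq_search(W, X, group)
    print(f"RTN error {rtn_err:.4f}, activation-aware error {awq_err:.4f} (alpha={alpha:.2f})")
```

Because alpha = 0 is in the grid and reproduces plain round-to-nearest quantization exactly, the searched error can never be worse than the unscaled baseline. The search only helps when some channels carry disproportionately large activations.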