
casper-hansen/AutoAWQ

AutoAWQ implements Activation-Aware Weight Quantization for 4-bit model compression with 2x inference speedup.

2.3k stars · Python · Inference · Serving
Star velocity (7d): +2.3 ★/day · Trend: steady

AutoAWQ is a quantization tool that implements the AWQ (Activation-aware Weight Quantization) algorithm to compress large language models to 4-bit weights while preserving accuracy. It reduces the memory footprint and speeds up inference by approximately 2x. The project integrates with the Hugging Face model hub and has been adopted by the vLLM project as part of its llm-compressor library.
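To make "4-bit precision" concrete, here is a minimal pure-Python sketch of group-wise round-to-nearest quantization, the basic building block that AWQ-style methods refine. It is an illustration under stated assumptions, not AutoAWQ's implementation: it omits the activation-aware scaling that gives AWQ its name, the function names (`quantize_group`, `dequantize_group`, `bits_per_weight`) are hypothetical, and the group size of 128 and 16-bit per-group metadata are assumed values.

```python
def quantize_group(weights, n_bits=4):
    """Map one group of floats to unsigned n_bits integer codes (round-to-nearest)."""
    q_max = (1 << n_bits) - 1              # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / q_max        # step size between adjacent codes
    if scale == 0.0:                       # constant group: any nonzero step works
        scale = 1.0
    codes = [min(q_max, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, w_min):
    """Reconstruct approximate floats from the integer codes."""
    return [c * scale + w_min for c in codes]


def bits_per_weight(n_bits=4, group_size=128, meta_bits=16):
    """Average storage per weight, counting one scale and one minimum per group.

    group_size and meta_bits are assumed values for illustration.
    """
    return n_bits + 2 * meta_bits / group_size


if __name__ == "__main__":
    group = [-0.8, -0.1, 0.0, 0.3, 0.7, 1.2]
    codes, scale, w_min = quantize_group(group)
    restored = dequantize_group(codes, scale, w_min)
    worst = max(abs(a - b) for a, b in zip(group, restored))
    print(codes, f"max error {worst:.4f} (step {scale:.4f})")
    print(f"{bits_per_weight():.2f} bits per weight vs 16 for FP16")
```

Each reconstructed weight lands within half a quantization step of the original, and under these assumptions the stored weights take roughly a quarter of the space of FP16. Any speedup depends on the inference kernels and hardware, which this sketch does not model.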
