intel/auto-round
An advanced quantization toolkit for compressing LLMs and VLMs to 2-4 bits with minimal accuracy loss.

Velocity (7d): +1.6 ★/day · Trend: steady
AutoRound uses sign-gradient descent to quantize large language models to ultra-low bit widths (2-4 bits) while maintaining high accuracy. It runs on CPU, XPU, and CUDA hardware, and the quantized models can be deployed through popular inference frameworks including vLLM, SGLang, and Transformers.
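
To make the sign-gradient idea concrete, here is a toy sketch in pure Python. It is not AutoRound's implementation or API: the function names (`quantize`, `recon_loss`, `sign_sgd_rounding`), the symmetric 4-bit grid, the step size, and the random data are all assumptions chosen for illustration. It learns a per-weight rounding offset in [-0.5, 0.5] by stepping against the sign of the gradient of a layer-output reconstruction loss, using a straight-through estimator for the rounding operation.

```python
import random

def quantize(w, v, scale, qmin=-8, qmax=7):
    """Quantize-dequantize weights, shifting each by a rounding offset v."""
    return [max(qmin, min(qmax, round(wj / scale + vj))) * scale
            for wj, vj in zip(w, v)]

def recon_loss(X, w, wq):
    """Squared error between full-precision and quantized layer outputs."""
    total = 0.0
    for x in X:
        diff = sum(xj * (qj - wj) for xj, wj, qj in zip(x, w, wq))
        total += diff * diff
    return total

def sign(g):
    return (g > 0) - (g < 0)

def sign_sgd_rounding(X, w, steps=200, lr=0.005):
    """Tune rounding offsets with sign-gradient steps (hypothetical helper)."""
    scale = max(abs(wj) for wj in w) / 7   # assumed symmetric 4-bit grid
    v = [0.0] * len(w)                     # v = 0 is plain round-to-nearest
    best_v = v[:]
    best_loss = recon_loss(X, w, quantize(w, v, scale))
    for _ in range(steps):
        wq = quantize(w, v, scale)
        # Straight-through estimator: treat round() as identity, so
        # d(wq_j)/d(v_j) is approximated by scale.
        grad = [0.0] * len(w)
        for x in X:
            diff = sum(xj * (qj - wj) for xj, wj, qj in zip(x, w, wq))
            for j, xj in enumerate(x):
                grad[j] += 2.0 * diff * xj * scale
        # Only the sign of the gradient is used; offsets stay in [-0.5, 0.5].
        v = [max(-0.5, min(0.5, vj - lr * sign(gj)))
             for vj, gj in zip(v, grad)]
        loss = recon_loss(X, w, quantize(w, v, scale))
        if loss < best_loss:               # keep the best offsets seen so far
            best_v, best_loss = v[:], loss
    return best_v, best_loss, scale

random.seed(0)
w = [random.uniform(-1, 1) for _ in range(16)]                    # toy weights
X = [[random.gauss(0, 1) for _ in range(16)] for _ in range(32)]  # toy inputs
best_v, best_loss, scale = sign_sgd_rounding(X, w)
rtn_loss = recon_loss(X, w, quantize(w, [0.0] * len(w), scale))
print(f"round-to-nearest loss: {rtn_loss:.4f}, tuned loss: {best_loss:.4f}")
```

Because the loop starts from round-to-nearest and keeps the best offsets seen, the tuned loss can never be worse than the round-to-nearest baseline in this sketch. A real toolkit would work on full model blocks with calibration data rather than a single random weight vector.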