
intel/auto-round

An advanced quantization toolkit for compressing LLMs and VLMs to 2-4 bits with minimal accuracy loss.

Velocity (7d): +1.6 ★/day · Trend: steady

AutoRound employs sign-gradient descent to quantize large language models down to ultra-low bit widths (2-4 bits) while maintaining high accuracy. It provides broad hardware compatibility across CPU, XPU, and CUDA platforms, and integrates seamlessly with popular inference frameworks including vLLM, SGLang, and Transformers for deployment of quantized models.
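To make the sign-gradient idea concrete, here is a minimal pure-Python sketch, not AutoRound's actual implementation. It assumes the technique amounts to learning a per-weight rounding offset `v` in [-0.5, 0.5] that reduces the output error of a single linear layer, with a straight-through estimator standing in for the gradient of the rounding step. All names (`quantize`, `output_error`, `sign_round`), the symmetric 4-bit range, and the step-size schedule are illustrative assumptions.

```python
import random


def quantize(w, v, scale, qmin=-8, qmax=7):
    """Quantize-dequantize: round(w/scale + v), clip to [qmin, qmax], rescale."""
    return [scale * max(qmin, min(qmax, round(wi / scale + vi)))
            for wi, vi in zip(w, v)]


def output_error(x_rows, w, w_hat):
    """Sum over calibration rows of (x . w - x . w_hat)^2."""
    err = 0.0
    for x in x_rows:
        d = sum(xi * (a - b) for xi, a, b in zip(x, w, w_hat))
        err += d * d
    return err


def sign(g):
    """Return -1, 0, or 1 according to the sign of g."""
    return (g > 0) - (g < 0)


def sign_round(x_rows, w, scale, steps=200, lr=None):
    """Learn rounding offsets v with sign-gradient descent (illustrative)."""
    lr = lr if lr is not None else 1.0 / steps  # assumed step size
    v = [0.0] * len(w)  # v = 0 is plain round-to-nearest
    best_v = v[:]
    best_err = output_error(x_rows, w, quantize(w, v, scale))
    for _ in range(steps):
        w_hat = quantize(w, v, scale)
        grad = [0.0] * len(w)
        for x in x_rows:
            # Residual of this row's output under the quantized weights.
            r = sum(xi * (b - a) for xi, a, b in zip(x, w, w_hat))
            for j, xj in enumerate(x):
                # Straight-through estimator: treat d(w_hat_j)/d(v_j) as scale.
                grad[j] += 2.0 * r * xj * scale
        # Only the sign of the gradient is used; v stays in [-0.5, 0.5].
        v = [max(-0.5, min(0.5, vj - lr * sign(gj)))
             for vj, gj in zip(v, grad)]
        err = output_error(x_rows, w, quantize(w, v, scale))
        if err < best_err:  # keep the best offsets seen so far
            best_err, best_v = err, v[:]
    return best_v, best_err


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.uniform(-1.0, 1.0) for _ in range(16)]
    x_rows = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]
    scale = max(abs(wi) for wi in w) / 7  # symmetric 4-bit scale (assumed)
    baseline = output_error(x_rows, w, quantize(w, [0.0] * len(w), scale))
    _, tuned = sign_round(x_rows, w, scale)
    print("round-to-nearest error:", baseline)
    print("sign-gradient error:   ", tuned)
```

Because the loop starts from `v = 0` and keeps the best offsets it has seen, the returned error can never exceed the round-to-nearest baseline. How much it improves depends on the weights and the calibration data.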
