
mit-han-lab/smoothquant

SmoothQuant enables INT8 weight and activation quantization for large language models to reduce memory footprint and accelerate inference.

Velocity (7d): +1.3 ★ / day · Trend: steady

SmoothQuant is a post-training quantization method for LLMs, including models beyond 100 billion parameters. It migrates quantization difficulty from activations to weights through a mathematically equivalent per-channel scaling, which smooths activation outliers and maintains accuracy while enabling efficient W8A8 (8-bit weight, 8-bit activation) quantization. It has been integrated into major inference stacks, including NVIDIA TensorRT-LLM, ONNX Runtime, Intel Neural-Compressor, and Amazon SageMaker.
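The idea of migrating difficulty from activations to weights can be illustrated with a small NumPy sketch. It is not the repository's code: the function name `smooth`, the toy shapes, and the choice `alpha=0.5` are assumptions made for illustration. Each input channel `j` gets a scale `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)`; activations are divided by it and the matching weight rows are multiplied by it, so the layer output is unchanged while the activation outlier shrinks.

```python
import numpy as np

def smooth(X, W, alpha=0.5, eps=1e-8):
    """Illustrative per-channel smoothing (hypothetical helper, not the library API).

    X: activations, shape (tokens, in_features)
    W: weights, shape (in_features, out_features)
    """
    act_max = np.maximum(np.abs(X).max(axis=0), eps)  # per input channel
    w_max = np.maximum(np.abs(W).max(axis=1), eps)    # per input channel (weight rows)
    s = act_max ** alpha / w_max ** (1.0 - alpha)
    # Divide activations by s and scale the matching weight rows by s,
    # so (X / s) @ (s[:, None] * W) equals X @ W up to rounding.
    return X / s, W * s[:, None], s

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))
X[:, 3] *= 50.0                     # inject an outlier activation channel
W = rng.normal(size=(8, 4))

X_s, W_s, s = smooth(X, W)

print("output preserved:", np.allclose(X @ W, X_s @ W_s))
print("activation max before/after:", np.abs(X).max(), np.abs(X_s).max())
```

With `alpha=0.5` the per-channel maxima of the smoothed activations and smoothed weights coincide, which is why this value splits the quantization difficulty evenly between the two; other values shift more of it to one side.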
