huggingface/optimum-quanto

A PyTorch quantization backend providing int2/int4/int8/float8 weight and activation quantization with CUDA acceleration for optimized model inference.

Velocity (7d): +1.0 ★ / day · Trend: steady

Optimum Quanto is a quantization library for PyTorch models. It supports dynamic and static quantization and automatically inserts stubs for quantized operations and modules. It accelerates matrix multiplications on CUDA devices and supports serialization compatible with PyTorch weights_only loading and the safetensors format, which makes it useful for optimizing LLM and transformer inference while reducing memory footprint.
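
To make the memory-reduction idea concrete, here is a minimal pure-Python sketch of symmetric per-tensor int8 weight quantization. It is an illustration of the general technique only, not Optimum Quanto's implementation or API; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (int codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor so we never divide by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical sample weights, chosen only for illustration.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each weight is stored as a one-byte integer plus a single shared scale, so the rounding error per weight is at most half the scale. A real library would additionally handle per-channel scales, lower bit widths, and packed storage.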