
intel/neural-compressor

Open-source Python library for compressing LLMs and deep learning models via quantization, pruning, and sparsity across PyTorch, TensorFlow, and ONNX Runtime.

Star velocity (7d): +1.2 ★ / day · Trend: steady

Intel Neural Compressor provides model compression techniques including low-bit quantization (INT8/FP8/INT4/MXFP8/NVFP4), weight-only quantization, SmoothQuant, pruning, and sparsity. It supports popular models such as the LLMs LLaMA, Qwen, and DeepSeek and the image-generation model Flux, and it integrates with AutoRound for automated quantization tuning. The library targets Intel hardware, including Gaudi AI accelerators, Core Ultra processors, and Xeon Scalable processors, to improve inference performance and reduce memory footprint.
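To make the idea of low-bit weight quantization concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is not Intel Neural Compressor's API or its algorithm; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen only to illustrate how floating-point weights can be mapped to 8-bit integers plus a single scale factor.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization, so that w is roughly q * scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude onto 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floating-point weights from the integers."""
    return [v * scale for v in q]


# Illustrative weights (made up for this sketch).
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each integer takes one byte instead of the four used by a 32-bit float, which is where the memory savings come from; the cost is a rounding error of at most half the scale per weight. Production schemes typically use per-channel or per-group scales and calibration data to keep that error small.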
