
NVIDIA/Model-Optimizer

NVIDIA library providing quantization, pruning, NAS, distillation, and speculative decoding to compress and optimize deep learning models for faster inference.

Velocity (7d): +3.7 ★/day · Trend: steady

Model Optimizer is a unified library of state-of-the-art model optimization techniques, including quantization, pruning, neural architecture search (NAS), distillation, speculative decoding, and sparsity. It compresses deep learning models and exports optimized checkpoints ready for deployment in downstream inference frameworks such as SGLang, TensorRT-LLM, and TensorRT. The library works with Hugging Face, PyTorch, and ONNX models, and integrates with NVIDIA’s Megatron-LM ecosystem.
