openvinotoolkit/nncf

Shrink neural nets without the tears

NNCF compresses PyTorch, ONNX, and OpenVINO models for faster inference, using a small calibration set and, optionally, fine-tuning.

nncf · Velocity (7d): +0.5 ★/day · Trend: steady

What it does: NNCF is Intel’s compression toolkit for neural networks. Feed it a model and a small calibration dataset (~300 samples) and it spits out a compressed version tuned for OpenVINO inference. It covers post-training quantization and weight compression across PyTorch, TorchFX, ONNX, and OpenVINO backends, plus training-time methods such as quantization-aware training (QAT) and pruning on PyTorch.
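To make the role of the calibration set concrete, here is a minimal pure-Python sketch of min-max calibration followed by 8-bit affine quantization. It illustrates the general idea only; it is not NNCF's algorithm or API, and the helper names (`calibrate_min_max`, `affine_params`, `quantize`, `dequantize`) are hypothetical.

```python
def calibrate_min_max(samples):
    """Observe the value range over all calibration samples (range must include 0)."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    return min(lo, 0.0), max(hi, 0.0)


def affine_params(lo, hi, num_bits=8):
    """Map the observed float range [lo, hi] onto unsigned integers [0, 2^bits - 1]."""
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 if the range is empty
    zero_point = round(-lo / scale)  # the integer that represents float 0.0
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Round to the nearest integer level and clamp to the representable range."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Recover an approximate float value from the integer level."""
    return (q - zero_point) * scale


# Toy "calibration dataset": two samples of activation values (made up).
calibration = [[-1.0, 0.5, 2.0], [0.25, -0.75, 1.5]]
lo, hi = calibrate_min_max(calibration)
scale, zp = affine_params(lo, hi)
```

The calibration data fixes `scale` and `zp` once; after that, every value is stored as an integer and the round-trip error stays within half a quantization step for inputs inside the observed range.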

The interesting bit: The framework treats compression as a configurable graph transformation rather than a pile of manual hacks. It also preserves full PyTorch training semantics for fine-tuning: you can save and resume checkpoints that include both model weights and NNCF’s internal quantization state, the kind of detail that saves you a week of debugging.
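Why bundling matters can be shown with a small sketch: if a checkpoint holds the weights but not the quantizer parameters, a resumed run has to re-derive them and may not match the interrupted one. The code below is an assumption-laden illustration in plain Python with JSON, not NNCF's checkpoint format; the function names, the `fc.weight` key, and the state fields are all hypothetical. In a real PyTorch project the same dictionary would typically go through `torch.save`.

```python
import json
import os
import tempfile


def save_checkpoint(path, weights, quant_state, epoch):
    """Write weights and quantizer state together so they cannot drift apart."""
    with open(path, "w") as f:
        json.dump({"weights": weights, "quant_state": quant_state, "epoch": epoch}, f)


def load_checkpoint(path):
    """Restore everything needed to resume fine-tuning where it stopped."""
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["weights"], ckpt["quant_state"], ckpt["epoch"]


# Hypothetical state for a single layer.
weights = {"fc.weight": [0.12, -0.5, 0.9]}
quant_state = {"fc.weight": {"scale": 0.0078125, "zero_point": 0, "num_bits": 8}}

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
save_checkpoint(path, weights, quant_state, epoch=3)
restored = load_checkpoint(path)
```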

Key highlights

  • Post-training 8-bit quantization with minimal code: nncf.quantize(model, calibration_dataset)
  • Training-time algorithms: quantization-aware training, weight-only QAT with LoRA, and structured pruning
  • GPU-accelerated custom layers for faster compressed-model fine-tuning
  • Distributed training support and a Hugging Face Transformers integration patch
  • Export compressed PyTorch models directly to ONNX or OpenVINO-ready formats
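For readers new to the term, structured pruning removes whole filters or channels rather than individual weights, so the pruned layer is genuinely smaller. A minimal sketch of one common criterion, dropping the filters with the smallest L1 norm, looks like this. The criterion and the `prune_filters` helper are illustrative assumptions, not necessarily what NNCF does internally.

```python
def prune_filters(filters, prune_ratio):
    """Drop the lowest-L1-norm filters; return the survivors and their indices.

    filters: list of flat weight lists, one per output filter.
    prune_ratio: fraction of filters to remove, between 0 and 1.
    """
    n_prune = int(len(filters) * prune_ratio)
    norms = [sum(abs(w) for w in f) for f in filters]
    # Indices sorted from least to most important under the L1 criterion.
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    dropped = set(order[:n_prune])
    kept = [i for i in range(len(filters)) if i not in dropped]
    return [filters[i] for i in kept], kept


# Four toy filters with two weights each (made-up values).
filters = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [-0.05, 0.0]]
pruned, kept = prune_filters(filters, prune_ratio=0.5)
```

The returned indices matter in practice: the next layer's input channels have to be trimmed to match, which is why this kind of pruning is usually paired with fine-tuning.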

Caveats

  • TorchFX and activation sparsity are marked experimental; OpenVINO is the preferred PTQ backend
  • Training-time compression is PyTorch-only; there is no ONNX or OpenVINO equivalent

Verdict: Worth a look if you’re already in the OpenVINO ecosystem or need a single toolkit that spans post-training and training-time compression. Skip it if you need mature, framework-agnostic training-time methods or if your stack is TensorFlow-first (despite the topic tag, TensorFlow support appears absent from the current README).
