← all repositories

neuralmagic/deepsparse

A sparsity-aware deep learning inference runtime that runs ONNX models on CPUs with optimized performance through pruning and quantization.

deepsparse
Velocity · 7d
+1.6
★ / day
Trend
steady
star history

DeepSparse is a CPU inference runtime designed to execute deep learning models efficiently by leveraging sparsity techniques. It supports models from ONNX format across computer vision, NLP, and LLM domains. The runtime applies model compression methods including pruning and quantization to reduce computational overhead while maintaining accuracy, enabling faster inference on standard CPU hardware without specialized accelerators.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.