neuralmagic/deepsparse
A sparsity-aware deep learning inference runtime that runs ONNX models on CPUs with optimized performance through pruning and quantization.

DeepSparse is a CPU inference runtime designed to execute deep learning models efficiently by leveraging sparsity techniques. It supports models from ONNX format across computer vision, NLP, and LLM domains. The runtime applies model compression methods including pruning and quantization to reduce computational overhead while maintaining accuracy, enabling faster inference on standard CPU hardware without specialized accelerators.