vitoplantamura/OnnxStream
Lightweight C++ inference library for ONNX models that runs Stable Diffusion and LLMs like Mistral 7B on devices ranging from Raspberry Pi Zero 2 to servers.

OnnxStream is a lightweight C++ inference library for ONNX models that keeps memory usage low by avoiding memory-hungry intermediate tensor operations. It uses XNNPACK to accelerate its operators and can run Stable Diffusion XL, LLMs such as Mistral 7B and TinyLlama, YOLOv8 object detection, and OpenAI Whisper speech recognition on ARM, x86, WebAssembly, and RISC-V. Stable Diffusion 1.5 can run in as little as 298MB of RAM, which fits on a 512MB Raspberry Pi Zero 2. The library also provides Python, C#, and JavaScript/WASM bindings.
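Avoiding large intermediate tensors is what makes these memory budgets possible. Attention slicing is one well-known technique of this kind: it computes `softmax(Q K^T / sqrt(d)) V` one query row at a time rather than materializing the full score matrix. For scale, a single fp32 score matrix over 4096 query and 4096 key positions takes 4096 × 4096 × 4 bytes = 64 MiB, while one sliced row takes 4096 × 4 bytes = 16 KiB. The sketch below is a generic illustration written for this README, not OnnxStream's implementation. The function name `sliced_attention` and the row-major layout are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not OnnxStream code). Row-major layouts:
// Q is n x d, K is m x d, V is m x dv; the result is n x dv.
// Computes softmax(Q K^T / sqrt(d)) V one query row at a time, so the only
// intermediate buffer is one row of m scores instead of an n x m matrix.
std::vector<float> sliced_attention(const std::vector<float>& Q,
                                    const std::vector<float>& K,
                                    const std::vector<float>& V,
                                    std::size_t n, std::size_t m,
                                    std::size_t d, std::size_t dv) {
    assert(Q.size() == n * d && K.size() == m * d && V.size() == m * dv);
    std::vector<float> out(n * dv, 0.0f);
    std::vector<float> scores(m);  // scratch row reused for every query: O(m) memory
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    for (std::size_t i = 0; i < n; ++i) {
        // Scaled dot products of query row i against every key row.
        float max_s = -std::numeric_limits<float>::infinity();
        for (std::size_t j = 0; j < m; ++j) {
            float s = 0.0f;
            for (std::size_t k = 0; k < d; ++k) s += Q[i * d + k] * K[j * d + k];
            scores[j] = s * scale;
            max_s = std::max(max_s, scores[j]);
        }
        // Numerically stable softmax: subtract the row maximum before exp.
        float sum = 0.0f;
        for (std::size_t j = 0; j < m; ++j) {
            scores[j] = std::exp(scores[j] - max_s);
            sum += scores[j];
        }
        // Output row i is the softmax-weighted sum of the value rows.
        for (std::size_t j = 0; j < m; ++j) {
            const float w = scores[j] / sum;
            for (std::size_t k = 0; k < dv; ++k) out[i * dv + k] += w * V[j * dv + k];
        }
    }
    return out;
}
```

The result matches full attention exactly, because each output row depends only on its own row of scores. The trade-off is a loop over query rows instead of one large matrix multiply, which is usually worth it when RAM, not compute, is the binding constraint.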