ONNX on a diet: neural nets for devices that hate bloat
A drop-in C99 inference engine that runs on bare metal and still finds time for hardware acceleration.

What it does
Libonnx is a single-file-style C99 library that loads and runs ONNX models without dragging in a runtime the size of an operating system. You allocate a context, point it at a model file (or a baked-in byte array), wire up tensors by name, and call onnx_run(). It targets embedded systems where “portable” usually means “compiles without a C++ compiler or a 500 MB dependency tree.”
The interesting bit
The hardware acceleration hook is the unusual part for something this small. You pass an array of struct resolver_t * at init, and the engine delegates what it can. The rest runs on a pure C implementation. The examples even include a hand-drawn digit recognizer with SDL2 graphics, which is either charming or slightly surreal for a library that could run on a microcontroller.
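To make the delegate-then-fall-back idea concrete, here is a minimal sketch of the pattern in plain C. Everything in it is hypothetical and for illustration only: demo_resolver, op_fn, resolve_op and the toy operators are invented names, and libonnx's real struct resolver_t has its own layout that you should take from its header rather than from this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; NOT libonnx's struct resolver_t. */
typedef float (*op_fn)(float x);

struct demo_resolver {
    const char *name;                /* label for the backend */
    op_fn (*lookup)(const char *op); /* returns NULL if the op is unsupported */
};

/* Portable reference kernels, standing in for the pure C implementation. */
float relu_ref(float x) { return x > 0.0f ? x : 0.0f; }
float neg_ref(float x)  { return -x; }

op_fn default_lookup(const char *op) {
    if (strcmp(op, "Relu") == 0) return relu_ref;
    if (strcmp(op, "Neg") == 0)  return neg_ref;
    return NULL; /* operator not implemented anywhere */
}

/* Pretend accelerator backend that only knows Relu. */
float relu_accel(float x) { return x > 0.0f ? x : 0.0f; }

op_fn accel_lookup(const char *op) {
    return strcmp(op, "Relu") == 0 ? relu_accel : NULL;
}

const struct demo_resolver accel_backend = { "accel", accel_lookup };

/* Ask each user-supplied resolver in order; fall back to the portable
 * kernels if none claims the operator. *who (if non-NULL) reports which
 * backend was chosen. */
op_fn resolve_op(const struct demo_resolver *const *rs, size_t n,
                 const char *op, const char **who) {
    for (size_t i = 0; i < n; i++) {
        op_fn f = rs[i]->lookup(op);
        if (f != NULL) {
            if (who != NULL) *who = rs[i]->name;
            return f;
        }
    }
    if (who != NULL) *who = "default";
    return default_lookup(op);
}
```

With an array holding only &accel_backend, resolve_op picks the accelerator for "Relu", falls back to the portable kernel for "Neg", and returns NULL for an operator nobody implements. Passing an empty array gives you the pure C path for everything.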
Key highlights
- Drop-in build: .c and .h files compile with your project, no package manager required
- Supports ONNX opset 24 (version 1.17.0); the supported operator table is documented
- Cross-compilation works via CROSS_COMPILE=prefix; tested example given for arm64
- Models can be embedded as C arrays via xxd -i, avoiding filesystem dependencies entirely
- Tests pass on MNIST, MobileNet v2, ShuffleNet, SqueezeNet, super-resolution, and Tiny YOLO v2
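For readers who have not used xxd -i before, the sketch below shows the shape of what it generates and how such an array is typically consumed. Running xxd -i model.onnx produces an unsigned char array plus a matching _len variable, both named after the file. The bytes here are placeholders rather than a real model, and demo_model / demo_model_from_buffer are hypothetical stand-ins for a buffer-based loader; check libonnx's header for the actual allocation call and its signature.

```c
#include <stddef.h>

/* Shape of the output of `xxd -i model.onnx`: the file name becomes the
 * identifier, with '.' replaced by '_'. Placeholder bytes, not a real model. */
unsigned char model_onnx[] = {
    0x08, 0x07, 0x12, 0x04, 0x64, 0x65, 0x6d, 0x6f
};
unsigned int model_onnx_len = 8;

/* Hypothetical stand-in for a loader that takes a buffer and a length
 * instead of a file path. It only records the pointer; no parsing is done. */
struct demo_model {
    const unsigned char *data;
    size_t len;
};

int demo_model_from_buffer(struct demo_model *m, const void *buf, size_t len) {
    if (m == NULL || buf == NULL || len == 0) {
        return -1; /* reject missing or empty input */
    }
    m->data = buf;
    m->len = len;
    return 0;
}
```

Calling demo_model_from_buffer(&m, model_onnx, model_onnx_len) hands the loader a pointer into the program image, so no file I/O is involved. On a microcontroller you would usually also declare the array const so the linker can place it in flash instead of RAM; xxd does not add that qualifier for you.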
Caveats
- Not all ONNX operators are implemented; the README warns that tests outside tests/model may fail
- The “hardware acceleration support” is a pluggable resolver interface; actual backends are your problem to supply
- SDL2 dependency for the MNIST example is optional but easy to miss in the build instructions
Verdict
Worth a look if you’re shipping ML to a device where Python is a fantasy and megabytes matter. Skip it if you need the full ONNX operator zoo out of the box or already have a comfortable TensorFlow Lite setup.