Keras on a chip: neural networks for the malloc-and-pray crowd
A C library that lets you deploy Keras models to microcontrollers without rewriting your network by hand.

What it does
NNoM is a C-based inference engine for microcontrollers that takes a trained Keras model and, with one line of Python, spits out a deployable C header with weights and structure. It handles the memory layout, layer wiring, and quantization so you don’t have to manually translate your ResNet or LSTM into pointer arithmetic.
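To make the quantization step less abstract, here is a minimal sketch of what a converter might do with one weight tensor. It assumes symmetric int8 weights with a power-of-two scale, a common MCU choice because rescaling becomes a bit shift. It is not NNoM's converter code: choose_frac_bits, quantize and emit_c_array are hypothetical names, and the emitted header layout is illustrative only.

```python
import math

def choose_frac_bits(values, total_bits=8):
    """Fractional bits so the largest |value| fits a signed total_bits integer."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.ceil(math.log2(max_abs))  # bits needed left of the binary point
    return total_bits - 1 - int_bits          # one bit is reserved for the sign

def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round, and clamp into the signed integer range."""
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    return [max(lo, min(hi, round(v * 2.0 ** frac_bits))) for v in values]

def emit_c_array(name, values, frac_bits):
    """Render a quantized tensor as a C array plus its shift constant (hypothetical layout)."""
    body = ", ".join(str(v) for v in values)
    return (f"#define {name.upper()}_FRAC_BITS {frac_bits}\n"
            f"static const int8_t {name}[{len(values)}] = {{{body}}};\n")

weights = [0.75, -0.5, 0.1]
bits = choose_frac_bits(weights)
print(emit_c_array("dense1_w", quantize(weights, bits), bits))
```

For these three weights the sketch picks 7 fractional bits and emits {96, -64, 13}; the device then only needs integer multiplies and a shift by the stored constant.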
The interesting bit
The “pre-compiling” approach: instead of shipping an interpreter that parses a model graph at runtime, NNoM generates static C code upfront. That means zero overhead from model parsing on the device—your MCU just runs compiled C. The trade-off is flexibility; you re-run the Python converter when the model changes.
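The contrast with an interpreter is easiest to see in generated code. Below is a toy generator, a minimal sketch and not NNoM's actual output format, that turns a sequential layer list into a straight-line C function with statically sized buffers. The kernel names it emits (dense_0, relu_1, and so on) and the model_forward entry point are hypothetical.

```python
def generate_forward(layers):
    """Emit a straight-line C forward pass for a sequential model.

    layers is a list of (kind, output_size) tuples. Call order and buffer
    sizes are fixed here, at generation time, so the device never walks a
    graph description at runtime.
    """
    lines = [f"static int8_t buf{i}[{size}];" for i, (_, size) in enumerate(layers)]
    lines.append("void model_forward(const int8_t *input) {")
    src = "input"
    for i, (kind, _) in enumerate(layers):
        # Each call names a hypothetical per-layer kernel with its weights baked in.
        lines.append(f"    {kind}_{i}({src}, buf{i});")
        src = f"buf{i}"
    lines.append("}")
    return "\n".join(lines)

print(generate_forward([("dense", 32), ("relu", 32), ("dense", 10)]))
```

Changing the model means regenerating and reflashing, which is the flexibility trade-off described above. A production converter would typically also reuse buffers whose lifetimes don't overlap instead of giving every layer its own array; this sketch skips that.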
Key highlights
- One-line Keras-to-C conversion via Python scripts (nnom.py for the newer structured interface)
- Supports nontrivial architectures: Inception, ResNet, DenseNet, plus RNN variants (GRU, LSTM) as of v0.4.1
- Per-channel quantization and dilated convolutions in the structured API
- Optional CMSIS-NN/DSP backend for ARM Cortex-M4/7/33/35P (up to ~5× speedup over pure C)
- On-device evaluation tools: runtime profiling, top-k accuracy, confusion matrices
- No external C dependencies beyond libc for memory management
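Per-channel quantization matters because, with a single scale, one large-magnitude channel dictates the precision of every other channel in the tensor. The sketch below assumes symmetric int8 weights with a power-of-two scale; it illustrates the general idea rather than NNoM's exact implementation, and the helper names are hypothetical.

```python
import math

def frac_bits_for(values, total_bits=8):
    """Fractional bits that let the largest |value| fit a signed integer."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    return total_bits - 1 - math.ceil(math.log2(max_abs))

def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round, and clamp into the signed integer range."""
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    return [max(lo, min(hi, round(v * 2.0 ** frac_bits))) for v in values]

# Two output channels with very different weight magnitudes.
channels = [[3.0, -2.5], [0.02, -0.01]]

# Per-tensor: one shift for everything, dictated by the largest weight (3.0).
shared = frac_bits_for([w for ch in channels for w in ch])
per_tensor = [quantize(ch, shared) for ch in channels]

# Per-channel: each output channel gets its own shift.
per_channel_bits = [frac_bits_for(ch) for ch in channels]
per_channel = [quantize(ch, b) for ch, b in zip(channels, per_channel_bits)]
```

With the shared shift (5 fractional bits) the small channel collapses to [1, 0]; with its own shift (12 fractional bits) it keeps [82, -41]. The cost is storing and applying one shift per channel at inference time.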
Caveats
- The Keras converter chokes on implicitly defined activations (Dense(32, activation="relu")); you must split them into explicit layers
- TensorFlow requirement is pinned to ≤2.14, which caps Python at 3.11
- Several RNN layers and ConvTransposed are marked “Alpha” or “Under Dev.”
- Some operations don’t support both HWC and CHW formats; the README flags this as a current limitation
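For the activation caveat, the fix on the Keras side is to write Dense(32) followed by a separate ReLU() (or Activation("relu")) layer instead of Dense(32, activation="relu"). If a model has many such layers, a pass like the one below can do the split mechanically. It is a minimal sketch over a hypothetical list-of-dicts layer description, not Keras's own config format and not part of NNoM.

```python
def split_activations(layers):
    """Return a copy of layers with fused activations moved into their own layers."""
    out = []
    for layer in layers:
        act = layer.get("activation")
        if act in (None, "linear"):
            out.append(dict(layer))  # nothing fused, copy unchanged
            continue
        core = dict(layer)
        core["activation"] = "linear"  # the Dense/Conv layer becomes purely affine
        out.append(core)
        out.append({"type": "Activation", "activation": act})
    return out

model = [
    {"type": "Dense", "units": 32, "activation": "relu"},
    {"type": "Dense", "units": 10},
]
print(split_activations(model))
```

The pass copies each layer dict, so the original description is left untouched.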
Verdict
Worth a look if you’re shipping Keras models to ARM MCUs and tired of hand-rolling C. Skip it if you need runtime model swapping, work outside the Cortex-M ecosystem (the pure-C backend still runs elsewhere, but without the CMSIS-NN speedup), or can’t stomach the Keras-to-C conversion step.