Keras models in C++ without the TensorFlow tax
A header-only library that lets you run Keras inference in C++ by reimplementing just the prediction ops you actually need.

What it does
frugally-deep lets you train a model in Python/Keras, convert it to a JSON format, then load and run predict() in C++ without linking against TensorFlow. It’s header-only, pulls in only other header-only dependencies (FunctionalPlus, Eigen, nlohmann/json), and explicitly targets smaller binaries and simpler deployment.
The interesting bit
The library reimplements a “small subset of TensorFlow”, just the ops needed for inference, and deliberately skips GPU support entirely, running each prediction on a single CPU core. The author notes you can run multiple predictions in parallel at the application level if throughput matters. It also avoids materializing the im2col matrix during convolutions, which saves RAM but forgoes the single large matrix multiplication that im2col makes possible.
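To make the im2col trade-off concrete, here is a minimal sketch of a direct 2D convolution (single channel, stride 1, no padding) that reads each input window in place instead of first copying all windows into a separate matrix. This is not frugally-deep's actual implementation; `Image` and `conv2d_direct` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major, single-channel image; not a frugally-deep type.
struct Image {
    std::size_t height;
    std::size_t width;
    std::vector<float> data; // size == height * width

    float at(std::size_t y, std::size_t x) const { return data[y * width + x]; }
};

// Direct "valid" convolution (cross-correlation, as deep-learning frameworks
// define it) with stride 1. Each output value is accumulated straight from the
// input, so the only allocation is the output itself. An im2col approach would
// first copy every kernel-sized window into a matrix with
// (out_h * out_w) x (kernel.height * kernel.width) entries.
Image conv2d_direct(const Image& input, const Image& kernel)
{
    assert(kernel.height <= input.height && kernel.width <= input.width);
    const std::size_t out_h = input.height - kernel.height + 1;
    const std::size_t out_w = input.width - kernel.width + 1;
    Image output{out_h, out_w, std::vector<float>(out_h * out_w, 0.0f)};
    for (std::size_t y = 0; y < out_h; ++y) {
        for (std::size_t x = 0; x < out_w; ++x) {
            float sum = 0.0f;
            for (std::size_t ky = 0; ky < kernel.height; ++ky) {
                for (std::size_t kx = 0; kx < kernel.width; ++kx) {
                    sum += input.at(y + ky, x + kx) * kernel.at(ky, kx);
                }
            }
            output.data[y * out_w + x] = sum;
        }
    }
    return output;
}
```

Even in this toy setting the difference is visible: a 3x3 input with a 2x2 kernel has 9 input values, while the corresponding im2col matrix would hold 4 windows of 4 values each, 16 entries in total. The gap grows with image size and channel count.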
Key highlights
- Supports a wide layer zoo: standard conv/pool/RNN layers, attention mechanisms, normalization variants, and even training-only augmentation layers (passed through at inference)
- Handles non-sequential models: multiple inputs/outputs, residual connections, shared layers, nested models, variable input shapes
- Includes a validation step: convert_model.py auto-generates a test case, and load_model verifies C++ output matches Keras output
- Works in 32-bit executables; C++14 minimum
- Custom layers supported via factory functions passed to load_model
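
The validation step boils down to running the loaded model on a stored input and comparing the result with the stored Keras output within a tolerance. A minimal sketch of such a comparison follows, assuming both outputs are flattened into float vectors. `outputs_match` and its default tolerance are hypothetical and do not reflect the library's API or its actual threshold.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if both outputs have the same length and
// every pair of elements differs by at most `epsilon`.
bool outputs_match(const std::vector<float>& expected,
                   const std::vector<float>& actual,
                   float epsilon = 1e-4f)
{
    if (expected.size() != actual.size()) {
        return false;
    }
    for (std::size_t i = 0; i < expected.size(); ++i) {
        // Written as !(diff <= epsilon) so that a NaN difference also fails.
        if (!(std::fabs(expected[i] - actual[i]) <= epsilon)) {
            return false;
        }
    }
    return true;
}
```

An absolute tolerance is the simplest choice; a relative tolerance may suit outputs with a wide dynamic range better.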
Caveats
- No GPU support, no multi-core within a single prediction (by design)
- Lambda layers, stateful RNNs, and several preprocessing layers (Hashing, TextVectorization, MelSpectrogram, etc.) are explicitly unsupported
- Requires channels_last image data format; other backends/formats won’t work
- API stability disclaimer: “might change in the future”
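
Because each prediction stays on one core, scaling throughput is left to the caller. One way to do that with the standard library alone is to launch independent predictions as asynchronous tasks. In this sketch, `predict_one` is a hypothetical stand-in for a model's predict call, and spawning one task per input is an assumption made for brevity, not a recommendation from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

using Input = std::vector<float>;
using Output = std::vector<float>;

// Runs `predict_one` on every input concurrently and returns the results in
// input order. `predict_one` must be safe to call from several threads at once.
std::vector<Output> predict_parallel(
    const std::function<Output(const Input&)>& predict_one,
    const std::vector<Input>& inputs)
{
    std::vector<std::future<Output>> futures;
    futures.reserve(inputs.size());
    for (const Input& input : inputs) {
        // std::launch::async requests a separate thread for each task.
        futures.push_back(std::async(std::launch::async,
            [&predict_one, &input]() { return predict_one(input); }));
    }
    std::vector<Output> results;
    results.reserve(inputs.size());
    for (auto& f : futures) {
        results.push_back(f.get()); // blocks until that prediction is done
    }
    return results;
}
```

For large batches, a fixed pool of worker threads sized to the core count would avoid oversubscription; the per-input task here just keeps the sketch short.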
Verdict
Good fit if you’re shipping Keras inference to resource-constrained or TensorFlow-averse C++ environments and can live with CPU-only execution on one core per prediction. Skip it if you need GPU acceleration, Lambda layers, or training on the C++ side.