Dobiasd/frugally-deep

Keras models in C++ without the TensorFlow tax

A header-only library that lets you run Keras inference in C++ by reimplementing just the prediction ops you actually need.

1.1k stars · C++ · Inference · Serving
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

frugally-deep lets you train a model in Python/Keras, convert it to a JSON format, then load and run predict() in C++ without linking against TensorFlow. It’s header-only, pulls in only other header-only dependencies (FunctionalPlus, Eigen, nlohmann/json), and explicitly targets smaller binaries and simpler deployment.

The interesting bit

The library reimplements a “small subset of TensorFlow”, just the ops needed for inference, and deliberately skips GPU support entirely: each prediction runs on a single CPU core. The author notes you can parallelize across cores at the application level if throughput matters. It also avoids materializing the im2col matrix during convolutions, which forgoes that common optimization in exchange for not allocating a potentially large temporary buffer.
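To make the im2col trade-off concrete, here is a minimal sketch of a direct convolution that reads the input in place, next to a helper that counts how many floats an im2col buffer would need for the same problem. This is not frugally-deep's actual kernel: the `Image` type and the `conv2d_direct` and `im2col_floats` names are hypothetical, and the sketch assumes a single channel, stride 1, "valid" padding, and a kernel no larger than the input.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-channel image in row-major order (not an fdeep type).
struct Image {
    std::size_t height;
    std::size_t width;
    std::vector<float> data;  // size == height * width

    float at(std::size_t y, std::size_t x) const { return data[y * width + x]; }
};

// Direct convolution (cross-correlation, as deep learning frameworks define it).
// Assumes stride 1, "valid" padding, and kernel dimensions <= input dimensions.
// The input is read in place; the only allocation is the output itself.
Image conv2d_direct(const Image& in, const Image& kernel) {
    const std::size_t out_h = in.height - kernel.height + 1;
    const std::size_t out_w = in.width - kernel.width + 1;
    Image out{out_h, out_w, std::vector<float>(out_h * out_w, 0.0f)};
    for (std::size_t y = 0; y < out_h; ++y) {
        for (std::size_t x = 0; x < out_w; ++x) {
            float acc = 0.0f;
            for (std::size_t ky = 0; ky < kernel.height; ++ky) {
                for (std::size_t kx = 0; kx < kernel.width; ++kx) {
                    acc += in.at(y + ky, x + kx) * kernel.at(ky, kx);
                }
            }
            out.data[y * out_w + x] = acc;
        }
    }
    return out;
}

// Floats an im2col buffer would need for the same problem:
// one row per output position, one column per kernel element.
std::size_t im2col_floats(const Image& in, const Image& kernel) {
    const std::size_t out_h = in.height - kernel.height + 1;
    const std::size_t out_w = in.width - kernel.width + 1;
    return out_h * out_w * kernel.height * kernel.width;
}
```

For a 3x3 kernel the im2col buffer is roughly nine times the size of the output, which is the temporary allocation a direct loop avoids. The price is that the work can no longer be handed to one large matrix multiplication.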

Key highlights

  • Supports a wide layer zoo: standard conv/pool/RNN layers, attention mechanisms, normalization variants, and even training-only augmentation layers (passed through at inference)
  • Handles non-sequential models: multiple inputs/outputs, residual connections, shared layers, nested models, variable input shapes
  • Includes a validation step: convert_model.py auto-generates a test case, and load_model verifies C++ output matches Keras output
  • Works in 32-bit executables; C++14 minimum
  • Custom layers supported via factory functions passed to load_model
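The validation step in the list above amounts to comparing two output vectors element by element. A minimal sketch of such a check follows, assuming an absolute tolerance. The `outputs_match` helper and its `epsilon` parameter are hypothetical and not part of the library's API; how frugally-deep compares values internally may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if both outputs have the same length and
// every pair of elements differs by at most `epsilon` (absolute tolerance).
bool outputs_match(const std::vector<float>& expected,
                   const std::vector<float>& actual,
                   float epsilon) {
    if (expected.size() != actual.size()) {
        return false;
    }
    for (std::size_t i = 0; i < expected.size(); ++i) {
        // Written as a negated <= so that NaN values are rejected too.
        if (!(std::fabs(expected[i] - actual[i]) <= epsilon)) {
            return false;
        }
    }
    return true;
}
```

A check like this catches silent divergence between the Python and C++ sides, for example from an unsupported layer option, before the model reaches production.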

Caveats

  • No GPU support, no multi-core within a single prediction (by design)
  • Lambda layers, stateful RNNs, and several preprocessing layers (Hashing, TextVectorization, MelSpectrogram, etc.) are explicitly unsupported
  • Requires channels_last image data format; other backends/formats won’t work
  • API stability disclaimer: “might change in the future”
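The single-core caveat applies per prediction, so throughput can still scale by running independent predictions concurrently. A minimal sketch with `std::async` follows. The `predict` function here is a hypothetical stand-in that just sums its input, and `predict_all` is an illustrative name. In a real application the task would call into the loaded model, and whether one model object may be shared across threads should be confirmed in the library's documentation.

```cpp
#include <future>
#include <numeric>
#include <vector>

using Input = std::vector<float>;

// Stand-in for one single-core model prediction (hypothetical: sums the input).
float predict(const Input& input) {
    return std::accumulate(input.begin(), input.end(), 0.0f);
}

// Launches one asynchronous task per input and returns results in input order.
std::vector<float> predict_all(const std::vector<Input>& inputs) {
    std::vector<std::future<float>> pending;
    pending.reserve(inputs.size());
    for (const Input& input : inputs) {
        // std::launch::async requests a separate thread for each task.
        // `input` refers to an element of `inputs`, which outlives every task
        // because all futures are collected before this function returns.
        pending.push_back(
            std::async(std::launch::async, [&input] { return predict(input); }));
    }
    std::vector<float> results;
    results.reserve(inputs.size());
    for (std::future<float>& f : pending) {
        results.push_back(f.get());
    }
    return results;
}
```

One task per input is fine for small batches. For large batches a fixed-size worker pool sized to the core count would avoid oversubscribing the machine. Some toolchains need `-pthread` at link time.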

Verdict

Good fit if you’re shipping Keras inference to resource-constrained or TensorFlow-averse C++ environments and can live with CPU-only, single-core-per-prediction execution. Skip it if you need GPU acceleration, Lambda layers, or any training on the C++ side.
