Your Keras model, now running on bare metal at CERN
hls4ml compiles neural networks into FPGA firmware for sub-microsecond inference.

What it does
hls4ml takes trained models from Keras, PyTorch, or ONNX and spits out synthesizable C++ for FPGA high-level synthesis (HLS) tools. The goal is inference so fast it can sit inside hardware trigger systems—think filtering particle collisions before the data even leaves the detector.
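To make that concrete, here is a minimal sketch of the conversion flow for a Keras model, using calls from the hls4ml Python API. The helper name convert_keras_model, the output directory, the synthesize flag, and the choice of the Vitis backend are assumptions for illustration, not something the project prescribes. The import is deferred so the snippet still loads where hls4ml is not installed.

```python
def convert_keras_model(model, output_dir="my-hls-test", backend="Vitis",
                        synthesize=False):
    """Hypothetical helper: turn a trained Keras model into an HLS project.

    `model` is assumed to be a trained Keras model. The defaults for
    `output_dir` and `backend` are illustrative choices only.
    """
    import hls4ml  # deferred import; requires `pip install hls4ml`

    # Derive a starting configuration (precision, reuse factor) from the model.
    config = hls4ml.utils.config_from_keras_model(model, granularity="model")

    # Generate the HLS C++ project for the chosen backend.
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        output_dir=output_dir,
        backend=backend,
    )

    # Compile a C++ emulation of the design so it can be checked from Python.
    hls_model.compile()

    if synthesize:
        # Hands the project to the vendor HLS tool, which must be installed.
        hls_model.build()

    return hls_model
```

Calling hls_model.predict(x) on the returned object runs the compiled emulation, which is a quick way to compare against the original Keras outputs before committing to a synthesis run.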
The interesting bit
The project grew out of work at CERN’s Large Hadron Collider, where Level-1 (L1) hardware triggers have brutal latency budgets. That same need for “decide now, ask questions later” has since spread to nuclear fusion feedback loops, quantum computing control systems, and satellite environmental monitoring. It’s a neat example of physics infrastructure leaking into broader engineering.
Key highlights
- Supports Xilinx Vivado/Vitis HLS, Intel HLS, Catapult HLS, and experimental Intel oneAPI backends
- Handles CNNs, distributed arithmetic, and binary/ternary quantized networks (each with its own citation trail)
- pip install hls4ml gets you started; hls4ml[profiling] adds profiling tools
- Ships with example models and a separate tutorial repo
- Active enough to merit a 2025 overview paper and a v1.3.0 release
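For readers unfamiliar with the binary/ternary networks mentioned above, the sketch below shows the general idea in plain Python. It is a generic illustration, not hls4ml's own quantization scheme: the function names and the 0.5 threshold are assumptions chosen for the example.

```python
def binarize(weights):
    """Map each weight to -1 or +1 by its sign (zero counts as +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def ternarize(weights, threshold=0.5):
    """Map each weight to -1, 0, or +1.

    Weights whose magnitude is at or below `threshold` become 0.
    The threshold value is an arbitrary assumption for this sketch.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def ternary_dot(inputs, ternary_weights):
    """Dot product with weights in {-1, 0, +1}: no multiplications needed."""
    total = 0.0
    for x, w in zip(inputs, ternary_weights):
        if w == 1:
            total += x      # +1 weight: add the input
        elif w == -1:
            total -= x      # -1 weight: subtract the input
        # 0 weight: skip the input entirely
    return total


weights = [0.9, -0.2, -0.7, 0.1]
t_weights = ternarize(weights)
result = ternary_dot([1.0, 2.0, 3.0, 4.0], t_weights)
```

With weights restricted to these values, each multiply-accumulate collapses into an add, a subtract, or nothing at all, which is why such networks map cheaply onto FPGA logic.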
Caveats
- You’ll need the vendor HLS tools installed separately; Vivado alone is a multi-gigabyte download
- The README notes synthesis “might take several minutes”—understatement of the year for larger models
- Intel oneAPI support is explicitly marked experimental
Verdict
Grab this if you’re building real-time control systems where a GPU is too slow or too power-hungry, and you already speak some FPGA. Skip it if you’re looking for cloud-scale batch inference or don’t have access to synthesis tooling.