microsoft/nnfusion

Microsoft's DNN compiler that turns models into bare-metal code

NNFusion compiles TensorFlow and ONNX models ahead-of-time into framework-free C++/CUDA executables, stripping away runtime overhead and library dependencies.

Star velocity (7d): +0.4 ★/day · Trend: steady

What it does

NNFusion takes a frozen TensorFlow or ONNX model and compiles it ahead of time into standalone, human-readable C++ or CUDA source code. You build that source with cmake and run the resulting executable directly: no TensorFlow, no PyTorch, no framework runtime dragging down your inference. It targets CUDA GPUs, ROCm GPUs, and CPUs.

The interesting bit

The “source-to-source” angle is the quiet killer feature. Instead of generating opaque bytecode or graph schedules, NNFusion emits actual code you can hand-tweak. Want to swap in a custom kernel? Edit the generated source directly. This is compiler-as-carpenter, not compiler-as-black-box.

Key highlights

  • Full-stack optimization: data-flow graph passes (CSE, constant folding), kernel fusion, co-scheduling, and auto-tuning integration
  • Ahead-of-time compilation eliminates framework runtime overhead entirely
  • Generated code is human-readable and modifiable for model-specific tuning
  • Supports parallel training via Microsoft’s separate SuperScaler project
  • Published at OSDI ’20 under the name “Rammer”
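
To make the graph-pass bullet concrete, here is a minimal Python sketch of constant folding and common subexpression elimination (CSE) on a toy data-flow graph. It is not NNFusion's implementation: the `Node` class, the `optimize` function, and the two-operator graph are all hypothetical, chosen only to show what these passes do to a graph before any code is generated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Toy graph node (hypothetical, not an NNFusion type)."""
    op: str               # "const", "input", "add", or "mul"
    args: tuple = ()      # operand nodes for "add" / "mul"
    value: object = None  # number for "const", name for "input"


def optimize(node, cache=None):
    """Return an equivalent graph with constants folded and duplicates shared."""
    if cache is None:
        cache = {}
    if node.op in ("add", "mul"):
        # Optimize operands first so folding works bottom-up.
        args = tuple(optimize(a, cache) for a in node.args)
        if all(a.op == "const" for a in args):
            # Constant folding: evaluate the operation at compile time.
            x, y = (a.value for a in args)
            node = Node("const", value=x + y if node.op == "add" else x * y)
        else:
            node = Node(node.op, args)
    # CSE: structurally equal nodes collapse to one shared object.
    return cache.setdefault(node, node)


# Build (x * (2 + 3)) + (x * (2 + 3)) with the subtree duplicated.
x = Node("input", value="x")
two = Node("const", value=2)
three = Node("const", value=3)


def scaled():
    return Node("mul", (x, Node("add", (two, three))))


expr = Node("add", (scaled(), scaled()))
opt = optimize(expr)
```

After the pass, `2 + 3` has been folded into a single constant node and both operands of the outer `add` are the same object, so a code generator working from this graph would only need to emit the multiplication once.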

Caveats

  • Docker quick-start is tied to Ubuntu 16.04/18.04 and nvidia-docker; newer setups may need source builds
  • The benchmark speedup numbers live on a separate artifact branch, not the main README
  • v0.3 is the latest tagged release; activity level is unclear from the README alone

Verdict

Grab this if you’re shipping inference to edge devices or container-averse environments and need to shed framework bloat. Skip it if you’re married to dynamic shapes or eager execution, or if you want a polished Python API: the workflow here is compile, cmake, run.
