facebookincubator/AITemplate
A Python framework that compiles deep neural networks into high-performance CUDA/HIP C++ code for GPU inference.

Velocity (7d): +3.3 ★/day · Trend: steady
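AITemplate compiles deep neural networks, defined in Python, into portable CUDA (NVIDIA) or HIP (AMD) C++ code for GPU inference. It is designed to reach near-peak FP16 throughput on NVIDIA TensorCore and AMD MatrixCore hardware, and the binaries it generates are self-contained, with no third-party runtime dependencies. It also fuses operators in two directions: horizontally, merging parallel, independent operators (such as several GEMMs) into a single kernel, and vertically, folding chained operations (such as elementwise ops, reductions, or layout permutations) into the kernel that precedes them.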
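
To make the two fusion directions concrete, here is a minimal conceptual sketch in plain Python. It does not use AITemplate's API and is not the code AITemplate generates; the function names (`linear_relu_fused`, `two_matmuls_fused`, and so on) are hypothetical, and the horizontal case is simplified to two matmuls that share one input. It only illustrates that a fused computation returns the same result as the unfused one while skipping intermediate passes.

```python
# Conceptual sketch only: plain Python, not AITemplate's API or generated code.
# All function names here are hypothetical illustrations.

def matmul(x, w):
    """Naive (m x k) @ (k x n) product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def linear_relu_unfused(x, w, bias):
    """Three separate passes, each materializing an intermediate result."""
    y = matmul(x, w)
    y = [[v + b for v, b in zip(row, bias)] for row in y]
    return [[max(v, 0.0) for v in row] for row in y]


def linear_relu_fused(x, w, bias):
    """Vertical fusion: bias add and ReLU applied right after each dot product."""
    cols = list(zip(*w))
    return [
        [max(sum(a * b for a, b in zip(row, col)) + bias[j], 0.0)
         for j, col in enumerate(cols)]
        for row in x
    ]


def two_matmuls_fused(x, w1, w2):
    """Horizontal fusion (simplified): two independent matmuls on the same
    input run as one wider matmul, then the output columns are split."""
    n1 = len(w1[0])
    w_cat = [r1 + r2 for r1, r2 in zip(w1, w2)]  # concatenate weight columns
    y = matmul(x, w_cat)
    return [row[:n1] for row in y], [row[n1:] for row in y]


x = [[1.0, -2.0], [0.5, 3.0]]
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[2.0], [-1.0]]
bias = [0.5, -0.5]

# Fused and unfused variants agree on the results.
assert linear_relu_fused(x, w1, bias) == linear_relu_unfused(x, w1, bias)
assert two_matmuls_fused(x, w1, w2) == (matmul(x, w1), matmul(x, w2))
```

On a GPU the benefit comes from fewer kernel launches and fewer round trips to global memory for intermediates, which this CPU sketch does not measure.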