Is AITemplate open source?

Yes — facebookincubator/AITemplate is open source, released under the Apache-2.0 license.

What language is AITemplate written in?

facebookincubator/AITemplate is primarily written in Python.

How popular is AITemplate?

facebookincubator/AITemplate has 4.7k stars on GitHub.

Where can I find AITemplate?

facebookincubator/AITemplate is on GitHub at https://github.com/facebookincubator/AITemplate.

facebookincubator/AITemplate

Meta's inference compiler that treats cuDNN as optional

It compiles deep learning models directly to CUDA or HIP C++ to squeeze peak FP16 inference out of modern GPUs without leaning on cuDNN or TensorRT.

★ 4.7k stars · Python · Inference · Serving

What it does

AITemplate is a Python framework that transforms neural networks into self-contained CUDA or HIP C++ binaries for inference. It targets FP16 TensorCore on NVIDIA GPUs and MatrixCore on AMD GPUs, aiming for roofline performance on models like ResNet, BERT, and Stable Diffusion. The generated runtime can ingest PyTorch tensors without copying, but it also works standalone if PyTorch isn’t around.
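For readers unfamiliar with the term, "roofline performance" means running at the lower of two ceilings: the GPU's peak compute rate, or its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). The sketch below is a generic illustration of that bound, not code from AITemplate; the function names are hypothetical, and any numbers passed in are up to the reader rather than measured figures.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tbps: float, flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute ceiling and the memory ceiling."""
    memory_ceiling = bandwidth_tbps * flops_per_byte  # TB/s * FLOP/byte = TFLOP/s
    return min(peak_tflops, memory_ceiling)


def gemm_flops_per_byte(m: int, n: int, k: int, bytes_per_element: int = 2) -> float:
    """Arithmetic intensity of C = A @ B, assuming each matrix crosses memory once.

    bytes_per_element defaults to 2 because FP16 values are two bytes wide.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved
```

A large square GEMM has high intensity and hits the compute ceiling, while a matrix-vector product (m = 1) has intensity near 1 FLOP/byte and is limited by bandwidth. That gap is one reason fusing memory-bound operations into a neighboring GEMM tends to pay off.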

The interesting bit

Instead of calling into cuBLAS, cuDNN, or TensorRT, AITemplate generates its own kernels and fuses them aggressively: horizontally across parallel operations, vertically by folding follow-on work such as elementwise operations into the TensorCore/MatrixCore kernel, and through memory operations like concat and slice. That independence means the compiled binary is portable across software environments as long as the hardware matches.
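To make the vertical case concrete, here is a minimal pure-Python sketch of what fusing a bias add and ReLU into a matrix multiply means. It is a conceptual illustration only: AITemplate emits CUDA or HIP C++, not Python, and both function names here are hypothetical. The unfused version makes three passes and materializes a full intermediate after each; the fused version applies the bias and activation while the accumulator is still local.

```python
def linear_relu_unfused(x, w, b):
    """Three passes: matmul, bias add, ReLU, each producing a full intermediate."""
    rows, inner, cols = len(x), len(w), len(w[0])
    y = [[sum(x[i][k] * w[k][j] for k in range(inner)) for j in range(cols)]
         for i in range(rows)]
    y = [[y[i][j] + b[j] for j in range(cols)] for i in range(rows)]
    return [[max(v, 0.0) for v in row] for row in y]


def linear_relu_fused(x, w, b):
    """One pass: bias and ReLU are applied before the result is written out."""
    rows, inner, cols = len(x), len(w), len(w[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = b[j]  # start the accumulator from the bias
            for k in range(inner):
                acc += x[i][k] * w[k][j]
            out[i][j] = max(acc, 0.0)  # activation applied in the same pass
    return out
```

On a GPU the payoff comes from skipping the round trips to global memory between the three steps. Because the two versions add terms in a different order, their results can differ by floating-point rounding on general inputs.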

Key highlights

Claims near-roofline FP16 performance on major models (ResNet, MaskRCNN, BERT, ViT, Stable Diffusion) for both NVIDIA and AMD GPUs.
Zero third-party runtime dependencies: no cuBLAS, cuDNN, rocBLAS, MIOpen, or TensorRT.
Advanced kernel fusion: horizontal, vertical, and memory-aware fusion are all supported.
FX2AIT can partially accelerate PyTorch models even when some operators aren’t yet supported by AITemplate.
Adding an operator or fused kernel typically takes only two Python files (graph definition and backend codegen) plus a standard CUDA/HIP header.
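The partial acceleration mentioned for FX2AIT can be pictured as splitting a model into runs of supported and unsupported operators, so that supported runs are compiled and the rest stay in PyTorch. The sketch below shows that idea on a flat list of operator names. It is a simplification under stated assumptions: the real tool works on PyTorch graphs rather than a list, and `partition_ops`, the operator names, and the supported set are all hypothetical.

```python
from itertools import groupby


def partition_ops(ops, supported):
    """Split an operator sequence into contiguous (is_supported, [ops]) segments."""
    return [
        (is_supported, list(run))
        for is_supported, run in groupby(ops, key=lambda op: op in supported)
    ]


# Hypothetical model: one operator in the middle has no accelerated lowering.
model_ops = ["conv", "relu", "custom_nms", "linear", "softmax"]
supported_ops = {"conv", "relu", "linear", "softmax"}
segments = partition_ops(model_ops, supported_ops)
```

Here `segments` holds three runs: two that could be handed to the compiler and one that would fall back to eager execution. Each boundary between segments is a hand-off, so a model with many scattered unsupported operators gains less from partial lowering.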

Caveats

Hardware support is narrow: AITemplate is tested only on NVIDIA Ampere-class GPUs (SM80+) and AMD CDNA2 (MI-210/250); older GPUs like V100 or MI-100 may see broken or slower kernels.
Dynamic shapes are still a work in progress; the mid-term roadmap emphasizes better transformer sequence handling and symbolic shapes.
Not every PyTorch operator is supported, so models may need partial lowering or manual rewriting for best results.
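A deployment script could guard against the hardware caveat with a small pre-flight check. The following is a sketch, not part of AITemplate: `is_tested_target` and both constants are hypothetical names, and the values simply encode the SM80+ and MI-210/250 limits listed above.

```python
MIN_TESTED_NVIDIA_SM = 80  # SM80 and newer, per the caveat above
TESTED_AMD_MODELS = frozenset({"MI-210", "MI-250"})  # CDNA2 parts named above


def is_tested_target(vendor: str, device) -> bool:
    """Return True if the device falls inside the hardware range listed above.

    For "nvidia", device is the SM number (for example 80 or 70).
    For "amd", device is the model name (for example "MI-250").
    """
    if vendor == "nvidia":
        return int(device) >= MIN_TESTED_NVIDIA_SM
    if vendor == "amd":
        return str(device) in TESTED_AMD_MODELS
    return False
```

A False result does not mean compilation will fail, only that the target is outside the listed range, so it suits a warning better than a hard error.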

Verdict

Worth a look if you are serving FP16 inference on A100 or MI-250 and want to escape the usual vendor library stack. Skip it if you are on older GPUs, need fully dynamic shapes today, or rely on PyTorch operators outside the current coverage map.
