Is cudnn-frontend open source?

Yes — NVIDIA/cudnn-frontend is open source, released under the MIT license.

What language is cudnn-frontend written in?

NVIDIA/cudnn-frontend is listed as primarily Python, although the frontend itself is a header-only C++ API with Python bindings.

How popular is cudnn-frontend?

NVIDIA/cudnn-frontend has 887 stars on GitHub.

Where can I find cudnn-frontend?

NVIDIA/cudnn-frontend is on GitHub at https://github.com/NVIDIA/cudnn-frontend.


NVIDIA/cudnn-frontend

NVIDIA open-sources high-performance cuDNN kernels

The project open-sources cuDNN kernels for attention and Mixture-of-Experts (MoE) workloads while wrapping the cuDNN backend in a friendlier C++ and Python graph API.

Stars: 887. Language: Python. Topics: Inference · Serving, ML Frameworks.




What it does

cuDNN Frontend is NVIDIA’s official, MIT-licensed wrapper around the cuDNN backend. It exposes a header-only C++ graph API and Python bindings that let you compose persistent, reusable computation graphs instead of assembling backend descriptors by hand. The repository also hosts a growing catalog of inspectable CUDA kernels (flash attention, grouped GEMM fusions for Mixture-of-Experts, fused normalization, and sparse attention variants) that target Hopper and Blackwell GPUs, with precisions that include FP8 and MXFP8.

The interesting bit

For years the cuDNN backend was essentially a black box; this project publishes the actual source for kernels like SDPA, SwiGLU fused GEMMs, and DeepSeek-style sparse attention so you can modify them. The frontend layer itself replaces the old descriptor boilerplate with a unified cudnn_frontend::graph::Graph object that handles autotuning and persistence.
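To make "autotuning and persistence" concrete, here is a toy sketch of the general idea: time several interchangeable implementations once per problem size, keep the winner, and reuse it on later calls. This is not the cudnn_frontend API. `TunedOp`, `sum_builtin`, and `sum_loop` are hypothetical names invented for illustration, and the two plain-Python sum functions merely stand in for alternative GPU kernels.

```python
import time
from typing import Callable, Dict, List, Sequence

Candidate = Callable[[Sequence[float]], float]


# Hypothetical stand-ins for two kernels that compute the same result.
def sum_builtin(xs: Sequence[float]) -> float:
    return float(sum(xs))


def sum_loop(xs: Sequence[float]) -> float:
    total = 0.0
    for x in xs:
        total += x
    return total


class TunedOp:
    """Times every candidate once per problem size, then reuses the winner."""

    def __init__(self, candidates: List[Candidate]) -> None:
        self.candidates = candidates
        self.plans: Dict[int, Candidate] = {}  # problem size -> chosen candidate
        self.tuning_runs = 0                   # how many times tuning happened

    def _autotune(self, xs: Sequence[float]) -> Candidate:
        self.tuning_runs += 1
        timings = []
        for fn in self.candidates:
            start = time.perf_counter()
            fn(xs)
            timings.append((time.perf_counter() - start, fn))
        # Keep the candidate with the smallest measured time.
        return min(timings, key=lambda pair: pair[0])[1]

    def __call__(self, xs: Sequence[float]) -> float:
        key = len(xs)  # the "problem signature" in this toy example
        if key not in self.plans:
            self.plans[key] = self._autotune(xs)
        return self.plans[key](xs)


op = TunedOp([sum_builtin, sum_loop])
data = [float(i) for i in range(1000)]
first = op(data)   # tunes for size 1000, then runs the winner
second = op(data)  # reuses the cached choice without re-tuning
```

A real graph object would key its cache on much more than input length (shapes, data types, device), but the build-once, run-many pattern is the same.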

Key highlights

Header-only C++ API plus Python bindings via pybind11, with native PyTorch custom operators and torch.compile support.
Inspectable kernel catalog includes SDPA/Flash Attention forward and backward, grouped GEMM + GLU/SwiGLU for MoE, Native Sparse Attention (NSA), and fused RMSNorm + SiLU.
Targets NVIDIA Hopper (H100/H200) and Blackwell (B200/GB200/GB300) across FP16, BF16, FP8, and MXFP8 precisions.
Built-in autotuning and a unified graph API that creates reusable, persistent subgraph objects.
MIT licensed, with PyPI packages available.
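
For readers unfamiliar with what an SDPA kernel computes, the following is an unoptimized pure-Python reference for scaled dot-product attention, softmax(QKᵀ/√d)·V, without masking or dropout. It is a sketch of the math only and is not taken from the repository; the function names `softmax` and `sdpa` are chosen here for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference attention: one output row per query row."""
    scale = 1.0 / math.sqrt(len(q[0]))  # 1 / sqrt(head dimension)
    out: Matrix = []
    for q_row in q:
        # Scaled dot product of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value rows, column by column.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out
```

A fused flash-attention kernel produces the same result but avoids materializing the full score matrix, which is where the performance benefit comes from.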

Caveats

The open-source kernel catalog is explicitly a work in progress; NVIDIA is adding implementations “based on customer needs,” so coverage is currently selective.
Some containerized environments, specifically GKE with the TCPXO NCCL plugin, can hit a “Multiple libcudart libraries found” error and require a manual library override.
You still need the proprietary cuDNN backend (minimum 8.5.0) and a recent CUDA toolkit; the frontend is an entry point, not a standalone replacement.
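
The last two caveats lend themselves to a small preflight check. The sketch below rests on two assumptions: that the cuDNN version arrives as the integer reported by the backend (for example from `cudnnGetVersion()` in the C API), encoded as major*1000 + minor*100 + patch for 8.x and major*10000 + minor*100 + patch from 9.x onward, and that duplicate CUDA runtimes can be spotted by scanning lines in the format of Linux `/proc/<pid>/maps`. The helper names are hypothetical. This is a diagnostic only, not the project's own workaround for the libcudart error.

```python
from typing import Iterable, List, Tuple

MIN_CUDNN = (8, 5, 0)  # minimum backend version stated above


def decode_cudnn_version(v: int) -> Tuple[int, int, int]:
    # Assumed encoding: 9.x and later use major*10000, 8.x uses major*1000.
    if v >= 90000:
        return v // 10000, (v % 10000) // 100, v % 100
    return v // 1000, (v % 1000) // 100, v % 100


def meets_minimum(v: int) -> bool:
    return decode_cudnn_version(v) >= MIN_CUDNN


def find_libcudart_copies(maps_lines: Iterable[str]) -> List[str]:
    """Return the distinct libcudart paths named in /proc/<pid>/maps-style lines."""
    paths = set()
    for line in maps_lines:
        fields = line.split()
        # The mapped file path, when present, is the last field on the line.
        if fields and "libcudart.so" in fields[-1]:
            paths.add(fields[-1])
    return sorted(paths)
```

On Linux, passing `open("/proc/self/maps")` to `find_libcudart_copies` after importing your frameworks shows whether more than one CUDA runtime is mapped into the process. Paths that contain spaces are not handled by this sketch.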

Verdict

If you are training large transformers on Hopper or Blackwell and want to inspect or customize the kernel code behind your attention and MoE layers, this is now the place to start. Everyone else, especially anyone on older GPUs or without the proprietary cuDNN backend installed, can safely wait.
