NVIDIA/cudnn-frontend
NVIDIA's GPU kernel library for accelerating transformer training and inference with optimized attention, GEMM, and normalization operations.

cuDNN Frontend provides a header-only C++ API and a Python interface (with native PyTorch integration) to the cuDNN Graph API. It exposes high-performance open-source kernels, including scaled dot-product attention (Flash Attention), grouped GEMM fusions for mixture-of-experts (MoE) training, and fused normalization-plus-activation. The library targets NVIDIA Hopper and Blackwell GPUs (H100/H200 and B200/GB200/GB300) and supports FP16, BF16, FP8, and MXFP8 precision.
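The sketch below shows the math that the attention kernel computes. It does not use the cuDNN Frontend API. It is a plain-Python, CPU-only reference for single-head scaled dot-product attention, softmax(QKᵀ/√d)·V, paired with the one-pass "online softmax" reformulation that Flash Attention-style kernels rely on to avoid materializing the full score matrix. The function names `sdpa_reference` and `sdpa_online` are hypothetical. The causal mask assumes the query and key sequences have the same length.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _softmax(row):
    # Subtract the row max so exp() cannot overflow; masked entries are -inf -> 0.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_reference(q, k, v, causal=False):
    """Two-pass reference: build every score, softmax each row, then weight V.

    q: L query vectors, k: L key vectors (head dim d), v: L value vectors.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [
            float("-inf") if causal and j > i else scale * _dot(qi, kj)
            for j, kj in enumerate(k)
        ]
        p = _softmax(scores)
        out.append([sum(p[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out


def sdpa_online(q, k, v, causal=False):
    """One pass over the keys per query, keeping only a running max, a running
    denominator, and an unnormalized output accumulator (online softmax)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        m = float("-inf")        # running max of scores seen so far
        denom = 0.0              # running sum of exp(score - m)
        acc = [0.0] * len(v[0])  # running sum of exp(score - m) * v_j
        for j, (kj, vj) in enumerate(zip(k, v)):
            if causal and j > i:
                break            # later keys are masked out
            s = scale * _dot(qi, kj)
            m_new = max(m, s)
            corr = math.exp(m - m_new)  # rescales old stats; 0.0 on the first key
            p = math.exp(s - m_new)
            denom = denom * corr + p
            acc = [a * corr + p * x for a, x in zip(acc, vj)]
            m = m_new
        out.append([a / denom for a in acc])
    return out
```

A GPU kernel applies the same rescaling step to whole tiles of keys held in on-chip memory rather than one key at a time. This sketch only demonstrates that the streaming arithmetic reproduces the two-pass definition.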