
NVIDIA/cudnn-frontend

NVIDIA's GPU kernel library for accelerating transformer training and inference with optimized attention, GEMM, and normalization operations.


cuDNN Frontend provides a C++ header-only API and Python interface (with native PyTorch integration) to the cuDNN Graph API. It exposes high-performance open-source kernels including scaled dot-product attention (Flash Attention), grouped GEMM fusions for mixture-of-experts training, and fused normalization-plus-activation. The library targets NVIDIA Hopper and Blackwell architectures with support for FP16, BF16, FP8, and MXFP8 precision, enabling optimized deep learning workloads on H100/H200 and B200/GB200/GB300 GPUs.
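For readers unfamiliar with the operation named above, here is a minimal reference sketch of scaled dot-product attention in plain Python. It does not use cuDNN Frontend or its API. The function names `softmax` and `scaled_dot_product_attention` are illustrative choices, and the code shows only the unfused math, softmax(QKᵀ / sqrt(d)) · V, that a fused attention kernel is designed to compute more efficiently.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Reference attention for a single head, no masking.

    q: list of query vectors, each of length d
    k: list of key vectors, each of length d
    v: list of value vectors, one per key
    Returns one output vector per query.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # Scaled similarity of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v))
            for c in range(len(v[0]))
        ])
    return out
```

With two identical keys the attention weights are equal, so the output is the plain average of the two value vectors. A production kernel computes the same result without materializing the full score matrix.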
