
xlite-dev/LeetCUDA

Educational repository teaching modern CUDA programming with PyTorch, featuring 200+ kernels and implementation examples of flash attention and HGEMM.

11.2k stars Cuda Learning
Velocity (7d): +8.8 ★ / day · Trend: steady

LeetCUDA is a learning-focused CUDA tutorial aimed at beginners, providing annotated implementations of GPU kernels including half-precision matrix multiplication (HGEMM), flash attention using tensor cores with pure MMA PTX, and various CUDA programming patterns. The content is structured around PyTorch integration and covers topics like TF32/F16/BF16/F8 precision formats used in deep learning workloads.
