
BBuf/how-to-optim-algorithm-in-cuda

A study notebook containing CUDA kernels, Triton code, CUTLASS notes, and LLM inference/training optimization material.

Velocity (7d): +1.1 ★/day · Trend: steady

This repository serves as a public engineering notebook for GPU systems work, focusing on optimizing AI/ML algorithms in CUDA. It includes handwritten CUDA kernels for common operations (reduce, softmax, GEMV, linear attention), CUTLASS and CuTe DSL notes covering GEMM, TMA, and WGMMA, Triton kernels with PyTorch interop examples, and extensive LLM serving and training optimization notes. The material is organized into directories covering cuda-kernels, cuda-mode lectures, cutlass, triton, large-language-model systems, and PyTorch internals.
