deepreinforce-ai/CUDA-L2
A system that uses reinforcement learning agents and LLMs to automatically tune half-precision general matrix multiply (HGEMM) CUDA kernels for NVIDIA GPUs.

CUDA-L2 combines large language models with reinforcement learning to search for and optimize CUDA kernel configurations for matrix multiplication. By generating optimized kernel parameters, it produces kernels that outperform established baselines, including NVIDIA’s cuBLAS library and cuBLASLt auto-tuning. It supports multiple GPUs, including the RTX 3090, A100, and H100, and targets HGEMM, the core computational primitive underlying LLM and deep learning inference.
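For readers new to HGEMM, the sketch below shows what such a kernel computes and the kind of tiling parameters a configuration search varies. It is a minimal pure-Python sketch, not CUDA-L2's code or its RL procedure. It assumes the cuBLAS-style definition C = alpha·A·B + beta·C with FP16 operands and result, emulates FP16 rounding with the standard `struct` module's `e` format, and accumulates in double precision (a real kernel typically accumulates in FP16 or FP32). The names `to_fp16`, `hgemm_reference`, `hgemm_tiled`, `valid_configs`, and the tile sizes in `CANDIDATE_TILES` are hypothetical, chosen only for illustration.

```python
import struct
from itertools import product


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def hgemm_reference(a, b, c, alpha=1.0, beta=0.0):
    """Naive C_out = alpha * A @ B + beta * C with FP16 inputs and output.

    Accumulation uses Python floats (double precision), so this acts as a
    high-precision correctness oracle rather than a bit-exact kernel model.
    """
    m, k, n = len(a), len(a[0]), len(b[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0
            for p in range(k):  # sum over K in ascending order
                acc += to_fp16(a[i][p]) * to_fp16(b[p][j])
            row.append(to_fp16(alpha * acc + beta * to_fp16(c[i][j])))
        out.append(row)
    return out


def hgemm_tiled(a, b, c, bm, bn, bk, alpha=1.0, beta=0.0):
    """Same math, restructured into bm x bn output tiles and bk-wide K slices.

    bm, bn, bk stand in for the block-tile sizes a CUDA kernel exposes as
    tunable parameters (hypothetical and simplified: no warps, pipeline
    stages, or shared-memory staging).
    """
    m, k, n = len(a), len(a[0]), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, bm):
        for j0 in range(0, n, bn):
            rows = range(i0, min(i0 + bm, m))
            cols = range(j0, min(j0 + bn, n))
            # Per-tile accumulator, analogous to a thread block's registers.
            acc = {(i, j): 0.0 for i in rows for j in cols}
            for k0 in range(0, k, bk):
                for p in range(k0, min(k0 + bk, k)):
                    for i in rows:
                        for j in cols:
                            acc[(i, j)] += to_fp16(a[i][p]) * to_fp16(b[p][j])
            # Epilogue: scale, add beta * C, and round the result to FP16.
            for i in rows:
                for j in cols:
                    out[i][j] = to_fp16(alpha * acc[(i, j)] + beta * to_fp16(c[i][j]))
    return out


# Hypothetical search space of (bm, bn, bk) tile sizes.
CANDIDATE_TILES = list(product((1, 2, 4), (1, 2, 4), (1, 2, 3)))


def valid_configs(a, b, c, alpha=1.0, beta=0.0):
    """Return the candidate tilings whose output matches the reference exactly."""
    ref = hgemm_reference(a, b, c, alpha, beta)
    return [t for t in CANDIDATE_TILES
            if hgemm_tiled(a, b, c, *t, alpha=alpha, beta=beta) == ref]


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    c = [[0.5, 0.5], [0.5, 0.5]]
    print(hgemm_reference(a, b, c, alpha=1.0, beta=2.0))  # [[5.0, 6.0], [11.0, 12.0]]
    print(len(valid_configs(a, b, c, 1.0, 2.0)), "of", len(CANDIDATE_TILES), "tilings match")
```

Because every tiling here sums the K dimension in the same order, each candidate reproduces the reference bit-for-bit; a tuner would then benchmark the correct candidates on the GPU and keep the fastest. On real hardware, different tilings or split-K schemes can change the summation order, so comparisons against a reference usually use a tolerance rather than exact equality.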