
deepreinforce-ai/CUDA-L2

A system that uses reinforcement learning agents and LLMs to automatically tune Half-precision General Matrix Multiply (HGEMM) CUDA kernels for GPUs.

442 stars · Cuda · Inference · Serving · Agents
Velocity (7d): +2.4 ★ / day · Trend: steady

CUDA-L2 combines large language models with reinforcement learning to search for and optimize CUDA kernel configurations for half-precision matrix multiplication. According to the project, the kernel parameters it generates outperform established baselines, including NVIDIA’s cuBLAS library and cuBLASLt auto-tuning. It supports several GPUs, including the RTX 3090 and A100 (Ampere) and the H100 (Hopper), and targets the core computational primitive that underlies LLM and deep learning inference.
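To make the target operation concrete, here is a minimal pure-Python sketch of what an HGEMM computes: C = A · B with inputs and outputs rounded to IEEE 754 half precision. It is not taken from CUDA-L2; the names `to_half` and `hgemm_reference` are hypothetical, and accumulating in Python floats before a single final rounding is an assumption (real kernels may accumulate in fp16 or fp32). A reference of this kind is what a tuned kernel's output would typically be checked against.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def hgemm_reference(a, b):
    """Return C = A @ B with half-precision inputs and outputs.

    a is an M x K list of lists and b is a K x N list of lists.
    Assumption: products are accumulated in Python floats and the
    result is rounded to half precision once at the end.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"

    # Quantize the inputs to half precision first, as an HGEMM would see them.
    a_h = [[to_half(v) for v in row] for row in a]
    b_h = [[to_half(v) for v in row] for row in b]

    return [
        [to_half(sum(a_h[i][p] * b_h[p][j] for p in range(k))) for j in range(n)]
        for i in range(m)
    ]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(hgemm_reference(a, identity))
```

Values such as 0.1 are not exactly representable in half precision, so comparisons against a faster kernel would normally use a tolerance rather than exact equality.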
