ScalingIntelligence/KernelBench
A benchmark suite evaluating LLMs' ability to generate efficient GPU kernels, written in CUDA or a kernel DSL, from PyTorch operator specifications.

KernelBench tasks LLMs with transpiling PyTorch operators to GPU kernels and provides an evaluation toolkit that measures the correctness and performance of the generated kernels. It contains 250 problems across three difficulty levels: single-kernel operators such as convolutions and matrix multiplies, fused-kernel patterns that combine multiple operations, and optimizations of full model architectures. The repository includes evaluation scripts for automated benchmarking, and the benchmark is described in an ICML 2025 paper.
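
To make the "correctness and performance" idea concrete, here is a minimal sketch of that kind of check in plain Python. It is not KernelBench's actual evaluation code: every name in it (`reference_op`, `candidate_op`, `is_correct`, `mean_runtime`, `evaluate`), the tolerance, and the trial counts are hypothetical, and ordinary Python functions stand in for the PyTorch reference and the generated GPU kernel.

```python
import math
import random
import time
from typing import Callable, List, Optional, Sequence

Op = Callable[[Sequence[float]], List[float]]


def reference_op(x: Sequence[float]) -> List[float]:
    """Hypothetical reference operator: ReLU, then scale by 2, in two passes."""
    relu = [max(v, 0.0) for v in x]
    return [2.0 * v for v in relu]


def candidate_op(x: Sequence[float]) -> List[float]:
    """Hypothetical 'generated' operator: the same math fused into one pass."""
    return [2.0 * v if v > 0.0 else 0.0 for v in x]


def is_correct(ref: Op, cand: Op, trials: int = 5, size: int = 1000,
               tol: float = 1e-6, seed: int = 0) -> bool:
    """Compare candidate and reference outputs on several random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        expected, actual = ref(x), cand(x)
        if len(expected) != len(actual):
            return False
        if not all(math.isclose(e, a, rel_tol=tol, abs_tol=tol)
                   for e, a in zip(expected, actual)):
            return False
    return True


def mean_runtime(fn: Op, x: Sequence[float], repeats: int = 20) -> float:
    """Average wall-clock seconds per call, after one warm-up call."""
    fn(x)  # warm-up, excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn(x)
    return (time.perf_counter() - start) / repeats


def evaluate(ref: Op, cand: Op) -> dict:
    """Report correctness first; measure speedup only for correct candidates."""
    correct = is_correct(ref, cand)
    speedup: Optional[float] = None
    if correct:
        x = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
        speedup = mean_runtime(ref, x) / mean_runtime(cand, x)
    return {"correct": correct, "speedup": speedup}
```

Calling `evaluate(reference_op, candidate_op)` returns a dictionary with a `correct` flag and a `speedup` ratio (reference time divided by candidate time). Gating the timing on correctness reflects the assumption that a fast kernel with wrong outputs should not count as a win. A real GPU harness would also need device synchronization around the timed region, which this CPU-only sketch omits.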