
ScalingIntelligence/KernelBench

LLMs vs. GPU kernels: a standardized grudge match

KernelBench tasks LLMs with transpiling PyTorch into efficient GPU kernels and scores them on whether the output is both correct and actually faster than the baseline.

★1.1k stars · Jupyter Notebook · LLMOps · Eval · Coding Assistants




What it does

KernelBench provides a dataset and evaluation harness that asks LLMs to convert PyTorch operations into low-level GPU kernels (CUDA, Triton, HIP, or other DSLs) and then checks whether the result compiles, produces correct outputs on randomized inputs, and beats the reference PyTorch implementation on wall-clock time. The benchmark spans four difficulty tiers, ranging from single operators such as convolutions to full model architectures and, at the top tier, models pulled from Hugging Face. It is designed to be used as a library or submodule rather than as a monolithic agentic framework.
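The correctness-then-speed check described above can be sketched in plain Python. This is a hypothetical illustration, not KernelBench's actual harness: `check_candidate`, its tolerance and timing parameters, and the use of a list of floats as a stand-in for a tensor are all assumptions made so the example runs without a GPU.

```python
import math
import random
import time
from typing import Callable, Dict, List

Vector = List[float]  # hypothetical stand-in for a tensor so the sketch runs without a GPU


def check_candidate(
    reference: Callable[[Vector], Vector],
    candidate: Callable[[Vector], Vector],
    n_trials: int = 5,
    size: int = 1024,
    atol: float = 1e-4,
    rtol: float = 1e-4,
    timing_runs: int = 20,
    seed: int = 0,
) -> Dict[str, object]:
    """Hypothetical check: correctness on random inputs first, then speedup."""
    rng = random.Random(seed)

    # 1) Correctness: compare outputs elementwise on several randomized inputs.
    for _ in range(n_trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        ref_out, cand_out = reference(x), candidate(x)
        if len(ref_out) != len(cand_out) or not all(
            math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
            for a, b in zip(ref_out, cand_out)
        ):
            return {"correct": False, "speedup": 0.0}

    # 2) Speed: best-of-N wall-clock time for each implementation on one input.
    x = [rng.uniform(-1.0, 1.0) for _ in range(size)]

    def best_time(fn: Callable[[Vector], Vector]) -> float:
        samples = []
        for _ in range(timing_runs):
            start = time.perf_counter()
            fn(x)
            samples.append(time.perf_counter() - start)
        return min(samples)

    ref_t, cand_t = best_time(reference), best_time(candidate)
    # Guard against a zero reading from a very coarse clock.
    return {"correct": True, "speedup": ref_t / max(cand_t, 1e-12)}


def relu_ref(x: Vector) -> Vector:
    return [max(v, 0.0) for v in x]


def relu_candidate(x: Vector) -> Vector:
    return [v if v > 0.0 else 0.0 for v in x]


print(check_candidate(relu_ref, relu_candidate)["correct"])  # True
```

Best-of-N timing is one simple way to reduce scheduler noise. On a real GPU the clock should only be read after the device has finished its work, because kernel launches are asynchronous (for example by calling `torch.cuda.synchronize()` or timing with CUDA events).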

The interesting bit

The evaluation metric fast_p elegantly captures the dual constraint: it is the fraction of problems for which the generated kernel is both correct and achieves a speedup greater than p over PyTorch. With p = 0 it reduces to plain correctness, while p = 1 counts only kernels that actually beat PyTorch. This forces models to optimize, not just paraphrase. The project also extends beyond NVIDIA, with ROCm support and a growing list of DSL backends including ThunderKittens and TileLang.
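As a worked example of the metric, fast_p can be computed from per-problem outcomes as the fraction that are correct and whose speedup exceeds p. The sketch below assumes those outcomes are already available as (correct, speedup) pairs; the `fast_p` function and the sample numbers are illustrative and are not taken from the benchmark.

```python
from typing import Iterable, Tuple


def fast_p(results: Iterable[Tuple[bool, float]], p: float) -> float:
    """Fraction of problems that are correct AND have speedup strictly greater than p."""
    results = list(results)
    if not results:
        return 0.0
    hits = sum(1 for correct, speedup in results if correct and speedup > p)
    return hits / len(results)


# Illustrative, made-up outcomes: (is_correct, speedup_over_pytorch)
sample = [(True, 1.8), (True, 0.9), (False, 3.0), (True, 1.2)]
print(fast_p(sample, 0.0))  # 0.75 -> plain correctness rate
print(fast_p(sample, 1.0))  # 0.5  -> correct and faster than PyTorch
print(fast_p(sample, 1.5))  # 0.25
```

Note that the third entry never counts: its 3.0x speedup is worthless because the kernel is wrong, which is exactly the dual constraint the metric enforces.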

Key highlights

Four-level curriculum: 100 single-kernel problems, 100 fusion patterns, 50 full architectures, and open-ended Hugging Face models.
Multi-backend support: CUDA, Triton, CUTLASS/CuTe, TileLang, ThunderKittens, and AMD HIP.
Cloud or local evaluation: run on your own GPU or offload to Modal serverless instances.
Built for research integration: includes adapters for multi-turn inference (Caesar), RL training (kernelbench-tinker), and evolutionary search (OpenEvolve).
Hugging Face dataset and ICML ’25 paper backing the benchmark.

Caveats

The repo provides core scripts and evaluation logic, but the authors explicitly note it is “not intended to provide complex agentic scaffolds”; you’ll need to build your own solver loop.
ThunderKittens support requires manual environment setup and is currently limited to BF16.
Several integrations (Harbor, OpenEvolve) and roofline analysis are marked as experimental or work-in-progress.
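Since the repo leaves the solver loop to you, here is a minimal sketch of one: generate a candidate, evaluate it, and feed a summary of the outcome back for the next attempt. Everything here is hypothetical: `solve`, `EvalResult`, the `generate`/`evaluate` callables, and the feedback format are placeholders; in practice `generate` would call an LLM and `evaluate` would wrap KernelBench's evaluation scripts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EvalResult:  # hypothetical summary of one evaluation run
    compiled: bool
    correct: bool
    speedup: float
    log: str = ""


def solve(
    problem: str,
    generate: Callable[[str, List[str]], str],
    evaluate: Callable[[str], EvalResult],
    max_turns: int = 4,
) -> Optional[str]:
    """Hypothetical multi-turn loop that returns the fastest correct candidate seen."""
    feedback: List[str] = []
    best_src: Optional[str] = None
    best_speedup = 0.0
    for turn in range(max_turns):
        src = generate(problem, feedback)
        result = evaluate(src)
        # Keep the best correct kernel, since a later turn may regress.
        if result.correct and result.speedup > best_speedup:
            best_src, best_speedup = src, result.speedup
        # Summarize the outcome so the next generation can react to it.
        note = (
            f"turn {turn}: compiled={result.compiled} "
            f"correct={result.correct} speedup={result.speedup:.2f}"
        )
        if result.log:
            note += f" ({result.log})"
        feedback.append(note)
    return best_src


# Stub "model" and "evaluator" so the loop can be exercised without a GPU or an LLM.
attempts = ["bad syntax", "slow but correct", "fast and correct"]


def fake_generate(problem: str, feedback: List[str]) -> str:
    return attempts[min(len(feedback), len(attempts) - 1)]


def fake_evaluate(src: str) -> EvalResult:
    if src == "bad syntax":
        return EvalResult(False, False, 0.0, "compile error")
    return EvalResult(True, True, 0.8 if src.startswith("slow") else 1.6)


print(solve("relu", fake_generate, fake_evaluate))  # fast and correct
```

A real loop would likely pass richer feedback (compiler errors, mismatch statistics, per-run timings) and might branch or sample several candidates per turn, which is where adapters such as the multi-turn and evolutionary-search integrations come in.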

Verdict

Grab this if you’re researching LLM code generation for high-performance computing or building RL/agentic pipelines that need a rigorous, hardware-aware reward signal. Skip it if you just want a drop-in kernel optimizer without doing any engineering.
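For RL pipelines, the compile, correctness, and speed outcomes have to be collapsed into a single scalar reward. The function below is one hypothetical shaping scheme assumed for illustration; it is not the reward used by kernelbench-tinker or any other integration, and its constants (0.1 partial credit, a log2 speed bonus) are arbitrary choices.

```python
import math


def kernel_reward(compiled: bool, correct: bool, speedup: float) -> float:
    """Hypothetical shaped reward: gate on compile/correctness, then reward speed.

    - 0.0 if the kernel fails to compile
    - 0.1 if it compiles but produces wrong outputs
    - otherwise 1.0 + max(0, log2(speedup)): a kernel matching PyTorch earns 1.0,
      each doubling of speed adds 1.0, and slower kernels get no bonus
    """
    if not compiled:
        return 0.0
    if not correct:
        return 0.1
    if speedup <= 0.0:  # defensive: log2 is undefined for non-positive values
        return 1.0
    return 1.0 + max(0.0, math.log2(speedup))


print(kernel_reward(True, True, 4.0))    # 3.0
print(kernel_reward(True, False, 10.0))  # 0.1
```

Gating the speed bonus on correctness follows the same logic as fast_p, so a fast but wrong kernel earns nothing for its speed, and the logarithm keeps a rare very large speedup from dominating a batch of rewards.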

Frequently asked

What is ScalingIntelligence/KernelBench?: KernelBench tasks LLMs with transpiling PyTorch into efficient GPU kernels and scores them on whether the output is both correct and actually faster than the baseline.
Is KernelBench open source?: Yes — ScalingIntelligence/KernelBench is an open-source project tracked on heatdrop.
What language is KernelBench written in?: ScalingIntelligence/KernelBench is primarily written in Jupyter Notebook.
How popular is KernelBench?: ScalingIntelligence/KernelBench has 1.1k stars on GitHub.
Where can I find KernelBench?: ScalingIntelligence/KernelBench is on GitHub at https://github.com/ScalingIntelligence/KernelBench.