ELS-RD/kernl
An open-source GPU inference engine built in Triton that accelerates PyTorch transformer models with optimized custom kernels.

Kernl provides optimized Triton kernels that replace standard PyTorch operations, making transformer inference on NVIDIA GPUs several times faster. Each kernel is kept under 200 lines of code so that it stays readable and easy to modify. The project also pioneered the Triton debugger, which was upstreamed to the official Triton repository in 2023. Integrating Kernl requires minimal code changes: a single-line replacement of PyTorch operations is enough to accelerate inference.
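
As a rough illustration of the single-line integration, the sketch below wraps the optimization call in a small helper. The helper name `accelerate` is hypothetical, and the import path `kernl.model_optimization.optimize_model` is an assumption not stated in this description, so check it against the project's own documentation before relying on it.

```python
def accelerate(model):
    """Try to apply Kernl's optimizations to `model` in place.

    Returns True if Kernl could be imported and the optimization call ran,
    False if Kernl is not installed (the model is then left untouched).
    """
    try:
        # Assumed entry point; verify against the Kernl documentation.
        from kernl.model_optimization import optimize_model
    except ImportError:
        return False
    # The single line that swaps PyTorch operations for Triton kernels.
    optimize_model(model)
    return True
```

In practice the model would presumably be a PyTorch transformer already moved to an NVIDIA GPU and put in evaluation mode before calling the helper. The fallback branch simply lets the same script run unmodified on machines where Kernl is not available.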