mirage-project/mirage
A compiler and runtime system that fuses multi-GPU LLM inference computation and communication into a single megakernel for lower latency.

Mirage Persistent Kernel (MPK) is a compiler and runtime that automatically transforms LLM inference workflows from multiple separate kernels into a single fused megakernel. It merges computation and communication across multiple GPUs into one kernel launch, achieving 1.2× to 6.7× latency reduction. Developers provide a few dozen lines of Python to define kernel inputs and outputs, and the system compiles models from Hugging Face into optimized megakernels using Triton and FlashInfer backends.