← all repositories

mirage-project/mirage

A compiler and runtime system that fuses multi-GPU LLM inference computation and communication into a single megakernel for lower latency.

mirage
Velocity · 7d
+3.0
★ / day
Trend
steady
star history

Mirage Persistent Kernel (MPK) is a compiler and runtime that automatically transforms LLM inference workflows from multiple separate kernels into a single fused megakernel. It merges computation and communication across multiple GPUs into one kernel launch, achieving 1.2× to 6.7× latency reduction. Developers provide a few dozen lines of Python to define kernel inputs and outputs, and the system compiles models from Hugging Face into optimized megakernels using Triton and FlashInfer backends.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.