← all repositories
linkedin/Liger-Kernel

Drop-in Triton kernels that make LLM training stop choking on memory

LinkedIn's open-source collection fuses and chunks standard transformer layers so you can train bigger models on the same GPUs without touching your modeling code.

Liger-Kernel
Velocity · 7d
+9.6
★ / day
Trend
steady
star history

What it does Liger Kernel is a library of hand-written Triton kernels for the unglamorous but memory-hungry parts of LLM training—RMSNorm, RoPE, SwiGLU, CrossEntropy, and a growing list of post-training losses like DPO and ORPO. You patch a Hugging Face model in one line (AutoLigerKernelForCausalLM) or import the kernels directly. It runs alongside Flash Attention, FSDP, and DeepSpeed without ceremony.

The interesting bit The project treats kernel fusion as plumbing, not research. By fusing operations, replacing tensors in-place, and chunking large computations, it claims 20% throughput gains and 60% memory reduction on standard LLaMA 3-8B runs. The post-training kernels go further—up to 80% memory savings for alignment losses—because the authors noticed everyone was quietly OOMing during DPO. It is exact, not approximate: they unit-test and convergence-test against vanilla PyTorch.

Key highlights

  • One-line monkey-patch for Hugging Face models, or compose kernels à la carte
  • Exact numerical parity with PyTorch; no approximations
  • Supports CUDA and ROCm; minimal dependencies (just torch and triton)
  • Integrates with Axolotl, LLaMA-Factory, HF Trainer, Lightning, and others
  • Optional cuTile backend for environments with CUDA Toolkit 13.1+

Caveats

  • Benchmarks shown are specific: LLaMA 3-8B, bf16, AdamW, gradient checkpointing, FSDP1 on 8×A100s; your mileage will vary
  • The AutoLigerKernelForCausalLM wrapper only works for supported model types; unsupported architectures need manual kernel composition
  • cuTile backend is opt-in and raises hard errors if dependencies or hardware mismatch

Verdict Worth a look if you are fine-tuning or aligning LLMs and have already squeezed what you can from Flash Attention and FSDP. Less compelling if you are training from scratch with a custom architecture not covered by the patching APIs.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.