← all repositories

fla-org/flash-linear-attention

A library of hardware-efficient attention and state space model building blocks for training and running modern sequence models.

5.2k stars Python ML FrameworksLanguage Models
flash-linear-attention
Velocity · 7d
+5.8
★ / day
Trend
steady
star history

Flash Linear Attention provides optimized implementations of token-mixing layers including linear attention, sparse attention, and state space models. It supports hybrid LLM architectures like Mamba, Gated DeltaNet, MoBA, and YOCO, with platform-agnostic kernels verified on NVIDIA, AMD, and Intel hardware. The library includes fused modules, training utilities, generation helpers, and benchmarking tools.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.