fla-org/flash-linear-attention
A library of hardware-efficient attention and state space model building blocks for training and running modern sequence models.

Velocity (7d): +5.8 ★ / day
Trend: steady
Flash Linear Attention provides optimized implementations of token-mixing layers, including linear attention, sparse attention, and state space models. It supports architectures such as Mamba, Gated DeltaNet, MoBA, and YOCO, including hybrid LLM designs, with platform-agnostic kernels verified on NVIDIA, AMD, and Intel hardware. The library also includes fused modules, training utilities, generation helpers, and benchmarking tools.
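To make "linear attention" concrete, here is a minimal pure-Python sketch of the unnormalized causal form, o_t = sum over s <= t of (q_t . k_s) v_s, computed two ways. It is not the library's API or its kernels: the function names `linear_attention_recurrent` and `linear_attention_quadratic` are hypothetical, and the sketch omits feature maps, normalization, gating, and chunking, all of which a real implementation would likely need.

```python
def linear_attention_recurrent(q, k, v):
    """Causal linear attention (no softmax) as a recurrence.

    q, k: lists of T vectors of size d_k; v: list of T vectors of size d_v.
    Keeps a running d_k x d_v state S_t = S_{t-1} + k_t v_t^T, so the cost
    per token does not grow with the sequence length.
    """
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q_t, k_t, v_t in zip(q, k, v):
        # Rank-1 update of the state with the current key/value pair.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * v_t[j]
        # Read out: o_t = q_t^T S_t.
        outputs.append(
            [sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


def linear_attention_quadratic(q, k, v):
    """Same result via explicit causal scores, quadratic in sequence length."""
    d_v = len(v[0])
    outputs = []
    for t, q_t in enumerate(q):
        o_t = [0.0] * d_v
        for s in range(t + 1):  # causal mask: only positions s <= t
            score = sum(a * b for a, b in zip(q_t, k[s]))
            for j in range(d_v):
                o_t[j] += score * v[s][j]
        outputs.append(o_t)
    return outputs


# Toy inputs: T = 3 tokens, d_k = 2, d_v = 1.
q = [[1, 0], [0, 1], [1, 1]]
k = [[1, 2], [3, 4], [5, 6]]
v = [[1], [2], [3]]
recurrent_out = linear_attention_recurrent(q, k, v)
quadratic_out = linear_attention_quadratic(q, k, v)
```

Both functions return the same outputs on the toy inputs. The recurrent form carries only a fixed-size state from token to token, which is the property that makes this family of layers attractive for long sequences and for generation.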