Is flash-linear-attention open source?

Yes — fla-org/flash-linear-attention is open source, released under the MIT license.

What language is flash-linear-attention written in?

fla-org/flash-linear-attention is primarily written in Python.

How popular is flash-linear-attention?

fla-org/flash-linear-attention has 5.4k stars on GitHub and is currently accelerating.

Where can I find flash-linear-attention?

fla-org/flash-linear-attention is on GitHub at https://github.com/fla-org/flash-linear-attention.

← all repositories

fla-org/flash-linear-attention

The post-Transformer layer zoo, wired for training

It corrals the latest subquadratic sequence-model research into hardware-efficient, training-ready PyTorch layers verified across NVIDIA, AMD, and Intel GPUs.

★5.4k stars Python ML Frameworks Language Models

View on GitHub ↗ Homepage ↗

Velocity · 7d

+16

★ / day

Trend

↗accelerating

star history

What it does flash-linear-attention is a Python library that gathers implementations of modern sequence-model layers and full architectures. It covers linear attention variants, state-space models, sparse attention mechanisms, and hybrid LLM designs, packaging them as training-ready PyTorch modules. The project states its kernels are platform-agnostic and verified on NVIDIA, AMD, and Intel hardware.

The interesting bit The news log reads like an arXiv digest for subquadratic models, with recent additions including Gated DeltaNet 2, Raven, Mamba3, and MoBA. Rather than just research code, the maintainers package these as training-ready layers—complete with fused modules, context parallelism, and a companion flame training framework—and have landed their Gated DeltaNet kernel in Qwen3-Next.

Key highlights

Broad architecture coverage: RetNet, GLA, DeltaNet, HGRN/HGRN2, RWKV6/7, Mamba2/3, GSA, YOCO, MLA, MoBA, NSA, and others listed in the models table.
Hardware targets span NVIDIA, AMD, and Intel GPUs, with selected kernels also offering a TileLang backend.
Training-focused extras include fused modules, variable-length input support, hybrid-model composition, and context parallelism for distributed sequence training.
Real-world adoption: the Gated DeltaNet implementation is integrated into Qwen3-Next.
Companion training framework flame (torchtitan-based) is maintained under the same fla-org umbrella.

Verdict Reach for this if you are building or fine-tuning experimental LLMs and want subquadratic alternatives to standard attention without writing custom CUDA. If your workloads are happily served by vanilla Transformers, you can safely skip the zoo.

Frequently asked

What is fla-org/flash-linear-attention?: It corrals the latest subquadratic sequence-model research into hardware-efficient, training-ready PyTorch layers verified across NVIDIA, AMD, and Intel GPUs.
Is flash-linear-attention open source?: Yes — fla-org/flash-linear-attention is open source, released under the MIT license.
What language is flash-linear-attention written in?: fla-org/flash-linear-attention is primarily written in Python.
How popular is flash-linear-attention?: fla-org/flash-linear-attention has 5.4k stars on GitHub and is currently accelerating.
Where can I find flash-linear-attention?: fla-org/flash-linear-attention is on GitHub at https://github.com/fla-org/flash-linear-attention.