Relaxed-System-Lab/Flash-Sparse-Attention
A library providing optimized CUDA kernels for efficient Native Sparse Attention across popular large language models.

Flash Sparse Attention (FSA) implements efficient kernel designs for Native Sparse Attention (NSA) in LLMs. The library takes a novel approach to computing sparse attention on modern GPUs, speeding up both training and inference by batching query heads that share key-value head configurations. It also includes a beta version of one-step decoding and tools for benchmarking the attention module against standard implementations.
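To make the idea of batching query heads by shared key-value head more concrete, here is a minimal pure-Python sketch. It is not taken from the FSA codebase: the function name `group_query_heads` is hypothetical, and the contiguous mapping (query head `h` reads key-value head `h // group_size`) is an assumption based on the common grouped-query attention layout.

```python
from collections import defaultdict
from typing import Dict, List


def group_query_heads(num_q_heads: int, num_kv_heads: int) -> Dict[int, List[int]]:
    """Map each key-value head to the query heads that share it.

    Assumption (not from the FSA source): heads are laid out contiguously,
    so query head h uses key-value head h // group_size.
    """
    if num_kv_heads <= 0 or num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a positive multiple of num_kv_heads")

    group_size = num_q_heads // num_kv_heads
    groups: Dict[int, List[int]] = defaultdict(list)
    for q_head in range(num_q_heads):
        # All query heads in one group can be processed together,
        # because they read the same keys and values.
        groups[q_head // group_size].append(q_head)
    return dict(groups)


if __name__ == "__main__":
    # 8 query heads sharing 2 key-value heads gives 2 groups of 4.
    print(group_query_heads(num_q_heads=8, num_kv_heads=2))
    # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```

Each group in the result corresponds to one batch of query heads that can reuse a single load of keys and values. How a real kernel tiles and schedules those batches on the GPU is outside the scope of this sketch.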