
Relaxed-System-Lab/Flash-Sparse-Attention

A library providing optimized CUDA kernels for efficient Native Sparse Attention across popular large language models.

Velocity (7d): +2.1 ★ / day · Trend: steady

Flash Sparse Attention (FSA) implements efficient kernel designs for Native Sparse Attention (NSA) in LLMs. The kernels are optimized for modern GPUs and aim to speed up both training and inference by batching query heads that share the same key-value heads. The library also includes a beta version of one-step decoding and tools for benchmarking attention module performance against standard implementations.
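
To make the idea of batching query heads over a shared key-value head more concrete, here is a minimal pure-Python sketch. It is not FSA's API or its GPU kernel: the function names (softmax, grouped_sparse_attention), the argument layout, and the assumption that every query head in a group attends to the same selected key-value blocks are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_sparse_attention(q_group, keys, values, selected_blocks, block_size):
    """Attention for one token, for all query heads that share one KV head.

    q_group:         list of query vectors, one per query head in the group
    keys, values:    per-token key and value vectors of the shared KV head
    selected_blocks: indices of the KV blocks this token attends to
                     (assumed here to be shared by the whole group)
    """
    # Gather the selected KV tokens once; every head in the group reuses them.
    token_ids = [
        t
        for block in selected_blocks
        for t in range(block * block_size, min((block + 1) * block_size, len(keys)))
    ]
    k_sel = [keys[t] for t in token_ids]
    v_sel = [values[t] for t in token_ids]

    scale = 1.0 / math.sqrt(len(keys[0]))
    value_dim = len(values[0])
    outputs = []
    for q in q_group:
        # Scaled dot-product scores against the gathered keys only.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_sel]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, v_sel)) for d in range(value_dim)]
        )
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
q_group = [[1.0, 0.0], [0.0, 1.0]]  # two query heads sharing one KV head

# Attend only to block 1 (tokens 2 and 3) out of two blocks of size 2.
out = grouped_sparse_attention(q_group, keys, values, selected_blocks=[1], block_size=2)
print(out)
```

The point of the sketch is the single gather step: the selected keys and values are collected once and reused by every query head in the group, which is the kind of sharing a GPU kernel can exploit to reduce memory traffic. An actual kernel would operate on tiles of tokens and heads in parallel rather than looping in Python.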
