lucidrains/native-sparse-attention-pytorch

PyTorch implementation of DeepSeek's native sparse attention mechanism for efficient transformer inference.

Velocity (7d): +1.7 ★ / day · Trend: steady

This repository implements the sparse attention pattern from the DeepSeek ‘Native Sparse Attention’ paper, designed to accelerate transformer-based language models. It provides a custom PyTorch attention module with a configurable sliding window, compression blocks, and selection blocks. The implementation uses Triton and PyTorch's FlexAttention for efficient computation, and includes an example training script for Enwik8 language modeling.
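
To make the three configurable pieces concrete, here is a toy sketch in plain Python of which key positions a single query could see through each one. It is not the repository's API: the function names, parameter names, and the exact boundary rules (for example, that the sliding window includes the query's own position and that only fully past blocks are compressed) are assumptions chosen for illustration.

```python
# Toy illustration only: hypothetical helpers, not the repository's API.

def sliding_window_keys(query_pos, window_size):
    """Positions of the most recent `window_size` tokens, query included (assumed)."""
    start = max(0, query_pos - window_size + 1)
    return list(range(start, query_pos + 1))


def compressed_blocks(query_pos, block_size, stride):
    """(start, end) spans of blocks lying entirely at or before `query_pos`.

    Each span would be summarized into one coarse key/value; blocks
    overlap whenever `stride` is smaller than `block_size`.
    """
    spans = []
    start = 0
    while start + block_size - 1 <= query_pos:
        spans.append((start, start + block_size - 1))
        start += stride
    return spans


def selected_keys(query_pos, block_size, chosen_blocks):
    """Token positions inside the chosen blocks, clipped to stay causal.

    `chosen_blocks` is supplied by hand here; in the paper the choice is
    driven by attention scores over the compressed blocks.
    """
    keys = []
    for block in sorted(chosen_blocks):
        lo = block * block_size
        hi = min((block + 1) * block_size - 1, query_pos)
        keys.extend(range(lo, hi + 1))  # empty when the block is in the future
    return keys


if __name__ == "__main__":
    q = 10
    print(sliding_window_keys(q, window_size=4))
    print(compressed_blocks(q, block_size=4, stride=2))
    print(selected_keys(q, block_size=4, chosen_blocks={0, 2}))
```

In the paper the three views produce separate attention outputs that are then mixed by learned gates; the sketch stops at the index sets and leaves out the attention computation and the gating.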