
lucidrains/linear-attention-transformer

A Transformer variant combining local and global attention mechanisms that scales linearly with sequence length for efficient language modeling.

838 stars · Python · Language Models · ML Frameworks
Velocity (7d): +0.4 ★/day · Trend: steady

This repository implements a Transformer with a hybrid attention mechanism. Local attention, computed in the standard (QK^T)V order over a bounded window, is combined with global linear attention, computed in the reassociated Q(K^T V) order. As a result, time and memory grow linearly with sequence length rather than quadratically. To reduce memory further, the library supports reversible layers, feedforward chunking, and factorized embeddings. It targets long-sequence language modeling, where the O(n²) cost of standard attention becomes prohibitive.
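The global branch is linear because matrix multiplication is associative. (QK^T)V materializes an n × n score matrix, but Q(K^T V) only builds a d × d context matrix, where n is the sequence length and d the head dimension. The identity holds only when no row-wise softmax is applied to QK^T. Linear attention therefore normalizes Q and K separately. The dependency-free sketch below assumes one such scheme, modeled on the "efficient attention" formulation: softmax of Q over the feature dimension and softmax of K over the sequence dimension. This normalization is an assumption and may differ from the library's code, and all function names are hypothetical.

```python
import math
import random

# Tiny list-of-lists matrix helpers (illustrative, not optimized).
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def softmax_rows(a):
    """Softmax over the last dimension (each row), numerically stabilized."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def softmax_cols(a):
    """Softmax over the sequence dimension (each column)."""
    return transpose(softmax_rows(transpose(a)))

def quadratic_order(q, k, v):
    # (Q K^T) V: builds an n x n matrix -> O(n^2 d) time, O(n^2) memory.
    return matmul(matmul(q, transpose(k)), v)

def linear_order(q, k, v):
    # Q (K^T V): builds a d x d matrix -> O(n d^2) time, O(d^2) extra memory.
    return matmul(q, matmul(transpose(k), v))

def global_linear_attention(q, k, v):
    # ASSUMPTION: efficient-attention-style normalization, not necessarily
    # what linear-attention-transformer does internally.
    return linear_order(softmax_rows(q), softmax_cols(k), v)

if __name__ == "__main__":
    random.seed(0)
    n, d = 8, 4
    q, k, v = ([[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
               for _ in range(3))
    a, b = quadratic_order(q, k, v), linear_order(q, k, v)
    diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    print("max |(QK^T)V - Q(K^T V)| =", diff)
    out = global_linear_attention(q, k, v)
    print("global output shape:", len(out), "x", len(out[0]))
```

Both orderings return the same n × d result up to floating-point error, but only the second avoids the n × n intermediate. Because each normalized row of Q and each normalized column of K sums to 1, every output row of `global_linear_attention` is a convex combination of the rows of V.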
